[jira] [Created] (NUTCH-2308) Implement SSL Connection Test at TestNutchAPI

2016-08-23 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2308:


 Summary: Implement SSL Connection Test at TestNutchAPI
 Key: NUTCH-2308
 URL: https://issues.apache.org/jira/browse/NUTCH-2308
 Project: Nutch
  Issue Type: Improvement
  Components: REST_api, web gui
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.4


Currently, the SSL connection test in TestNutchAPI is ignored. We should 
complete its implementation.
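Below is a minimal sketch of the client side such a test might need. It assumes the server presents the self-signed certificate stored in the test keystore (src/test/nutch-ssl.keystore.jks, added in NUTCH-2301). It uses only standard JSSE classes. The class and method names, the keystore password and the server URL are hypothetical, and this is not the TestNutchAPI code.

```java
import java.io.InputStream;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class SslTestClientSketch {

    /** Loads a JKS keystore; a null stream yields an empty, initialized keystore. */
    static KeyStore loadKeyStore(InputStream in, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance("JKS");
        ks.load(in, password);
        return ks;
    }

    /**
     * Builds a client SSLContext whose trust managers trust the given keystore.
     * Passing null makes TrustManagerFactory fall back to the JDK's default trust store.
     */
    static SSLContext trustingContext(KeyStore trustStore) throws Exception {
        TrustManagerFactory tmf =
            TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx;
    }

    public static void main(String[] args) throws Exception {
        // A real test would load the test keystore instead (path and password are assumptions):
        //   KeyStore ks = loadKeyStore(new FileInputStream("src/test/nutch-ssl.keystore.jks"), pw);
        // and install the resulting context on the connection before calling the HTTPS endpoint:
        //   ((HttpsURLConnection) url.openConnection()).setSSLSocketFactory(ctx.getSocketFactory());
        SSLContext ctx = trustingContext(null);
        System.out.println(ctx.getProtocol());                // prints TLS
        System.out.println(loadKeyStore(null, null).size());  // prints 0
    }
}
```

Building the SSLContext from an explicit trust store keeps the test independent of the JVM-wide default trust settings, so a self-signed test certificate does not have to be installed globally.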



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (NUTCH-1756) Security layer for NutchServer

2016-08-23 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-1756.
-
Resolution: Fixed

This completes the code implementation for your GSoC project, [~kamaci]. Congratulations.
Please put together a wiki page under 
http://wiki.apache.org/nutch/#Nutch_2.X_tutorial.28s.29
Thanks and again... congratulations. 

> Security layer for NutchServer
> --
>
> Key: NUTCH-1756
> URL: https://issues.apache.org/jira/browse/NUTCH-1756
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
>Priority: Critical
>  Labels: gsoc2016
> Fix For: 2.4
>
>
> It will be beneficial to have a security layer for NutchServer once we make 
> improvements upon it. I hope that GSoC goes ahead this year so we can tackle 
> such issues.
> This issue should implement a standard security layer for REST API calls. It 
> should also add/expose this functionality through the WebApp.
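One standard building block for securing REST calls is HTTP Basic authentication (RFC 7617). The sketch below only shows how a client would form the Authorization header with the JDK's Base64 encoder. Which scheme NutchServer actually uses is not stated in this issue, and the class name and credentials are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthSketch {

    /** Builds the value of an HTTP "Authorization: Basic ..." header for user:password. */
    static String basicAuthHeader(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical credentials; a real client would read these from configuration.
        System.out.println(basicAuthHeader("admin", "secret")); // prints Basic YWRtaW46c2VjcmV0
    }
}
```

Basic credentials are only encoded, not encrypted, so a scheme like this is normally combined with SSL/TLS on the server side.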





[jira] [Updated] (NUTCH-1756) Security layer for NutchServer

2016-08-23 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1756:

Fix Version/s: (was: 2.5)
   2.4



[jira] [Created] (NUTCH-2307) Implement Missing NutchServer REST API Tests

2016-08-23 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2307:


 Summary: Implement Missing NutchServer REST API Tests
 Key: NUTCH-2307
 URL: https://issues.apache.org/jira/browse/NUTCH-2307
 Project: Nutch
  Issue Type: Improvement
  Components: REST_api, web gui
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.4


TestAPI.java was entirely commented out. The reason given was:

{quote}
CURRENTLY DISABLED. TESTS ARE FLAPPING FOR NO APPARENT REASON.
SHALL BE FIXED OR REPLACES BY NEW API IMPLEMENTATION
{quote}

So, we should implement those missing tests based on the new 
AbstractNutchAPITestBase.





[jira] [Commented] (NUTCH-2301) Create Tests for Security Layer of NutchServer

2016-08-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433679#comment-15433679
 ] 

Hudson commented on NUTCH-2301:
---

FAILURE: Integrated in Jenkins build Nutch-nutchgora #1571 (See 
[https://builds.apache.org/job/Nutch-nutchgora/1571/])
NUTCH-2301 Tests for Security Layer of NutchServer Are Created (furkankamaci: 
rev 3bc3d81e964aac59f61951740e848bd429a15b3c)
* (add) src/test/org/apache/nutch/api/TestNutchAPI.java
* (edit) src/test/nutch-site.xml
* (add) src/test/nutch-ssl.keystore.jks
* (add) src/test/org/apache/nutch/api/AbstractNutchAPITestBase.java
* (delete) src/test/org/apache/nutch/api/TestAPI.java


> Create Tests for Security Layer of NutchServer
> --
>
> Key: NUTCH-2301
> URL: https://issues.apache.org/jira/browse/NUTCH-2301
> Project: Nutch
>  Issue Type: Sub-task
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> Create tests for security layer of NutchServer.





Build failed in Jenkins: Nutch-nutchgora #1571

2016-08-23 Thread Apache Jenkins Server
See 

Changes:

[furkankamaci] NUTCH-2301 Tests for Security Layer of NutchServer Are Created

--
[...truncated 2382 lines...]
copy-generated-lib:

init:

init-plugin:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:

jar:

deps-test:

deploy:

copy-generated-lib:

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: protocol-file

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: parse-metatags
[junit] Running org.apache.nutch.parse.metatags.TestMetaTagsParser
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.361 sec

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: index-anchor

compile-test:
[javac] Compiling 1 source file to 


jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: index-anchor
[junit] Running org.apache.nutch.indexer.anchor.TestAnchorIndexingFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.608 sec

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: index-basic

compile-test:
[javac] Compiling 1 source file to 


jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: index-basic
[junit] Running org.apache.nutch.indexer.basic.TestBasicIndexingFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.682 sec

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: index-more

compile-test:
[javac] Compiling 1 source file to 


jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: index-more
[junit] Running org.apache.nutch.indexer.more.TestMoreIndexingFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.21 sec

init:

init-plugin:
 [echo] Copying language profiles
 [echo] Copying test files

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: language-identifier

compile-test:
[javac] Compiling 1 source file to 


jar:

deps-test:

init:

init-plugin:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: language-identifier
[junit] Running org.apache.nutch.analysis.lang.TestHTMLLanguageParser
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.013 sec

init:

init-plugin:

deps-jar:

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: lib-http

jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: protocol-httpclient

compile-test:
[javac] Compiling 1 source file to 


jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: protocol-httpclient
[junit] Running org.apache.nutch.protocol.httpclient.TestProtocolHttpClient
[junit] Tests run: 7, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 6.187 sec
[junit] Test org.apache.nutch.protocol.httpclient.TestProtocolHttpClient FAILED

BUILD FAILED
:481: The following error occurred while executing this line:
:89: The following error occurred while executing this

[jira] [Resolved] (NUTCH-2301) Create Tests for Security Layer of NutchServer

2016-08-23 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI resolved NUTCH-2301.
--
Resolution: Fixed



[jira] [Commented] (NUTCH-2301) Create Tests for Security Layer of NutchServer

2016-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433646#comment-15433646
 ] 

ASF GitHub Bot commented on NUTCH-2301:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/146




[GitHub] nutch pull request #146: NUTCH-2301 Tests for Security Layer of NutchServer

2016-08-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/146


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (NUTCH-2301) Create Tests for Security Layer of NutchServer

2016-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433560#comment-15433560
 ] 

ASF GitHub Bot commented on NUTCH-2301:
---

GitHub user kamaci opened a pull request:

https://github.com/apache/nutch/pull/146

NUTCH-2301 Tests for Security Layer of NutchServer



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kamaci/nutch NUTCH-2301

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/146.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #146


commit 3bc3d81e964aac59f61951740e848bd429a15b3c
Author: Furkan KAMACI 
Date:   2016-08-23T20:41:43Z

NUTCH-2301 Tests for Security Layer of NutchServer Are Created






[GitHub] nutch pull request #146: NUTCH-2301 Tests for Security Layer of NutchServer

2016-08-23 Thread kamaci
GitHub user kamaci opened a pull request:

https://github.com/apache/nutch/pull/146

NUTCH-2301 Tests for Security Layer of NutchServer



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kamaci/nutch NUTCH-2301

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/146.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #146


commit 3bc3d81e964aac59f61951740e848bd429a15b3c
Author: Furkan KAMACI 
Date:   2016-08-23T20:41:43Z

NUTCH-2301 Tests for Security Layer of NutchServer Are Created






[jira] [Commented] (NUTCH-2306) Id of Active Configuration Could Be Stored at NutchStatus and Exposed via REST API

2016-08-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433178#comment-15433178
 ] 

Hudson commented on NUTCH-2306:
---

SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1570 (See 
[https://builds.apache.org/job/Nutch-nutchgora/1570/])
NUTCH-2306 Id of Active Configuration Could Be Stored at NutchStatus and 
(furkankamaci: rev ed96b104ddf82bcb20557a29b251c3fd73eb146a)
* (edit) src/java/org/apache/nutch/api/model/response/NutchStatus.java
* (edit) src/java/org/apache/nutch/api/resources/AdminResource.java
* (edit) src/java/org/apache/nutch/api/resources/AbstractResource.java


> Id of Active Configuration Could Be Stored at NutchStatus and Exposed via 
> REST API
> --
>
> Key: NUTCH-2306
> URL: https://issues.apache.org/jira/browse/NUTCH-2306
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> NutchStatus holds information about the configuration it uses. However, it 
> should also store the id of that configuration. Once NUTCH-2302 and NUTCH-2303 
> are merged, we will be able to store the active configuration id and expose 
> this information via the REST API.





[jira] [Commented] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use

2016-08-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433176#comment-15433176
 ] 

Hudson commented on NUTCH-2303:
---

SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1570 (See 
[https://builds.apache.org/job/Nutch-nutchgora/1570/])
NUTCH-2303 NutchServer Could Be Able To Select a Configuration to Use 
(furkankamaci: rev 6227f3b171b67e790a089d6fee4d3c65de0e0ee1)
* (edit) src/java/org/apache/nutch/api/NutchServer.java
* (edit) src/java/org/apache/nutch/api/security/SecurityUtil.java


> NutchServer Could Be Able To Select a Configuration to Use
> --
>
> Key: NUTCH-2303
> URL: https://issues.apache.org/jira/browse/NUTCH-2303
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> RAMConfManager is intended to hold different configurations. However, 
> NutchServer currently uses the default config; it should be possible to set 
> an active configuration id when starting up a NutchServer.
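A sketch of the idea, assuming configurations are kept in memory under string ids and the active id is chosen once at startup. ConfRegistry, Server and Status are hypothetical stand-ins, not Nutch's RAMConfManager, NutchServer or NutchStatus; the status object also carries the active id, as NUTCH-2306 proposes. The property name http.agent.name is a real Nutch setting, but the values are made up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ActiveConfSketch {

    /** Hypothetical in-memory store of configurations keyed by id. */
    static final class ConfRegistry {
        private final Map<String, Map<String, String>> confs = new HashMap<>();

        void put(String id, Map<String, String> conf) {
            // Store a read-only copy so callers cannot mutate a registered configuration.
            confs.put(id, Collections.unmodifiableMap(new HashMap<>(conf)));
        }

        boolean contains(String id) {
            return confs.containsKey(id);
        }

        Map<String, String> get(String id) {
            return confs.get(id);
        }
    }

    /** Hypothetical status object that also reports the active configuration id. */
    static final class Status {
        final String activeConfId;

        Status(String activeConfId) {
            this.activeConfId = activeConfId;
        }
    }

    /** Hypothetical server that fixes its active configuration at startup. */
    static final class Server {
        private final ConfRegistry registry;
        private final String activeConfId;

        Server(ConfRegistry registry, String activeConfId) {
            if (!registry.contains(activeConfId)) {
                throw new IllegalArgumentException("Unknown configuration id: " + activeConfId);
            }
            this.registry = registry;
            this.activeConfId = activeConfId;
        }

        Map<String, String> activeConf() {
            return registry.get(activeConfId);
        }

        Status status() {
            return new Status(activeConfId);
        }
    }

    public static void main(String[] args) {
        ConfRegistry registry = new ConfRegistry();
        registry.put("default", Collections.singletonMap("http.agent.name", "default-agent"));
        registry.put("crawl-a", Collections.singletonMap("http.agent.name", "agent-a"));

        Server server = new Server(registry, "crawl-a");
        System.out.println(server.status().activeConfId);               // prints crawl-a
        System.out.println(server.activeConf().get("http.agent.name")); // prints agent-a
    }
}
```

Rejecting an unknown id in the constructor makes a misconfigured startup fail fast instead of silently falling back to the default configuration.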





[jira] [Commented] (NUTCH-2306) Id of Active Configuration Could Be Stored at NutchStatus and Exposed via REST API

2016-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433074#comment-15433074
 ] 

ASF GitHub Bot commented on NUTCH-2306:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/145




[jira] [Resolved] (NUTCH-2306) Id of Active Configuration Could Be Stored at NutchStatus and Exposed via REST API

2016-08-23 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-2306.
-
Resolution: Fixed

Merged, thanks [~kamaci].



[GitHub] nutch pull request #145: NUTCH-2306 Id of Active Configuration Could Be Stor...

2016-08-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/145




[jira] [Commented] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use

2016-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433072#comment-15433072
 ] 

ASF GitHub Bot commented on NUTCH-2303:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/144




[GitHub] nutch pull request #144: NUTCH-2303 NutchServer Could Be Able To Select a Co...

2016-08-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/144




[jira] [Resolved] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use

2016-08-23 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-2303.
-
Resolution: Fixed

Thanks [~kamaci].



Build failed in Jenkins: Nutch-trunk #3394

2016-08-23 Thread Apache Jenkins Server
See 

--
[...truncated 87 lines...]
[ivy:resolve]   module not found: org.apache.hadoop#hadoop-common;2.7.2
[ivy:resolve]    local: tried
[ivy:resolve] /home/jenkins/.ivy2/local/org.apache.hadoop/hadoop-common/2.7.2/ivys/ivy.xml
[ivy:resolve] -- artifact org.apache.hadoop#hadoop-common;2.7.2!hadoop-common.jar:
[ivy:resolve] /home/jenkins/.ivy2/local/org.apache.hadoop/hadoop-common/2.7.2/jars/hadoop-common.jar
[ivy:resolve]    maven2: tried
[ivy:resolve] http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.7.2/hadoop-common-2.7.2.pom
[ivy:resolve]    apache-snapshot: tried
[ivy:resolve] https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common/2.7.2/hadoop-common-2.7.2.pom
[ivy:resolve] -- artifact org.apache.hadoop#hadoop-common;2.7.2!hadoop-common.jar:
[ivy:resolve] https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common/2.7.2/hadoop-common-2.7.2.jar
[ivy:resolve]    sonatype: tried
[ivy:resolve] http://oss.sonatype.org/content/repositories/releases/org/apache/hadoop/hadoop-common/2.7.2/hadoop-common-2.7.2.pom
[ivy:resolve] -- artifact org.apache.hadoop#hadoop-common;2.7.2!hadoop-common.jar:
[ivy:resolve] http://oss.sonatype.org/content/repositories/releases/org/apache/hadoop/hadoop-common/2.7.2/hadoop-common-2.7.2.jar
[ivy:resolve]   problem while downloading module descriptor: http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-hdfs/2.7.2/hadoop-hdfs-2.7.2.pom: No space left on device (20ms)
[ivy:resolve]   module not found: org.apache.hadoop#hadoop-hdfs;2.7.2
[ivy:resolve]    local: tried
[ivy:resolve] /home/jenkins/.ivy2/local/org.apache.hadoop/hadoop-hdfs/2.7.2/ivys/ivy.xml
[ivy:resolve] -- artifact org.apache.hadoop#hadoop-hdfs;2.7.2!hadoop-hdfs.jar:
[ivy:resolve] /home/jenkins/.ivy2/local/org.apache.hadoop/hadoop-hdfs/2.7.2/jars/hadoop-hdfs.jar
[ivy:resolve]    maven2: tried
[ivy:resolve] http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-hdfs/2.7.2/hadoop-hdfs-2.7.2.pom
[ivy:resolve]    apache-snapshot: tried
[ivy:resolve] https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-hdfs/2.7.2/hadoop-hdfs-2.7.2.pom
[ivy:resolve] -- artifact org.apache.hadoop#hadoop-hdfs;2.7.2!hadoop-hdfs.jar:
[ivy:resolve] https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-hdfs/2.7.2/hadoop-hdfs-2.7.2.jar
[ivy:resolve]    sonatype: tried
[ivy:resolve] http://oss.sonatype.org/content/repositories/releases/org/apache/hadoop/hadoop-hdfs/2.7.2/hadoop-hdfs-2.7.2.pom
[ivy:resolve] -- artifact org.apache.hadoop#hadoop-hdfs;2.7.2!hadoop-hdfs.jar:
[ivy:resolve] http://oss.sonatype.org/content/repositories/releases/org/apache/hadoop/hadoop-hdfs/2.7.2/hadoop-hdfs-2.7.2.jar
[ivy:resolve]   problem while downloading module descriptor: http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/2.7.2/hadoop-mapreduce-client-core-2.7.2.pom: No space left on device (16ms)
[ivy:resolve]   module not found: org.apache.hadoop#hadoop-mapreduce-client-core;2.7.2
[ivy:resolve]    local: tried
[ivy:resolve] /home/jenkins/.ivy2/local/org.apache.hadoop/hadoop-mapreduce-client-core/2.7.2/ivys/ivy.xml
[ivy:resolve] -- artifact org.apache.hadoop#hadoop-mapreduce-client-core;2.7.2!hadoop-mapreduce-client-core.jar:
[ivy:resolve] /home/jenkins/.ivy2/local/org.apache.hadoop/hadoop-mapreduce-client-core/2.7.2/jars/hadoop-mapreduce-client-core.jar
[ivy:resolve]    maven2: tried
[ivy:resolve] http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/2.7.2/hadoop-mapreduce-client-core-2.7.2.pom
[ivy:resolve]    apache-snapshot: tried
[ivy:resolve] https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-mapreduce-client-core/2.7.2/hadoop-mapreduce-client-core-2.7.2.pom
[ivy:resolve] -- artifact org.apache.hadoop#hadoop-mapreduce-client-core;2.7.2!hadoop-mapreduce-client-core.jar:
[ivy:resolve] https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-mapreduce-client-core/2.7.2/hadoop-mapreduce-client-core-2.7.2.jar
[ivy:resolve]    sonatype: tried
[ivy:resolve] http://oss.sonatype.org/content/repositories/releases/org/apache/hadoop/hadoop-mapreduce-client-core/2.7.2/hadoop-mapreduce-client-core-2.7.2.pom
[ivy:resolve] -- artifact org.apache.hadoop#hadoop-mapreduce-client-core;2.7.2!hadoop-mapreduce-client-core.jar:
[ivy:resolve] http://oss.sonatype.org/content/repositories/releases/org/apache/hadoop/hadoop-mapreduce-client-core/2.7.2/hadoop-mapreduce-client-core-2.7.2.jar
[ivy:resolve]   problem while downloading module descriptor: 

[jira] [Commented] (NUTCH-2306) Id of Active Configuration Could Be Stored at NutchStatus and Exposed via REST API

2016-08-23 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432543#comment-15432543
 ] 

Furkan KAMACI commented on NUTCH-2306:
--

[~lewismc] I've created the PR. Could you apply it after 
https://github.com/apache/nutch/pull/144 has been applied?



[jira] [Commented] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use

2016-08-23 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432541#comment-15432541
 ] 

Furkan KAMACI commented on NUTCH-2303:
--

[~lewismc] I've created the PR.



[jira] [Commented] (NUTCH-2306) Id of Active Configuration Could Be Stored at NutchStatus and Exposed via REST API

2016-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432538#comment-15432538
 ] 

ASF GitHub Bot commented on NUTCH-2306:
---

GitHub user kamaci opened a pull request:

https://github.com/apache/nutch/pull/145

NUTCH-2306 Id of Active Configuration Could Be Stored at NutchStatus and 
Exposed via REST API



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kamaci/nutch NUTCH-2306

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/145.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #145








[GitHub] nutch pull request #145: NUTCH-2306 Id of Active Configuration Could Be Stor...

2016-08-23 Thread kamaci
GitHub user kamaci opened a pull request:

https://github.com/apache/nutch/pull/145

NUTCH-2306 Id of Active Configuration Could Be Stored at NutchStatus and 
Exposed via REST API



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kamaci/nutch NUTCH-2306

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/145.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #145








[jira] [Commented] (NUTCH-2242) lastModified not always set

2016-08-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432469#comment-15432469
 ] 

Hudson commented on NUTCH-2242:
---

FAILURE: Integrated in Jenkins build Nutch-trunk #3393 (See 
[https://builds.apache.org/job/Nutch-trunk/3393/])
NUTCH-2164 NUTCH-2242 Inconsistent 'Modified Time' in crawl db / (snagel: rev 
70622c3e18cee879f5a38d895f68dd0be69461e1)
* (edit) src/java/org/apache/nutch/crawl/DefaultFetchSchedule.java
* (edit) src/java/org/apache/nutch/protocol/ProtocolOutput.java
* (edit) src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java
* (edit) src/test/org/apache/nutch/crawl/TestCrawlDbStates.java


> lastModified not always set
> ---
>
> Key: NUTCH-2242
> URL: https://issues.apache.org/jira/browse/NUTCH-2242
> Project: Nutch
>  Issue Type: Bug
>  Components: crawldb
>Affects Versions: 1.11
>Reporter: Jurian Broertjes
>Priority: Minor
> Fix For: 1.13
>
> Attachments: NUTCH-2242.patch
>
>
> I observed two issues:
> - When using the DefaultFetchSchedule, CrawlDatum's modifiedTime field is not 
> updated on the first successful fetch. 
> - When a document modification is detected (protocol- or signature-wise), the 
> modifiedTime isn't updated
> I can provide a patch later today.
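The two observations suggest an update rule along the following lines. This is a hypothetical sketch, not the NUTCH-2242 patch: CrawlRecord stands in for CrawlDatum, and the parameters for the fetch time, the change-detection result and any protocol-reported modified time are assumptions.

```java
public class ModifiedTimeUpdate {

    /** Minimal stand-in for a crawl record; not Nutch's CrawlDatum. */
    static final class CrawlRecord {
        long modifiedTime; // 0 means "never set"
    }

    /**
     * Updates modifiedTime after a successful fetch.
     *
     * @param changed          true if a protocol- or signature-level change was detected
     * @param fetchTime        time of this fetch, in epoch millis
     * @param protocolModified modified time reported by the protocol, or 0 if unknown
     */
    static void onSuccessfulFetch(CrawlRecord r, boolean changed, long fetchTime, long protocolModified) {
        boolean firstFetch = r.modifiedTime == 0L;
        if (firstFetch || changed) {
            // Prefer the server-reported time; fall back to the fetch time.
            r.modifiedTime = protocolModified > 0L ? protocolModified : fetchTime;
        }
    }

    public static void main(String[] args) {
        CrawlRecord r = new CrawlRecord();
        onSuccessfulFetch(r, false, 1_000L, 0L);     // first fetch: set to the fetch time
        System.out.println(r.modifiedTime);          // prints 1000
        onSuccessfulFetch(r, false, 2_000L, 0L);     // unchanged: keep the previous value
        System.out.println(r.modifiedTime);          // prints 1000
        onSuccessfulFetch(r, true, 3_000L, 2_500L);  // changed: use the protocol time
        System.out.println(r.modifiedTime);          // prints 2500
    }
}
```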





[jira] [Commented] (NUTCH-2164) Inconsistent 'Modified Time' in crawl db

2016-08-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432468#comment-15432468
 ] 

Hudson commented on NUTCH-2164:
---

FAILURE: Integrated in Jenkins build Nutch-trunk #3393 (See 
[https://builds.apache.org/job/Nutch-trunk/3393/])
NUTCH-2164 NUTCH-2242 Inconsistent 'Modified Time' in crawl db / (snagel: rev 
70622c3e18cee879f5a38d895f68dd0be69461e1)
* (edit) src/java/org/apache/nutch/crawl/DefaultFetchSchedule.java
* (edit) src/java/org/apache/nutch/protocol/ProtocolOutput.java
* (edit) src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java
* (edit) src/test/org/apache/nutch/crawl/TestCrawlDbStates.java


> Inconsistent 'Modified Time' in crawl db
> 
>
> Key: NUTCH-2164
> URL: https://issues.apache.org/jira/browse/NUTCH-2164
> Project: Nutch
>  Issue Type: Improvement
>  Components: crawldb, fetcher
>Affects Versions: 1.11
>Reporter: Thamme Gowda
>Priority: Minor
> Fix For: 1.13
>
>
> The 'Modified time' in the crawldb is invalid: it is set to (0 - timezone 
> difference).
> *How to verify/reproduce:*
>   Run 'nutch readdb /path/to/crawldb -dump yy' and then inspect the content of 
> 'yy'
> The following improvements can be made:
> 1. Set the modified time in DefaultFetchSchedule
> 2. Set ProtocolStatus.lastModified if the modified time is available in the protocol 
> response headers
> This issue is also discussed in dev mailing lists: 
> http://www.mail-archive.com/dev@nutch.apache.org/msg19803.html#
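For the second improvement, a Last-Modified response header can be converted to epoch milliseconds with the standard `java.time` RFC 1123 formatter. This is a standalone sketch, not Nutch's protocol code. The helper name `parseLastModified` and the choice to return 0 for a missing or unparsable header are assumptions. A value of 0 is the Unix epoch, which, rendered in a local timezone, plausibly accounts for the "0-Timezone Difference" value seen in the dump.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class LastModifiedSketch {

  /** Returns epoch millis from an HTTP Last-Modified header, or 0 if absent or unparsable. */
  static long parseLastModified(String header) {
    if (header == null || header.trim().isEmpty()) {
      return 0L;
    }
    try {
      return ZonedDateTime.parse(header.trim(), DateTimeFormatter.RFC_1123_DATE_TIME)
          .toInstant()
          .toEpochMilli();
    } catch (DateTimeParseException ex) {
      return 0L; // treat malformed headers like a missing header
    }
  }

  public static void main(String[] args) {
    System.out.println(parseLastModified("Tue, 23 Aug 2016 09:16:00 GMT")); // 1471943760000
    // A missing header yields 0, i.e. the epoch: 1970-01-01T00:00:00Z
    System.out.println(Instant.ofEpochMilli(parseLastModified(null)));
  }
}
```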





Build failed in Jenkins: Nutch-trunk #3393

2016-08-23 Thread Apache Jenkins Server
See 

Changes:

[snagel] NUTCH-2164 NUTCH-2242 Inconsistent 'Modified Time' in crawl db /

--
[...truncated 5342 lines...]

compile-test:
[javac] Compiling 1 source file to 


jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-basic
[junit] Running org.apache.nutch.net.urlnormalizer.ajax.TestAjaxURLNormalizer
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.078 sec
[junit] Running org.apache.nutch.net.urlnormalizer.basic.TestBasicURLNormalizer

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: urlnormalizer-host

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 


jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-host
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.493 sec

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: urlnormalizer-pass

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 

[junit] Running org.apache.nutch.net.urlnormalizer.host.TestHostURLNormalizer

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-pass
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.67 sec
[junit] Running org.apache.nutch.net.urlnormalizer.pass.TestPassURLNormalizer

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: urlnormalizer-protocol

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 

[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.309 sec

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-protocol

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: urlnormalizer-querystring

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 


jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-querystring
[junit] Running org.apache.nutch.net.urlnormalizer.protocol.TestProtocolURLNormalizer
[junit] Running org.apache.nutch.net.urlnormalizer.querystring.TestQuerystringURLNormalizer
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.73 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.355 sec

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: urlnormalizer-regex

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 


init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: urlnormalizer-slash

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 


jar:

deps-test:

init:

init-plugin:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:

jar:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-regex

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-slash
[junit] Running org.apache.nutch.net.urlnormalizer.slash.TestSlashURLNormalizer
[junit] Running org.apache.nutch.net.urlnormalizer.regex.TestRegexURLNormalizer
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.513 sec
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, 

[jira] [Commented] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use

2016-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432466#comment-15432466
 ] 

ASF GitHub Bot commented on NUTCH-2303:
---

GitHub user kamaci opened a pull request:

https://github.com/apache/nutch/pull/144

NUTCH-2303 NutchServer Could Be Able To Select a Configuration to Use



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kamaci/nutch NUTCH-2303_2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/144.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #144


commit 6227f3b171b67e790a089d6fee4d3c65de0e0ee1
Author: Furkan KAMACI 
Date:   2016-08-23T09:16:00Z

NUTCH-2303 NutchServer Could Be Able To Select a Configuration to Use




> NutchServer Could Be Able To Select a Configuration to Use
> --
>
> Key: NUTCH-2303
> URL: https://issues.apache.org/jira/browse/NUTCH-2303
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> RAMConfManager is intended to hold different configurations. However, 
> NutchServer currently uses the default configuration; it should be possible to 
> set an active configuration id when starting a NutchServer.
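One way to let the server pick a configuration at startup is to resolve a requested id against the in-memory store and fall back to a default id when none is given. The sketch below uses a hypothetical `ConfStore` class and `selectActive` helper. It does not reproduce the actual RAMConfManager or NutchServer API. The `"default"` id and passing the id as the first program argument are also assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class ActiveConfSketch {
  static final String DEFAULT_ID = "default"; // assumed id of the built-in configuration

  /** Hypothetical in-memory store keyed by configuration id; NOT Nutch's RAMConfManager. */
  static final class ConfStore {
    private final Map<String, Map<String, String>> confs = new ConcurrentHashMap<>();

    void put(String id, Map<String, String> props) {
      confs.put(id, Collections.unmodifiableMap(new HashMap<>(props)));
    }

    Optional<Map<String, String>> get(String id) {
      return Optional.ofNullable(confs.get(id));
    }
  }

  /** Resolves the configuration to start with; fails fast on an unknown id. */
  static Map<String, String> selectActive(ConfStore store, String requestedId) {
    String id = (requestedId == null || requestedId.isEmpty()) ? DEFAULT_ID : requestedId;
    return store.get(id).orElseThrow(
        () -> new IllegalArgumentException("Unknown configuration id: " + id));
  }

  public static void main(String[] args) {
    ConfStore store = new ConfStore();
    store.put(DEFAULT_ID, Collections.singletonMap("http.agent.name", "default-agent"));
    store.put("custom", Collections.singletonMap("http.agent.name", "custom-agent"));
    String requested = args.length > 0 ? args[0] : null;
    System.out.println(selectActive(store, requested).get("http.agent.name"));
  }
}
```

Failing fast on an unknown id, rather than silently using the default, makes a misconfigured startup visible immediately.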





[GitHub] nutch pull request #144: NUTCH-2303 NutchServer Could Be Able To Select a Co...

2016-08-23 Thread kamaci
GitHub user kamaci opened a pull request:

https://github.com/apache/nutch/pull/144

NUTCH-2303 NutchServer Could Be Able To Select a Configuration to Use



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kamaci/nutch NUTCH-2303_2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/144.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #144


commit 6227f3b171b67e790a089d6fee4d3c65de0e0ee1
Author: Furkan KAMACI 
Date:   2016-08-23T09:16:00Z

NUTCH-2303 NutchServer Could Be Able To Select a Configuration to Use




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (NUTCH-2242) lastModified not always set

2016-08-23 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-2242:
---
Assignee: (was: Sebastian Nagel)

> lastModified not always set
> ---
>
> Key: NUTCH-2242
> URL: https://issues.apache.org/jira/browse/NUTCH-2242
> Project: Nutch
>  Issue Type: Bug
>  Components: crawldb
>Affects Versions: 1.11
>Reporter: Jurian Broertjes
>Priority: Minor
> Fix For: 1.13
>
> Attachments: NUTCH-2242.patch
>
>
> I observed two issues:
> - When using the DefaultFetchSchedule, CrawlDatum's modifiedTime field is not 
> updated on the first successful fetch. 
> - When a document modification is detected (protocol- or signature-wise), the 
> modifiedTime isn't updated
> I can provide a patch later today.





[jira] [Resolved] (NUTCH-2242) lastModified not always set

2016-08-23 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2242.

Resolution: Fixed

Committed (70622c3) to 1.x including NUTCH-2164. Thanks, [~jurian]! Thanks, 
[~markus.jel...@openindex.io]!

> lastModified not always set
> ---
>
> Key: NUTCH-2242
> URL: https://issues.apache.org/jira/browse/NUTCH-2242
> Project: Nutch
>  Issue Type: Bug
>  Components: crawldb
>Affects Versions: 1.11
>Reporter: Jurian Broertjes
>Priority: Minor
> Fix For: 1.13
>
> Attachments: NUTCH-2242.patch
>
>
> I observed two issues:
> - When using the DefaultFetchSchedule, CrawlDatum's modifiedTime field is not 
> updated on the first successful fetch. 
> - When a document modification is detected (protocol- or signature-wise), the 
> modifiedTime isn't updated
> I can provide a patch later today.





[jira] [Commented] (NUTCH-2246) Refactor /seed endpoint for backward compatibility

2016-08-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432400#comment-15432400
 ] 

Hudson commented on NUTCH-2246:
---

SUCCESS: Integrated in Jenkins build Nutch-trunk #3392 (See 
[https://builds.apache.org/job/Nutch-trunk/3392/])
Remove NUTCH-2246 from the 1.12 section of CHANGES.txt (fixed in 1.13) (snagel: 
rev 78e99092c6d1308e054f9a20e50b7a6eb6206784)
* (edit) CHANGES.txt


> Refactor /seed endpoint for backward compatibility
> --
>
> Key: NUTCH-2246
> URL: https://issues.apache.org/jira/browse/NUTCH-2246
> Project: Nutch
>  Issue Type: Sub-task
>  Components: REST_api
>Affects Versions: 1.12
>Reporter: Sujen Shah
>Assignee: Sujen Shah
>Priority: Minor
>  Labels: memex
> Fix For: 1.13
>
>
> Currently the seed endpoint allows you to create a seed list by providing a 
> list of URLs passed as an argument. 
> After the first refactor (https://issues.apache.org/jira/browse/NUTCH-2090), 
> users could no longer provide a physical path to the seed list. 
> Nutch should give both options to the user.
> Additionally, once a seed list is created from a list of URLs (rather than a 
> physical file), Nutch should store it, as it does for configurations. 
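Supporting both options could look roughly like the sketch below. One method writes a supplied URL list to a seed file and records it by name. The other accepts an existing path after checking that it exists. `SeedSketch`, its method names, and the `seed.txt` file name are hypothetical and not part of the Nutch REST API. The seed directory, rather than the file, is returned because the Nutch injector typically reads seeds from a directory of URL files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SeedSketch {
  /** Hypothetical registry of known seed lists, keyed by name (like stored configurations). */
  private final Map<String, Path> seedLists = new ConcurrentHashMap<>();
  private final Path baseDir;

  SeedSketch(Path baseDir) {
    this.baseDir = baseDir;
  }

  /** Option 1: the caller supplies URLs; write them to a seed file and remember the directory. */
  Path createFromUrls(String name, Collection<String> urls) throws IOException {
    Path dir = Files.createDirectories(baseDir.resolve(name));
    Files.write(dir.resolve("seed.txt"), urls, StandardCharsets.UTF_8); // one URL per line
    seedLists.put(name, dir);
    return dir;
  }

  /** Option 2: the caller supplies an existing path; validate it and use it as-is. */
  Path useExistingPath(String name, Path seedPath) {
    if (!Files.exists(seedPath)) {
      throw new IllegalArgumentException("No such seed path: " + seedPath);
    }
    seedLists.put(name, seedPath);
    return seedPath;
  }

  /** Returns the stored seed location, or null if the name is unknown. */
  Path lookup(String name) {
    return seedLists.get(name);
  }

  public static void main(String[] args) throws IOException {
    SeedSketch seeds = new SeedSketch(Files.createTempDirectory("seeds"));
    Path dir = seeds.createFromUrls("demo", Arrays.asList("http://nutch.apache.org/"));
    System.out.println(Files.readAllLines(dir.resolve("seed.txt")));
  }
}
```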





[jira] [Assigned] (NUTCH-2242) lastModified not always set

2016-08-23 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel reassigned NUTCH-2242:
--

Assignee: Sebastian Nagel

> lastModified not always set
> ---
>
> Key: NUTCH-2242
> URL: https://issues.apache.org/jira/browse/NUTCH-2242
> Project: Nutch
>  Issue Type: Bug
>  Components: crawldb
>Affects Versions: 1.11
>Reporter: Jurian Broertjes
>Assignee: Sebastian Nagel
>Priority: Minor
> Fix For: 1.13
>
> Attachments: NUTCH-2242.patch
>
>
> I observed two issues:
> - When using the DefaultFetchSchedule, CrawlDatum's modifiedTime field is not 
> updated on the first successful fetch. 
> - When a document modification is detected (protocol- or signature-wise), the 
> modifiedTime isn't updated
> I can provide a patch later today.





[jira] [Resolved] (NUTCH-2164) Inconsistent 'Modified Time' in crawl db

2016-08-23 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2164.

Resolution: Fixed

Committed (70622c3) to 1.x including NUTCH-2242. Thanks, [~tgow...@gmail.com]!

> Inconsistent 'Modified Time' in crawl db
> 
>
> Key: NUTCH-2164
> URL: https://issues.apache.org/jira/browse/NUTCH-2164
> Project: Nutch
>  Issue Type: Improvement
>  Components: crawldb, fetcher
>Affects Versions: 1.11
>Reporter: Thamme Gowda
>Priority: Minor
> Fix For: 1.13
>
>
> The 'Modified time' in the crawldb is invalid: it is set to (0 - timezone 
> difference).
> *How to verify/reproduce:*
>   Run 'nutch readdb /path/to/crawldb -dump yy' and then inspect the content of 
> 'yy'
> The following improvements can be made:
> 1. Set the modified time in DefaultFetchSchedule
> 2. Set ProtocolStatus.lastModified if the modified time is available in the protocol 
> response headers
> This issue is also discussed in dev mailing lists: 
> http://www.mail-archive.com/dev@nutch.apache.org/msg19803.html#





[GitHub] nutch pull request #108: NUTCH-2164 NUTCH-2242 Inconsistent 'Modified Time' ...

2016-08-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/108




[jira] [Commented] (NUTCH-2164) Inconsistent 'Modified Time' in crawl db

2016-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432388#comment-15432388
 ] 

ASF GitHub Bot commented on NUTCH-2164:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/108


> Inconsistent 'Modified Time' in crawl db
> 
>
> Key: NUTCH-2164
> URL: https://issues.apache.org/jira/browse/NUTCH-2164
> Project: Nutch
>  Issue Type: Improvement
>  Components: crawldb, fetcher
>Affects Versions: 1.11
>Reporter: Thamme Gowda
>Priority: Minor
> Fix For: 1.13
>
>
> The 'Modified time' in the crawldb is invalid: it is set to (0 - timezone 
> difference).
> *How to verify/reproduce:*
>   Run 'nutch readdb /path/to/crawldb -dump yy' and then inspect the content of 
> 'yy'
> The following improvements can be made:
> 1. Set the modified time in DefaultFetchSchedule
> 2. Set ProtocolStatus.lastModified if the modified time is available in the protocol 
> response headers
> This issue is also discussed in dev mailing lists: 
> http://www.mail-archive.com/dev@nutch.apache.org/msg19803.html#





[jira] [Commented] (NUTCH-2246) Refactor /seed endpoint for backward compatibility

2016-08-23 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432283#comment-15432283
 ] 

Sebastian Nagel commented on NUTCH-2246:


Thanks! I've removed it from the 1.12 section of CHANGES.txt, so that it will 
not appear twice.

> Refactor /seed endpoint for backward compatibility
> --
>
> Key: NUTCH-2246
> URL: https://issues.apache.org/jira/browse/NUTCH-2246
> Project: Nutch
>  Issue Type: Sub-task
>  Components: REST_api
>Affects Versions: 1.12
>Reporter: Sujen Shah
>Assignee: Sujen Shah
>Priority: Minor
>  Labels: memex
> Fix For: 1.13
>
>
> Currently the seed endpoint allows you to create a seed list by providing a 
> list of URLs passed as an argument. 
> After the first refactor (https://issues.apache.org/jira/browse/NUTCH-2090), 
> users could no longer provide a physical path to the seed list. 
> Nutch should give both options to the user.
> Additionally, once a seed list is created from a list of URLs (rather than a 
> physical file), Nutch should store it, as it does for configurations. 


