[jira] [Commented] (NUTCH-2848) Consider use of StringUtil#isEmpty

2021-02-07 Thread Furkan Kamaci (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280412#comment-17280412
 ] 

Furkan Kamaci commented on NUTCH-2848:
--

You are right!

However, isn't the second method error-prone if the given string is null?
{code:java}
public static boolean isEmpty(String str) {
  // throws NullPointerException when str is null
  return str.length() == 0;
}
{code}
On the other hand, we may need a utility method that checks whether a given 
String is non-null and has a non-zero length, as follows:
{code:java}
public static boolean hasLength(String str) {
 return (str != null && str.length() > 0);
}
{code}
We may need to check these files for it:
{noformat}
grep -lr ".length() > 0" .
./src/test/org/apache/nutch/util/TestSuffixStringMatcher.java
./src/test/org/apache/nutch/util/TestPrefixStringMatcher.java
./src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java
./src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/Http.java
./src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestMSWordParser.java
./src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestOOParser.java
./src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/DOMBuilder.java
./src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/DOMContentUtils.java
./src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/HttpResponse.java
./src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/HttpResponse.java
./src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java
./src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
./src/plugin/parse-swf/src/java/org/apache/nutch/parse/swf/SWFParser.java
./src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMBuilder.java
./src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMContentUtils.java
./src/plugin/protocol-htmlunit/src/java/org/apache/nutch/protocol/htmlunit/HttpResponse.java
./src/plugin/index-replace/src/java/org/apache/nutch/indexer/replace/FieldReplacer.java
./src/plugin/index-replace/src/java/org/apache/nutch/indexer/replace/ReplaceIndexer.java
./src/plugin/urlnormalizer-ajax/src/java/org/apache/nutch/net/urlnormalizer/ajax/AjaxURLNormalizer.java
./src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
./src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection/SubcollectionIndexingFilter.java
./src/java/org/apache/nutch/tools/DmozParser.java
./src/java/org/apache/nutch/util/TrieStringMatcher.java
./src/java/org/apache/nutch/util/TableUtil.java
./src/java/org/apache/nutch/plugin/PluginManifestParser.java
./src/java/org/apache/nutch/crawl/TextProfileSignature.java
./src/java/org/apache/nutch/crawl/Injector.java
./src/java/org/apache/nutch/hostdb/HostDatum.java
./src/java/org/apache/nutch/metadata/Metadata.java{noformat}
since there may be other forms similar to the hasLength() check, such as:
{code:java}
if ((null != data) && (data.trim().length() > 0)) {
  throw new org.xml.sax.SAXException("Warning: can't output text before document element!  Ignoring...");
}
{code}
[https://github.com/apache/nutch/blob/master/src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/DOMBuilder.java#L158]
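To make the difference between these forms concrete, here is a minimal sketch. isEmpty uses the same logic as the StringUtil#isEmpty body quoted below, hasLength is the helper proposed above, and hasText is a hypothetical name (not an existing Nutch method) for the trim()-based check used in DOMBuilder. Note that the trim() form is stricter: it also rejects whitespace-only strings, so replacing it with hasLength() would change behaviour.

{code:java}
/** Hypothetical helper class comparing the null-safe string checks discussed above. */
final class StringChecks {

  private StringChecks() {
  }

  /** True if str is null or "" (same logic as StringUtil#isEmpty). */
  static boolean isEmpty(String str) {
    return (str == null) || str.isEmpty();
  }

  /** True if str is non-null and has at least one character. */
  static boolean hasLength(String str) {
    return str != null && str.length() > 0;
  }

  /**
   * Hypothetical name for the DOMBuilder form: true if str is non-null and
   * something remains after trim() strips leading/trailing chars <= U+0020.
   */
  static boolean hasText(String str) {
    return str != null && str.trim().length() > 0;
  }

  public static void main(String[] args) {
    String[] samples = {null, "", "   ", "nutch"};
    for (String s : samples) {
      System.out.printf("%-8s isEmpty=%b hasLength=%b hasText=%b%n",
          s == null ? "null" : "\"" + s + "\"",
          isEmpty(s), hasLength(s), hasText(s));
    }
  }
}
{code}

For every input hasLength(s) == !isEmpty(s), so one helper could be defined in terms of the other; only hasText differs, and only for inputs such as "   ".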

> Consider use of StringUtil#isEmpty
> --
>
> Key: NUTCH-2848
> URL: https://issues.apache.org/jira/browse/NUTCH-2848
> Project: Nutch
>  Issue Type: Improvement
>  Components: util
>Reporter: Lewis John McGibbney
>Priority: Minor
> Fix For: 1.19
>
>
> We should consider 'standardizing' the use of 
> [StringUtil#isEmpty()|https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/util/StringUtil.java#L133-L138]
>  across the codebase.
> {code:java}
>   /**
>* Checks if a string is empty (ie is null or empty).
>*/
>   public static boolean isEmpty(String str) {
> return (str == null) || (str.equals(""));
>   }
> {code}
> So far the impact is as follows:
> {code:bash}
> grep -lr ".equals(\"\")" .
> ./plugin/urlnormalizer-protocol/src/java/org/apache/nutch/net/urlnormalizer/protocol/ProtocolURLNormalizer.java
> ./plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java
> ./plugin/urlnormalizer-host/src/java/org/apache/nutch/net/urlnormalizer/host/HostURLNormalizer.java
> ./plugin/parsefilter-regex/src/java/org/apache/nutch/parsefilter/regex/RegexParseFilter.java
> ./plugin/feed/src/java/org/apache/nutch/parse/feed/FeedParser.java
> ./plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/Train.java
> ./plugin/language-identifier/src/test/org/apache/nutch/analysis/lang/TestHTMLLanguageParser.java
> ./plugin/urlnormalizer-slash/src/java/org/apache/nutch/net/urlnormalizer/slash/SlashURLNormalizer.java
> 

[jira] [Commented] (NUTCH-2171) Upgrade Nutch Trunk to Java 1.8

2017-01-20 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832026#comment-15832026
 ] 

Furkan KAMACI commented on NUTCH-2171:
--

[~lewismc] I can fix the javadocs as in NUTCH-2089. Apart from the anonymous 
classes, we can add lambdas throughout the releases that follow this improvement.

> Upgrade Nutch Trunk to Java 1.8
> ---
>
> Key: NUTCH-2171
> URL: https://issues.apache.org/jira/browse/NUTCH-2171
> Project: Nutch
>  Issue Type: Task
>Reporter: Lewis John McGibbney
>
> Lambda expressions are fantastic. I tried to undertake a small exercise that 
> would indicate how many we could implement; however, this was a fruitless 
> effort. A patch is going to be a better approach. This task involves 
> upgrading various properties in default.properties as well as a systematic 
> source code analysis with the aim of implementing Java 8 goodies throughout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (NUTCH-2348) Close GZIPInputStream

2017-01-20 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI closed NUTCH-2348.

Resolution: Won't Fix

> Close GZIPInputStream
> -
>
> Key: NUTCH-2348
> URL: https://issues.apache.org/jira/browse/NUTCH-2348
> Project: Nutch
>  Issue Type: Bug
>  Components: tool
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> GZIPInputStream is never closed; it should be closed in a finally block.





[jira] [Commented] (NUTCH-2171) Upgrade Nutch Trunk to Java 1.8

2017-01-20 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831815#comment-15831815
 ] 

Furkan KAMACI commented on NUTCH-2171:
--

[~lewismc] I've analysed the code and there are 9 anonymous classes. Converting 
them does not seem important and won't add much gain, but I can send a PR. 

I've also analysed the code for explicit type arguments that could be replaced 
with the diamond operator (a Java 7 feature): there are 193 of them. There are 
also 9 places that do not use the try-with-resources feature of Java 7.

Shall I send such a PR, so that we can change default.properties and let 
developers use Java 8 features in upcoming contributions to the Nutch source 
code?
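For reference, a minimal sketch of the three kinds of change counted above: the diamond operator, a lambda in place of an anonymous class, and try-with-resources. The class and method names are hypothetical and not taken from the Nutch code base.

{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helpers illustrating Java 7/8 idioms. */
final class Java8Migration {

  private Java8Migration() {
  }

  /** Returns a copy of input sorted by string length. */
  static List<String> sortByLength(List<String> input) {
    // Diamond operator (Java 7): the element type is inferred from the declaration.
    List<String> copy = new ArrayList<>(input);
    // Java 8 lambda replacing the older anonymous class form:
    //   new Comparator<String>() {
    //     public int compare(String a, String b) { ... }
    //   }
    copy.sort((a, b) -> Integer.compare(a.length(), b.length()));
    return copy;
  }

  /** Counts lines; try-with-resources (Java 7) closes the reader on every path. */
  static int countLines(String text) {
    int lines = 0;
    try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
      while (reader.readLine() != null) {
        lines++;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return lines;
  }

  public static void main(String[] args) {
    System.out.println(sortByLength(Arrays.asList("fetcher", "db", "parse"))); // [db, parse, fetcher]
    System.out.println(countLines("a\nb\nc")); // 3
  }
}
{code}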

> Upgrade Nutch Trunk to Java 1.8
> ---
>
> Key: NUTCH-2171
> URL: https://issues.apache.org/jira/browse/NUTCH-2171
> Project: Nutch
>  Issue Type: Task
>Reporter: Lewis John McGibbney
>
> Lambda expressions are fantastic. I tried to undertake a small exercise that 
> would indicate how many we could implement; however, this was a fruitless 
> effort. A patch is going to be a better approach. This task involves 
> upgrading various properties in default.properties as well as a systematic 
> source code analysis with the aim of implementing Java 8 goodies throughout.





[jira] [Commented] (NUTCH-2346) Check Types at Object Equality

2017-01-20 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831546#comment-15831546
 ] 

Furkan KAMACI commented on NUTCH-2346:
--

[~lewismc] When checking objects for equality, we should check both whether the 
other object is null and whether it belongs to the same class. GeneratorJob.java 
considers neither the null case nor the class; I've added both checks. 

Metadata.java, on the other hand, checks for the null case but tests class 
equality through exception handling, which is not the proper way.
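A minimal sketch of an equals() that checks for null and for the exact class before casting, instead of relying on a caught exception. HostKey is a hypothetical value class, not one of the Nutch classes mentioned above.

{code:java}
import java.util.Objects;

/** Hypothetical value class illustrating a null- and type-safe equals(). */
final class HostKey {
  private final String host;
  private final int port;

  HostKey(String host, int port) {
    this.host = host;
    this.port = port;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    // Explicit null and class checks: no cast is attempted on a foreign type.
    if (obj == null || getClass() != obj.getClass()) {
      return false;
    }
    HostKey other = (HostKey) obj;
    return port == other.port && Objects.equals(host, other.host);
  }

  @Override
  public int hashCode() {
    // Must agree with equals(): equal objects produce equal hash codes.
    return Objects.hash(host, port);
  }
}
{code}

Using getClass() rather than instanceof keeps equals() symmetric when subclasses are involved; hashCode() is overridden alongside it so that equal objects hash equally.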

> Check Types at Object Equality
> --
>
> Key: NUTCH-2346
> URL: https://issues.apache.org/jira/browse/NUTCH-2346
> Project: Nutch
>  Issue Type: Bug
>  Components: generator, metadata
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
>Priority: Minor
> Fix For: 2.4
>
>






[jira] [Created] (NUTCH-2352) Log with Generic Class Name at Nutch 1.x

2017-01-18 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2352:


 Summary: Log with Generic Class Name at Nutch 1.x
 Key: NUTCH-2352
 URL: https://issues.apache.org/jira/browse/NUTCH-2352
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.12
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
Priority: Minor
 Fix For: 1.13


Mistakes often happen when existing code is copied to create a new class: the 
copied logger declaration still names the original class. We can avoid this by 
logging with a generic class name, like so:

{code:java}
private static final Logger LOG = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
{code}

(cf. SOLR-8324)
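A self-contained sketch of the same idea. To avoid an SLF4J dependency it uses java.util.logging as a stand-in, so the Logger type here is not the SLF4J one from the snippet above; the class name FetchItemQueueExample is hypothetical. Because MethodHandles.lookup() is evaluated in the static initializer of the declaring class, lookupClass() returns that class, and the declaration can be copied between classes unchanged.

{code:java}
import java.lang.invoke.MethodHandles;
import java.util.logging.Logger;

/** Hypothetical class; its logger name is derived from the declaring class. */
final class FetchItemQueueExample {
  // lookupClass() yields the class whose initializer runs this line, so the
  // declaration never has to be edited after copying it into another class.
  private static final Logger LOG =
      Logger.getLogger(MethodHandles.lookup().lookupClass().getName());

  static String loggerName() {
    return LOG.getName();
  }

  public static void main(String[] args) {
    LOG.info("logger name: " + loggerName());
  }
}
{code}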





[jira] [Commented] (NUTCH-2351) Log with Generic Class Name at Nutch 2.x

2017-01-17 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826734#comment-15826734
 ] 

Furkan KAMACI commented on NUTCH-2351:
--

[~wastl-nagel] if this is OK, I can send the PR for Nutch 1.x too.

> Log with Generic Class Name at Nutch 2.x
> 
>
> Key: NUTCH-2351
> URL: https://issues.apache.org/jira/browse/NUTCH-2351
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
>Priority: Minor
> Fix For: 2.4
>
>
> Mistakes often happen when existing code is copied to create a new class: the 
> copied logger declaration still names the original class. We can avoid this 
> by logging with a generic class name, like so:
> {code:java}
> private static final Logger LOG = 
> LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
> {code}





[jira] [Updated] (NUTCH-2351) Log with Generic Class Name

2017-01-17 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2351:
-
Description: 
Mistakes often happen when existing code is copied to create a new class: the 
copied logger declaration still names the original class. We can avoid this by 
logging with a generic class name, like so:

{code:java}
private static final Logger LOG = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
{code}



> Log with Generic Class Name
> ---
>
> Key: NUTCH-2351
> URL: https://issues.apache.org/jira/browse/NUTCH-2351
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
>Priority: Minor
> Fix For: 2.4
>
>
> Mistakes often happen when existing code is copied to create a new class: the 
> copied logger declaration still names the original class. We can avoid this 
> by logging with a generic class name, like so:
> {code:java}
> private static final Logger LOG = 
> LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
> {code}





[jira] [Created] (NUTCH-2351) Log with Generic Class Name

2017-01-17 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2351:


 Summary: Log with Generic Class Name
 Key: NUTCH-2351
 URL: https://issues.apache.org/jira/browse/NUTCH-2351
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 2.3.1
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
Priority: Minor
 Fix For: 2.4








[jira] [Commented] (NUTCH-2345) FetchItemQueue logs are logged with wrong class name

2017-01-17 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826467#comment-15826467
 ] 

Furkan KAMACI commented on NUTCH-2345:
--

[~wastl-nagel] I'll provide the patch as soon as possible.

> FetchItemQueue logs are logged with wrong class name
> 
>
> Key: NUTCH-2345
> URL: https://issues.apache.org/jira/browse/NUTCH-2345
> Project: Nutch
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.11, 1.12
> Environment: Any
>Reporter: Monika Gupta
>Assignee: Furkan KAMACI
>Priority: Minor
> Fix For: 1.13
>
>
> I ran bin/nutch fetch and noticed that the log statements of class 
> FetchItemQueue.java are logged in logs/hadoop.log under the wrong class name, 
> FetchItemQueues.java.
> Refer to the execution log:
> 2017-01-06 15:31:25,562 INFO  fetcher.FetchItemQueues -   maxThreads= 1
> 2017-01-06 15:31:28,565 INFO  fetcher.FetchItemQueues -   inProgress= 0
> The issue is in the logger of class FetchItemQueue.java. 
> Currently it is:
> private static final Logger LOG = 
> LoggerFactory.getLogger(FetchItemQueues.class);
> Correction: it should be:
> private static final Logger LOG = 
> LoggerFactory.getLogger(FetchItemQueue.class);





[jira] [Commented] (NUTCH-2350) Add Missing activeConfId Field to NutchStatus Object

2017-01-17 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826452#comment-15826452
 ] 

Furkan KAMACI commented on NUTCH-2350:
--

This is unrelated to NUTCH-2344. The Nutch Web GUI tries to convert the 
NutchStatus object from the api package into its own NutchStatus object (in the 
Web GUI package). This is the related code from NutchClientImpl:

{code:java}
  @Override
  public NutchStatus getNutchStatus() {
return nutchResource.path("/admin").type(APPLICATION_JSON)
.get(NutchStatus.class);
  }
{code}

The class returned by the Nutch REST API and the class the client expects have 
the same name but are different classes. I've added the missing field to the 
Web GUI package's object.

> Add Missing activeConfId Field to NutchStatus Object
> 
>
> Key: NUTCH-2350
> URL: https://issues.apache.org/jira/browse/NUTCH-2350
> Project: Nutch
>  Issue Type: Bug
>  Components: web gui
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>






[jira] [Commented] (NUTCH-2345) FetchItemQueue logs are logged with wrong class name

2017-01-16 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824389#comment-15824389
 ] 

Furkan KAMACI commented on NUTCH-2345:
--

[~lewismc] Such mistakes are usual when some reference code is copied and 
created a new class. This is a generic code to get class name and which is used 
at Solr now:

{code:java}
private static final Logger LOG = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
{code}

I can switch all the loggers to that convention?

> FetchItemQueue logs are logged with wrong class name
> 
>
> Key: NUTCH-2345
> URL: https://issues.apache.org/jira/browse/NUTCH-2345
> Project: Nutch
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.11, 1.12
> Environment: Any
>Reporter: Monika Gupta
>Assignee: Furkan KAMACI
>Priority: Minor
> Fix For: 1.13
>
>
> I ran bin/nutch fetch and noticed that the log statements of class 
> FetchItemQueue.java are logged in logs/hadoop.log under the wrong class name, 
> FetchItemQueues.java.
> Refer to the execution log:
> 2017-01-06 15:31:25,562 INFO  fetcher.FetchItemQueues -   maxThreads= 1
> 2017-01-06 15:31:28,565 INFO  fetcher.FetchItemQueues -   inProgress= 0
> The issue is in the logger of class FetchItemQueue.java. 
> Currently it is:
> private static final Logger LOG = 
> LoggerFactory.getLogger(FetchItemQueues.class);
> Correction: it should be:
> private static final Logger LOG = 
> LoggerFactory.getLogger(FetchItemQueue.class);





[jira] [Updated] (NUTCH-2350) Add Missing activeConfId Field to NutchStatus Object

2017-01-15 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2350:
-
Summary: Add Missing activeConfId Field to NutchStatus Object  (was: Add 
Missing Fields to NutchStatus Object)

> Add Missing activeConfId Field to NutchStatus Object
> 
>
> Key: NUTCH-2350
> URL: https://issues.apache.org/jira/browse/NUTCH-2350
> Project: Nutch
>  Issue Type: Bug
>  Components: web gui
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>






[jira] [Updated] (NUTCH-2350) Add Missing Fields to NutchStatus Object

2017-01-15 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2350:
-
Fix Version/s: (was: 1.13)
   2.4

> Add Missing Fields to NutchStatus Object
> 
>
> Key: NUTCH-2350
> URL: https://issues.apache.org/jira/browse/NUTCH-2350
> Project: Nutch
>  Issue Type: Bug
>  Components: web gui
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>






[jira] [Created] (NUTCH-2350) Add Missing Fields to NutchStatus Object

2017-01-15 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2350:


 Summary: Add Missing Fields to NutchStatus Object
 Key: NUTCH-2350
 URL: https://issues.apache.org/jira/browse/NUTCH-2350
 Project: Nutch
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.12
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 1.13








[jira] [Commented] (NUTCH-2344) Authentication Support for Web GUI

2017-01-14 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822860#comment-15822860
 ] 

Furkan KAMACI commented on NUTCH-2344:
--

[~lewismc] This seems to be related to a mismatch in the activeConfId field of 
NutchStatus in the REST API. I think you would get that error even without 
applying the patch. Could you check without applying the patch to see whether 
you still get that error? I'll provide a fix for it.

> Authentication Support for Web GUI
> --
>
> Key: NUTCH-2344
> URL: https://issues.apache.org/jira/browse/NUTCH-2344
> Project: Nutch
>  Issue Type: New Feature
>  Components: web gui
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
> Attachments: Firefox_Screenshot_2017-01-13T19-10-49.499Z.png
>
>
> We should implement authentication support for the Web GUI.





[jira] [Commented] (NUTCH-2199) Documentation for Nutch 2.X REST API

2017-01-09 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812766#comment-15812766
 ] 

Furkan KAMACI commented on NUTCH-2199:
--

[~lewismc] Can we close this issue, since it is a duplicate of NUTCH-2243?

> Documentation for Nutch 2.X REST API
> 
>
> Key: NUTCH-2199
> URL: https://issues.apache.org/jira/browse/NUTCH-2199
> Project: Nutch
>  Issue Type: New Feature
>  Components: documentation, REST_api
>Affects Versions: 2.3.1
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
>Priority: Minor
> Fix For: 2.5
>
>
> The work done on NUTCH-1800 needs to be ported to the 2.X branch. This is 
> trivial; I thought I had already done it, but obviously not. 





[jira] [Created] (NUTCH-2348) Close GZIPInputStream

2017-01-09 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2348:


 Summary: Close GZIPInputStream
 Key: NUTCH-2348
 URL: https://issues.apache.org/jira/browse/NUTCH-2348
 Project: Nutch
  Issue Type: Bug
  Components: tool
Affects Versions: 2.3.1
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.4


GZIPInputStream is never closed; it should be closed in a finally block.
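A minimal sketch of the fix, assuming Java 7 or later: try-with-resources closes the stream on every path, which is equivalent to closing it in a finally block. GzipUtil and its methods are hypothetical names, not the Nutch tool class the issue refers to.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical gzip helper; streams are always closed. */
final class GzipUtil {

  private GzipUtil() {
  }

  /** Compresses data; closing the GZIPOutputStream writes the gzip trailer. */
  static byte[] gzip(byte[] data) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
      out.write(data);
    }
    return bytes.toByteArray();
  }

  /** Decompresses data; the GZIPInputStream is closed even if read() throws. */
  static byte[] gunzip(byte[] compressed) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      byte[] buffer = new byte[4096];
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
    }
    return out.toByteArray();
  }
}
{code}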





[jira] [Updated] (NUTCH-2347) Use Logger Instead of Printing Throwable

2017-01-09 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2347:
-
Priority: Minor  (was: Major)

> Use Logger Instead of Printing Throwable
> 
>
> Key: NUTCH-2347
> URL: https://issues.apache.org/jira/browse/NUTCH-2347
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
>Priority: Minor
> Fix For: 2.4
>
>
> Loggers should be used instead of printing Throwables.





[jira] [Created] (NUTCH-2347) Use Logger Instead of Printing Throwable

2017-01-09 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2347:


 Summary: Use Logger Instead of Printing Throwable
 Key: NUTCH-2347
 URL: https://issues.apache.org/jira/browse/NUTCH-2347
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 2.3.1
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.4


Loggers should be used instead of printing Throwables.
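A minimal sketch of the pattern. It uses java.util.logging so that it compiles without SLF4J; fetch() and tryFetch() are hypothetical methods. The throwable is passed to the logger together with a message instead of being printed to standard error.

{code:java}
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical class contrasting printStackTrace() with logging. */
final class LoggingExample {
  private static final Logger LOG = Logger.getLogger(LoggingExample.class.getName());

  /** Hypothetical operation that fails for an empty URL. */
  static void fetch(String url) throws IOException {
    if (url.isEmpty()) {
      throw new IOException("empty URL");
    }
  }

  /** Returns true on success; failures are reported through the logger. */
  static boolean tryFetch(String url) {
    try {
      fetch(url);
      return true;
    } catch (IOException e) {
      // Instead of e.printStackTrace(): message plus throwable, routed
      // through the configured log handlers.
      LOG.log(Level.WARNING, "Failed to fetch " + url, e);
      return false;
    }
  }
}
{code}

With SLF4J, the equivalent call would be LOG.warn("Failed to fetch {}", url, e); a trailing Throwable argument is logged with its stack trace.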





[jira] [Commented] (NUTCH-1915) Error in Nutch 2.X WebApp stalls progress bar

2017-01-09 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812283#comment-15812283
 ] 

Furkan KAMACI commented on NUTCH-1915:
--

[~lewismc] Could you increase *DEFAULT_TIMEOUT_SEC* in _RemoteCommandExecutor.java_ 
(default value *60*) and try again?
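The stack trace quoted below shows the TimeoutException coming from FutureTask.get inside RemoteCommandExecutor.executeRemoteJob, i.e. a bounded wait on the remote job. The sketch reproduces that mechanism in isolation, under the assumption that the executor waits with Future.get(timeout); it is not the actual RemoteCommandExecutor code, and runWithTimeout is a hypothetical method. A job that outlives the timeout is reported as failed even if it is still making progress, which is why raising the timeout may help.

{code:java}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration of a bounded wait on a background job. */
final class TimeoutExample {

  private TimeoutExample() {
  }

  /**
   * Runs a task that takes workMs and waits at most timeoutMs for it.
   * Returns true if it finished in time, false if Future.get timed out.
   */
  static boolean runWithTimeout(long workMs, long timeoutMs)
      throws InterruptedException, ExecutionException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<?> job = executor.submit(() -> {
        try {
          Thread.sleep(workMs); // stands in for a long-running remote job
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      try {
        job.get(timeoutMs, TimeUnit.MILLISECONDS);
        return true;
      } catch (TimeoutException e) {
        job.cancel(true); // same exception type as in the quoted stack trace
        return false;
      }
    } finally {
      executor.shutdownNow();
    }
  }
}
{code}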

> Error in Nutch 2.X WebApp stalls progress bar
> -
>
> Key: NUTCH-1915
> URL: https://issues.apache.org/jira/browse/NUTCH-1915
> Project: Nutch
>  Issue Type: Bug
>  Components: web gui
>Affects Versions: 2.3
> Environment: Nutch 2.3-SNAPSHOT HEAD
> HBase 0.94.14
> Gora 0.5
>Reporter: Lewis John McGibbney
> Fix For: 2.5
>
>
> When I define a crawl within the Nutch 2.X webapp on the above stack, I 
> sometimes get the following stack trace:
> {code}
> 2015-01-12 14:48:25,943 INFO  fetcher.FetcherJob - fetching 
> http://www.darpa.mil/Our_Work/I2O/Personnel/Mr__Steve_Jameson.aspx (queue 
> crawl delay=5000ms)
> 2015-01-12 14:48:26,563 ERROR impl.RemoteCommandExecutor - Remote command 
> failed
> java.util.concurrent.TimeoutException
>   at java.util.concurrent.FutureTask.get(FutureTask.java:201)
>   at 
> org.apache.nutch.webui.client.impl.RemoteCommandExecutor.executeRemoteJob(RemoteCommandExecutor.java:61)
>   at 
> org.apache.nutch.webui.client.impl.CrawlingCycle.executeCrawlCycle(CrawlingCycle.java:58)
>   at 
> org.apache.nutch.webui.service.impl.CrawlServiceImpl.startCrawl(CrawlServiceImpl.java:69)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
>   at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
>   at 
> org.springframework.aop.interceptor.AsyncExecutionInterceptor$1.call(AsyncExecutionInterceptor.java:97)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-01-12 14:48:26,563 INFO  impl.CrawlingCycle - Executed remote command 
> data: FETCH status: FAILED
> 2015-01-12 14:48:27,088 INFO  fetcher.FetcherJob - 10/10 spinwaiting/active, 
> 71 pages, 1 errors, 1.4 1 pages/s, 275 146 kb/s, 193 URLs in 4 queues
> {code} 
> Right now I don't know what this relates to, but I know that it stalls the 
> task execution progress bar within the 
> [CrawlsPage|https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/webui/pages/crawls/CrawlsPage.html].





[jira] [Commented] (NUTCH-2205) Nutch solrdedup error in solrcloud for larger docs

2017-01-09 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812265#comment-15812265
 ] 

Furkan KAMACI commented on NUTCH-2205:
--

[~VictorHu] Do you still get that error? The logs say:

bq. No live SolrServers available

It seems that your cluster was down, as [~markus17] pointed out.

> Nutch solrdedup error in solrcloud for larger docs 
> ---
>
> Key: NUTCH-2205
> URL: https://issues.apache.org/jira/browse/NUTCH-2205
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 2.3
> Environment: CentOS 6.5, JDK 1.7.0_75, Tomcat 8.0.9, Hadoop 
> 2.5.2, ZooKeeper 3.4.6, HBase 0.98.8, Solr 4.8.1, Nutch 2.3.1
>Reporter: VictorHu
> Fix For: 2.5
>
>
> When the number of Solr docs is larger than 9000, the Nutch solrdedup job is 
> broken. This is the log: 
> http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2
> 16/01/25 17:02:38 INFO solr.SolrDeleteDuplicates: SolrDeleteDuplicates: 
> starting...
> 16/01/25 17:02:38 INFO solr.SolrDeleteDuplicates: SolrDeleteDuplicates: Solr 
> url: http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2
> 16/01/25 17:02:39 INFO client.RMProxy: Connecting to ResourceManager at 
> master.Itble/10.192.1.100:8032
> 16/01/25 17:02:43 INFO mapreduce.JobSubmitter: number of splits:1
> 16/01/25 17:02:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
> job_1453104806095_0162
> 16/01/25 17:02:44 INFO impl.YarnClientImpl: Submitted application 
> application_1453104806095_0162
> 16/01/25 17:02:44 INFO mapreduce.Job: The url to track the job: 
> http://master.Itble:8088/proxy/application_1453104806095_0162/
> 16/01/25 17:02:44 INFO mapreduce.Job: Running job: job_1453104806095_0162
> 16/01/25 17:02:54 INFO mapreduce.Job: Job job_1453104806095_0162 running in 
> uber mode : false
> 16/01/25 17:02:54 INFO mapreduce.Job:  map 0% reduce 0%
> 16/01/25 17:03:02 INFO mapreduce.Job: Task Id : 
> attempt_1453104806095_0162_m_00_0, Status : FAILED
> Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: 
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers 
> available to handle this 
> request:[http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2,
>  http://10.192.1.101:8080/solr/myEnterpriseCollection_shard1_replica2, 
> http://10.192.1.103:8080/solr/myEnterpriseCollection_shard2_replica1]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
> at 
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
> at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
> at 
> org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.createRecordReader(SolrDeleteDuplicates.java:291)
> at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:492)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> 16/01/25 17:03:12 INFO mapreduce.Job: Task Id : 
> attempt_1453104806095_0162_m_00_1, Status : FAILED
> Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: 
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers 
> available to handle this 
> request:[http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2,
>  http://10.192.1.101:8080/solr/myEnterpriseCollection_shard1_replica2, 
> http://10.192.1.103:8080/solr/myEnterpriseCollection_shard2_replica1, 
> http://10.192.1.102:8080/solr/myEnterpriseCollection_shard1_replica1]
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
> at 
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
> at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
> at 
> 

[jira] [Commented] (NUTCH-2257) apache-nutch-2.3.1-src.tar.gz can not be built

2017-01-09 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812256#comment-15812256
 ] 

Furkan KAMACI commented on NUTCH-2257:
--

[~kamir1604] I don't get such an error when I build the 2.3.1 tag. Does that 
problem still exist?

> apache-nutch-2.3.1-src.tar.gz can not be built
> --
>
> Key: NUTCH-2257
> URL: https://issues.apache.org/jira/browse/NUTCH-2257
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.3.1
> Environment: jdk1.7.0_67
> nutch 2.3.1
> Hadoop 2.6.0
> HBase 1.0.0
>Reporter: Mirko Kaempf
>
> The build fails for:
>apache-nutch-2.3.1-src.tar.gz 
> but after replacing src folder from 
>   apache-nutch-2.3.1-src.zip 
> the build works fine.
> Error messages:
> compile:
>  [echo] Compiling plugin: indexer-solr
> [javac] Compiling 1 source file to 
> /opt/examples/apache-nutch-2.3.1/build/indexer-solr/classes
> [javac] 
> /opt/examples/apache-nutch-2.3.1/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrUtils.java:24:
>  error: cannot find symbol
> [javac] if (job.getBoolean(SolrConstants.USE_AUTH, false)) {
> [javac]^
> [javac]   symbol:   variable SolrConstants
> [javac]   location: class SolrUtils
> [javac] 
> /opt/examples/apache-nutch-2.3.1/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrUtils.java:25:
>  error: cannot find symbol
> [javac]   String username = job.get(SolrConstants.USERNAME);
> [javac] ^
> [javac]   symbol:   variable SolrConstants
> [javac]   location: class SolrUtils
> [javac] 
> /opt/examples/apache-nutch-2.3.1/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrUtils.java:35:
>  error: cannot find symbol
> [javac]   .get(SolrConstants.PASSWORD)));
> [javac]^
> [javac]   symbol:   variable SolrConstants
> [javac]   location: class SolrUtils
> [javac] 
> /opt/examples/apache-nutch-2.3.1/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrUtils.java:43:
>  error: cannot find symbol
> [javac] return new HttpSolrServer(job.get(SolrConstants.SERVER_URL), 
> client);
> [javac]   ^
> [javac]   symbol:   variable SolrConstants
> [javac]   location: class SolrUtils
> [javac] 4 errors



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-2275) MD5Signature by default doesn't take in account parse

2017-01-09 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812209#comment-15812209
 ] 

Furkan KAMACI commented on NUTCH-2275:
--

[~fre93] does that problem still exist?

> MD5Signature by default doesn't take in account parse
> -
>
> Key: NUTCH-2275
> URL: https://issues.apache.org/jira/browse/NUTCH-2275
> Project: Nutch
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.11
>Reporter: Francesco Capponi
>
> I'm testing Apache Nutch with the feed plugin. I've noticed that it generates 
> the same digest/signature for each page, so dedup removes everything from 
> the database.
> I'm wondering why MD5Signature is the default class instead of 
> TextMD5Signature.
> For now, I've modified MD5Signature slightly so that it works with the feed 
> plugin.
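Here is a minimal Java sketch of why the choice of signature matters. It assumes, as the report implies, that several feed entries end up sharing the same fetched feed document. Nutch's MD5Signature hashes a page's raw content, while TextMD5Signature hashes its parsed text. In Nutch 1.x the class is selected with the `db.signature.class` property. The class and method names below are hypothetical, and this is not the Nutch implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SignatureSketch {

  // Returns the hex-encoded MD5 digest of the given bytes.
  static String md5Hex(byte[] data) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5").digest(data);
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is not available", e);
    }
  }

  // Mirrors the MD5Signature idea: only the raw fetched bytes contribute.
  static String rawContentSignature(byte[] rawContent) {
    return md5Hex(rawContent);
  }

  // Mirrors the TextMD5Signature idea: the extracted text contributes.
  static String parseTextSignature(String parseText) {
    return md5Hex(parseText.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    // Assumption: two feed entries derived from the same fetched feed bytes.
    byte[] feed = "<rss><item>First</item><item>Second</item></rss>"
        .getBytes(StandardCharsets.UTF_8);
    boolean rawEqual = rawContentSignature(feed).equals(rawContentSignature(feed));
    boolean textEqual = parseTextSignature("First entry text")
        .equals(parseTextSignature("Second entry text"));
    System.out.println("raw signatures equal: " + rawEqual);   // true
    System.out.println("text signatures equal: " + textEqual); // false
  }
}
```

With identical raw bytes, every entry gets the same digest, which is consistent with dedup collapsing them.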



[jira] [Commented] (NUTCH-2313) Error in Nutch 2.X WebApp Inject

2017-01-09 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812081#comment-15812081
 ] 

Furkan KAMACI commented on NUTCH-2313:
--

[~kherox] Could you increase *DEFAULT_TIMEOUT_SEC*, which has a default value of 
*60*, in _RemoteCommandExecutor.java_ and try again?
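The TimeoutException in the reported stack trace comes from `FutureTask.get` giving up after a fixed wait. Below is a minimal sketch of that pattern. It assumes the executor waits on a `Future` with a timeout derived from a constant such as DEFAULT_TIMEOUT_SEC. The class is hypothetical and is not RemoteCommandExecutor itself.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutSketch {

  // Hypothetical stand-in for a timeout constant such as DEFAULT_TIMEOUT_SEC.
  static final long DEFAULT_TIMEOUT_SEC = 60;

  // Runs the job on a worker thread and waits at most timeoutMillis for it.
  // Returns Optional.empty() when the wait times out (job result must be non-null).
  static Optional<String> runWithTimeout(Callable<String> job, long timeoutMillis) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<String> future = executor.submit(job);
      // Throws TimeoutException if the job is still running after the wait.
      return Optional.of(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
    } catch (TimeoutException e) {
      return Optional.empty();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Job failed", e.getCause());
    } finally {
      // Interrupts the job if it is still running.
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    Callable<String> slowJob = () -> {
      Thread.sleep(500); // simulates a remote job that needs 500 ms
      return "done";
    };
    // A 100 ms wait is too short, so the caller reports a timeout.
    System.out.println(runWithTimeout(slowJob, 100).isPresent()); // false
    // Waiting up to DEFAULT_TIMEOUT_SEC lets the same job finish.
    long longWait = TimeUnit.SECONDS.toMillis(DEFAULT_TIMEOUT_SEC);
    System.out.println(runWithTimeout(slowJob, longWait).orElse("timed out")); // done
  }
}
```

Raising the timeout only helps if the job would eventually succeed. A job that hangs will still fail, just later.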

> Error in Nutch 2.X WebApp Inject 
> -
>
> Key: NUTCH-2313
> URL: https://issues.apache.org/jira/browse/NUTCH-2313
> Project: Nutch
>  Issue Type: Bug
>  Components: web gui
>Affects Versions: 2.3
> Environment: Nutch 2.3.1
> Hbase 2
> Hadoop 2.4
>Reporter: kakou Denis
> Fix For: 2.5
>
>
> When I define a crawl in the web GUI, I get this output.
> 16/09/01 20:58:51 INFO resource.PropertiesFactory: Loading properties files 
> from 
> jar:file:/tmp/hadoop-unjar2700707934366020491/lib/wicket-extensions-6.13.0.jar!/org/apache/wicket/extensions/Initializer.properties
>  with loader 
> org.apache.wicket.resource.IsoPropertiesFilePropertiesLoader@37dc175b
> 16/09/01 20:59:10 WARN RequestCycleExtra: 
> 16/09/01 20:59:10 WARN RequestCycleExtra: Handling the following exception
> org.apache.wicket.core.request.mapper.StalePageException
> 16/09/01 20:59:10 WARN RequestCycleExtra: 
> 16/09/01 20:59:10 WARN render.WebPageRenderer: The Buffered response should 
> be handled by BufferedResponseRequestHandler
> 16/09/01 20:59:22 ERROR impl.RemoteCommandExecutor: Remote command failed
> java.util.concurrent.TimeoutException
> at java.util.concurrent.FutureTask.get(FutureTask.java:205)
> at 
> org.apache.nutch.webui.client.impl.RemoteCommandExecutor.executeRemoteJob(RemoteCommandExecutor.java:61)
> at 
> org.apache.nutch.webui.client.impl.CrawlingCycle.executeCrawlCycle(CrawlingCycle.java:58)
> at 
> org.apache.nutch.webui.service.impl.CrawlServiceImpl.startCrawl(CrawlServiceImpl.java:69)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
> at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
> at 
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
> at 
> org.springframework.aop.interceptor.AsyncExecutionInterceptor$1.call(AsyncExecutionInterceptor.java:97)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 16/09/01 20:59:22 INFO impl.CrawlingCycle: Executed remote command data: 
> INJECT status: FAILED



[jira] [Commented] (NUTCH-2342) Inlinks are not being indexed as part of index-links plugin

2017-01-09 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812058#comment-15812058
 ] 

Furkan KAMACI commented on NUTCH-2342:
--

[~19manish90] Do you have logs for your problem?

> Inlinks are not being indexed as part of index-links plugin
> ---
>
> Key: NUTCH-2342
> URL: https://issues.apache.org/jira/browse/NUTCH-2342
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer, linkdb
>Affects Versions: 1.12
> Environment: We are using linux machines for DEV and UAT.
>Reporter: Manish Bassi
>
> I have used the index-links plugin along with other plugins to index both the 
> inlinks and outlinks of a given page, but only the outlinks are getting 
> indexed, not the inlinks.
> Because of this issue, even the anchor plugin is not working as expected.



[jira] [Commented] (NUTCH-2345) FetchItemQueue logs are logged with wrong class name

2017-01-09 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15811983#comment-15811983
 ] 

Furkan KAMACI commented on NUTCH-2345:
--

Thanks for reporting this, [~Mgupta]! I've created the PR.

> FetchItemQueue logs are logged with wrong class name
> 
>
> Key: NUTCH-2345
> URL: https://issues.apache.org/jira/browse/NUTCH-2345
> Project: Nutch
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.11, 1.12
> Environment: Any
>Reporter: Monika Gupta
>Assignee: Furkan KAMACI
>Priority: Minor
> Fix For: 1.13
>
>
> I ran bin/nutch fetch and noticed that the log statements of class 
> FetchItemQueue.java are written to logs/hadoop.log under the wrong class name, 
> FetchItemQueues.java.
> See the execution log:
> 2017-01-06 15:31:25,562 INFO  fetcher.FetchItemQueues -   maxThreads= 1
> 2017-01-06 15:31:28,565 INFO  fetcher.FetchItemQueues -   inProgress= 0
> The issue is in the logger for class FetchItemQueue.java. 
> Currently it is:
> private static final Logger LOG = 
> LoggerFactory.getLogger(FetchItemQueues.class);
> Correction: it should be:
> private static final Logger LOG = 
> LoggerFactory.getLogger(FetchItemQueue.class);
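SLF4J's `LoggerFactory.getLogger(Class)` names the logger after the class passed in. A declaration copied from FetchItemQueues therefore keeps reporting that name. The sketch below shows the effect with `java.util.logging` swapped in for SLF4J, so that it runs without extra jars. The nested classes are hypothetical stand-ins for the two fetcher classes.

```java
import java.util.logging.Logger;

public class LoggerNameSketch {

  // Hypothetical stand-in for the queue-manager class.
  static class FetchItemQueues {
  }

  // Hypothetical stand-in for the single-queue class.
  static class FetchItemQueue {
    // Buggy pattern: a logger copied from another class keeps that class's name.
    static final Logger WRONG_LOG = Logger.getLogger(FetchItemQueues.class.getName());
    // Fixed pattern: the logger is named after the class that owns it.
    static final Logger LOG = Logger.getLogger(FetchItemQueue.class.getName());
  }

  public static void main(String[] args) {
    System.out.println(FetchItemQueue.WRONG_LOG.getName());
    System.out.println(FetchItemQueue.LOG.getName());
  }
}
```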



[jira] [Assigned] (NUTCH-2345) FetchItemQueue logs are logged with wrong class name

2017-01-09 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI reassigned NUTCH-2345:


Assignee: Furkan KAMACI

> FetchItemQueue logs are logged with wrong class name
> 
>
> Key: NUTCH-2345
> URL: https://issues.apache.org/jira/browse/NUTCH-2345
> Project: Nutch
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 1.11, 1.12
> Environment: Any
>Reporter: Monika Gupta
>Assignee: Furkan KAMACI
>Priority: Minor
> Fix For: 1.13
>
>
> I ran bin/nutch fetch and noticed that the log statements of class 
> FetchItemQueue.java are written to logs/hadoop.log under the wrong class name, 
> FetchItemQueues.java.
> See the execution log:
> 2017-01-06 15:31:25,562 INFO  fetcher.FetchItemQueues -   maxThreads= 1
> 2017-01-06 15:31:28,565 INFO  fetcher.FetchItemQueues -   inProgress= 0
> The issue is in the logger for class FetchItemQueue.java. 
> Currently it is:
> private static final Logger LOG = 
> LoggerFactory.getLogger(FetchItemQueues.class);
> Correction: it should be:
> private static final Logger LOG = 
> LoggerFactory.getLogger(FetchItemQueue.class);



[jira] [Created] (NUTCH-2346) Check Types at Object Equality

2017-01-09 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2346:


 Summary: Check Types at Object Equality
 Key: NUTCH-2346
 URL: https://issues.apache.org/jira/browse/NUTCH-2346
 Project: Nutch
  Issue Type: Bug
  Components: generator, metadata
Affects Versions: 2.3.1
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
Priority: Minor
 Fix For: 2.4






[jira] [Created] (NUTCH-2344) Authentication Support for Web GUI

2017-01-08 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2344:


 Summary: Authentication Support for Web GUI
 Key: NUTCH-2344
 URL: https://issues.apache.org/jira/browse/NUTCH-2344
 Project: Nutch
  Issue Type: New Feature
  Components: web gui
Affects Versions: 2.3.1
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.4


We should implement authentication support for the Web GUI.



[jira] [Commented] (NUTCH-2268) SolrIndexerJob: java.lang.RuntimeException

2017-01-08 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15809455#comment-15809455
 ] 

Furkan KAMACI commented on NUTCH-2268:
--

Which error do you get?

> SolrIndexerJob: java.lang.RuntimeException
> --
>
> Key: NUTCH-2268
> URL: https://issues.apache.org/jira/browse/NUTCH-2268
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 2.3.1
> Environment: I am using 
> Hbase V:hbase-0.98.19-hadoop2
> Solr V : 6.0.0
> Nutch : 2.3.1
> java : 8
>Reporter: narendra
>  Labels: indexing
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Could you please help me with this error:
> SolrIndexerJob: java.lang.RuntimeException: job 
> failed:name=apache-nutch-2.3.1.jar
> which I get when I run this command:
> local/bin/nutch solrindex http://localhost:8983/solr/ -all
> I tried with Solr 4.10.3 but I get the same error.



[jira] [Commented] (NUTCH-2226) SOLR mismatch in deploy mode

2016-12-23 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15773457#comment-15773457
 ] 

Furkan KAMACI commented on NUTCH-2226:
--

[~markus17] This issue will be fixed by NUTCH-2267 and can be closed as a 
duplicate. You can check this conversation: 
https://mail-archives.apache.org/mod_mbox/nutch-user/201612.mbox/%3c738483596.8830478.1482188268...@mail.yahoo.com%3e

> SOLR mismatch in deploy mode
> 
>
> Key: NUTCH-2226
> URL: https://issues.apache.org/jira/browse/NUTCH-2226
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Reporter: Steven W
>  Labels: solr
>
> I receive this error when indexing to SolrCloud in deploy mode on Hadoop 
> 2.7.0:
> Type 'org/apache/http/impl/client/DefaultHttpClient' (current frame, 
> stack[0]) is not assignable to 
> 'org/apache/http/impl/client/CloseableHttpClient'
> I'm assuming there's a version mismatch somewhere in the deploy JAR, but I 
> don't know where to look. This is related to NUTCH-2197.
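One way to find "where to look" is to print which jar each of the two classes in the error message is actually loaded from. Below is a minimal sketch. It assumes it is run with the same classpath as the failing indexing job, and the class name ClassLocator is hypothetical. If DefaultHttpClient and CloseableHttpClient resolve to different httpclient jars, that would likely explain the mismatch.

```java
import java.net.URL;
import java.security.CodeSource;
import java.security.ProtectionDomain;

public class ClassLocator {

  // Returns the jar or directory a class was loaded from, or a marker when
  // no location is available (for example, JDK bootstrap classes).
  static String locate(Class<?> clazz) {
    ProtectionDomain domain = clazz.getProtectionDomain();
    CodeSource source = (domain == null) ? null : domain.getCodeSource();
    URL location = (source == null) ? null : source.getLocation();
    return (location == null) ? "<bootstrap or unknown>" : location.toString();
  }

  // Loads a class by name without initializing it and reports its location.
  static String locateByName(String className) {
    try {
      return locate(Class.forName(className, false, ClassLocator.class.getClassLoader()));
    } catch (ClassNotFoundException e) {
      return "<not on classpath>";
    } catch (LinkageError e) {
      return "<failed to load: " + e + ">";
    }
  }

  public static void main(String[] args) {
    String[] names = {
      "org.apache.http.impl.client.DefaultHttpClient",
      "org.apache.http.impl.client.CloseableHttpClient"
    };
    for (String name : names) {
      System.out.println(name + " -> " + locateByName(name));
    }
  }
}
```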



[jira] [Commented] (NUTCH-1220) Upgrade Solr deps

2016-12-23 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15773453#comment-15773453
 ] 

Furkan KAMACI commented on NUTCH-1220:
--

[~lewismc] This issue seems to be outdated and can be closed.

> Upgrade Solr deps
> -
>
> Key: NUTCH-1220
> URL: https://issues.apache.org/jira/browse/NUTCH-1220
> Project: Nutch
>  Issue Type: Task
>  Components: build, indexer
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Minor
> Attachments: NUTCH-1633-trunk.patch
>
>
> SLF4J needs to be upgraded as part of the upgrade to Solr 3.5, but that breaks 
> something else. Perhaps Hadoop ships a different SLF4J version?
> {code}
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.slf4j.spi.LocationAwareLogger.log(Lorg/slf4j/Marker;Ljava/lang/String;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V
> at 
> org.apache.commons.logging.impl.SLF4JLocationAwareLog.debug(SLF4JLocationAwareLog.java:133)
> at 
> org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:136)
> at 
> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:180)
> at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159)
> at 
> org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216)
> at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:409)
> at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:395)
> at 
> org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:1418)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1319)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:109)
> at 
> org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:544)
> at 
> org.apache.hadoop.mapred.FileInputFormat.addInputPath(FileInputFormat.java:339)
> at 
> org.apache.nutch.util.domain.DomainStatistics.run(DomainStatistics.java:108)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at 
> org.apache.nutch.util.domain.DomainStatistics.main(DomainStatistics.java:215)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> {code}
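The NoSuchMethodError means the commons-logging bridge (`SLF4JLocationAwareLog`) was compiled against a `LocationAwareLogger.log` signature that the SLF4J jar loaded at runtime does not provide. This typically points to mixed SLF4J versions on the classpath. Below is a minimal reflection-based sketch for listing the `log` overloads actually visible at runtime. MethodProbe is a hypothetical helper, not part of Nutch or SLF4J.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.StringJoiner;

public class MethodProbe {

  // Lists the public methods named methodName on className, formatted as
  // "name(paramType, ...)" and sorted. Returns an empty list if the class
  // cannot be found on the classpath.
  static List<String> describeMethods(String className, String methodName) {
    List<String> signatures = new ArrayList<>();
    try {
      // initialize=false: inspect the class without running static initializers
      Class<?> clazz = Class.forName(className, false, MethodProbe.class.getClassLoader());
      for (Method method : clazz.getMethods()) {
        if (method.getName().equals(methodName)) {
          StringJoiner params = new StringJoiner(", ", methodName + "(", ")");
          for (Class<?> type : method.getParameterTypes()) {
            params.add(type.getTypeName());
          }
          signatures.add(params.toString());
        }
      }
    } catch (ClassNotFoundException e) {
      // Class is not on the classpath: nothing to report.
    }
    Collections.sort(signatures);
    return signatures;
  }

  public static void main(String[] args) {
    // The failing caller expects log(Marker, String, int, String, Object[], Throwable);
    // compare that with what the runtime classpath actually offers.
    for (String sig : describeMethods("org.slf4j.spi.LocationAwareLogger", "log")) {
      System.out.println(sig);
    }
  }
}
```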



[jira] [Commented] (NUTCH-2268) SolrIndexerJob: java.lang.RuntimeException

2016-12-23 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15773441#comment-15773441
 ] 

Furkan KAMACI commented on NUTCH-2268:
--

[~lewismc] we can close this issue.

> SolrIndexerJob: java.lang.RuntimeException
> --
>
> Key: NUTCH-2268
> URL: https://issues.apache.org/jira/browse/NUTCH-2268
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 2.3.1
> Environment: I am using 
> Hbase V:hbase-0.98.19-hadoop2
> Solr V : 6.0.0
> Nutch : 2.3.1
> java : 8
>Reporter: narendra
>  Labels: indexing
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Could you please help me with this error:
> SolrIndexerJob: java.lang.RuntimeException: job 
> failed:name=apache-nutch-2.3.1.jar
> which I get when I run this command:
> local/bin/nutch solrindex http://localhost:8983/solr/ -all
> I tried with Solr 4.10.3 but I get the same error.



[jira] [Commented] (NUTCH-2089) Move Nutch 2.x to compile on JDK 8

2016-09-03 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15461848#comment-15461848
 ] 

Furkan KAMACI commented on NUTCH-2089:
--

[~lewismc] I've created a PR that fixes all errors and some warnings and 
improves the Javadoc. Javadoc can now be generated for Nutch 2.x without any 
errors.

> Move Nutch 2.x to compile on JDK 8
> --
>
> Key: NUTCH-2089
> URL: https://issues.apache.org/jira/browse/NUTCH-2089
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
> Attachments: java8output.txt, java8output.txt
>
>
> Public support updates for JDK 1.7 stopped in April of this year.
> https://www.java.com/en/download/faq/java_7.xml
> In our next release we should shift support to JDK 1.8.
>  



[jira] [Created] (NUTCH-2314) Use indexer-elastic2 Plugin for javadoc and eclipse Targets

2016-09-02 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2314:


 Summary: Use indexer-elastic2 Plugin for javadoc and eclipse 
Targets
 Key: NUTCH-2314
 URL: https://issues.apache.org/jira/browse/NUTCH-2314
 Project: Nutch
  Issue Type: Bug
  Components: plugin
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.4


The indexer-elastic2 plugin is used by the deploy and clean tasks of 
plugin/build.xml. However, the javadoc and eclipse tasks in build.xml use the 
indexer-elastic plugin instead of indexer-elastic2, which causes an error.



[jira] [Commented] (NUTCH-2308) Implement SSL Connection Test at TestNutchAPI

2016-08-30 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449410#comment-15449410
 ] 

Furkan KAMACI commented on NUTCH-2308:
--

[~lewismc] I've updated the PR.

> Implement SSL Connection Test at TestNutchAPI
> -
>
> Key: NUTCH-2308
> URL: https://issues.apache.org/jira/browse/NUTCH-2308
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> Currently, testing of SSL is ignored at TestNutchAPI. We should complete the 
> implementation.



[jira] [Commented] (NUTCH-2264) Check Forbidden APIs at Build

2016-08-29 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446237#comment-15446237
 ] 

Furkan KAMACI commented on NUTCH-2264:
--

[~lewismc] I've created a precommit task that depends on the runtime and test 
tasks and is responsible for checking forbidden APIs. I can bind it to runtime, 
test, or both if you want. I've also fixed all the errors reported by 
forbiddenapis.
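For context, forbidden-apis ships bundled signature sets such as `jdk-unsafe`, which flag JDK methods whose results depend on the default locale or charset. Below is a minimal Java sketch of the kind of call such a check typically rejects, alongside the explicit alternative. Which signature sets the precommit task enables is not stated here, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class LocaleSafeCalls {

  // Typically flagged: the result depends on the JVM's default locale.
  static String lowerDefaultLocale(String s) {
    return s.toLowerCase();
  }

  // Locale-independent alternative.
  static String lowerRoot(String s) {
    return s.toLowerCase(Locale.ROOT);
  }

  // Typically flagged: new String(bytes) uses the platform default charset.
  static String decodeDefaultCharset(byte[] bytes) {
    return new String(bytes);
  }

  // Explicit-charset alternative.
  static String decodeUtf8(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // With a Turkish locale, 'I' lower-cases to a dotless i (U+0131),
    // so this does not print "title".
    System.out.println("TITLE".toLowerCase(Locale.forLanguageTag("tr")));
    System.out.println(lowerRoot("TITLE")); // title
  }
}
```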

> Check Forbidden APIs at Build
> -
>
> Key: NUTCH-2264
> URL: https://issues.apache.org/jira/browse/NUTCH-2264
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
>
> We should avoid [forbidden 
> calls|https://github.com/policeman-tools/forbidden-apis/wiki]  and check in 
> the ant build for it.



[jira] [Updated] (NUTCH-2264) Check Forbidden APIs at Build

2016-08-29 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2264:
-
Priority: Major  (was: Minor)

> Check Forbidden APIs at Build
> -
>
> Key: NUTCH-2264
> URL: https://issues.apache.org/jira/browse/NUTCH-2264
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
>
> We should avoid [forbidden 
> calls|https://github.com/policeman-tools/forbidden-apis/wiki]  and check in 
> the ant build for it.



[jira] [Updated] (NUTCH-2264) Check Forbidden APIs at Build

2016-08-29 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2264:
-
Summary: Check Forbidden APIs at Build  (was: Check Forbidden API's at 
Build)

> Check Forbidden APIs at Build
> -
>
> Key: NUTCH-2264
> URL: https://issues.apache.org/jira/browse/NUTCH-2264
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
>Priority: Minor
>
> We should avoid [forbidden 
> calls|https://github.com/policeman-tools/forbidden-apis/wiki]  and check in 
> the ant build for it.



[jira] [Work started] (NUTCH-2307) Implement Missing NutchServer REST API Tests

2016-08-27 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-2307 started by Furkan KAMACI.

> Implement Missing NutchServer REST API Tests
> 
>
> Key: NUTCH-2307
> URL: https://issues.apache.org/jira/browse/NUTCH-2307
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> TestAPI.java was entirely commented out. The reason given was:
> {quote}
> CURRENTLY DISABLED. TESTS ARE FLAPPING FOR NO APPARENT REASON.
> SHALL BE FIXED OR REPLACES BY NEW API IMPLEMENTATION
> {quote}
> So, we should implement the missing tests based on the new 
> AbstractNutchAPITestBase.



[jira] [Work started] (NUTCH-2264) Check Forbidden API's at Build

2016-08-27 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-2264 started by Furkan KAMACI.

> Check Forbidden API's at Build
> --
>
> Key: NUTCH-2264
> URL: https://issues.apache.org/jira/browse/NUTCH-2264
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
>Priority: Minor
>
> We should avoid [forbidden 
> calls|https://github.com/policeman-tools/forbidden-apis/wiki]  and check in 
> the ant build for it.



[jira] [Assigned] (NUTCH-2122) Implement Javadoc package-info.java for webui packages

2016-08-27 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI reassigned NUTCH-2122:


Assignee: Furkan KAMACI

> Implement Javadoc package-info.java for webui packages
> --
>
> Key: NUTCH-2122
> URL: https://issues.apache.org/jira/browse/NUTCH-2122
> Project: Nutch
>  Issue Type: Improvement
>  Components: nutch server
>Affects Versions: 1.10
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
>Priority: Trivial
> Fix For: 1.13
>
>
> [~sujenshah] I noticed that the Javadoc does not contain a package.html 
> providing package-level introductory documentation, as every other package does.
> http://nutch.apache.org/apidocs/apidocs-1.10/index.html
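Since Java 5, a package-info.java file has been the preferred alternative to package.html for package-level Javadoc. Below is a minimal sketch for one webui package. The package name is assumed from the RemoteCommandExecutor stack trace quoted in NUTCH-2313, and the description is placeholder wording that would need to be checked against the actual classes.

```java
// Illustrative package-info.java; the description below is placeholder
// wording, not text taken from the Nutch code base.

/**
 * Implementation classes that the Nutch web GUI uses to submit and monitor
 * crawl jobs on a Nutch REST server.
 */
package org.apache.nutch.webui.client.impl;
```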



[jira] [Commented] (NUTCH-2308) Implement SSL Connection Test at TestNutchAPI

2016-08-26 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440208#comment-15440208
 ] 

Furkan KAMACI commented on NUTCH-2308:
--

[~lewismc] Could you check it again?

> Implement SSL Connection Test at TestNutchAPI
> -
>
> Key: NUTCH-2308
> URL: https://issues.apache.org/jira/browse/NUTCH-2308
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> Currently, testing of SSL is ignored at TestNutchAPI. We should complete the 
> implementation.



[jira] [Commented] (NUTCH-2308) Implement SSL Connection Test at TestNutchAPI

2016-08-26 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439409#comment-15439409
 ] 

Furkan KAMACI commented on NUTCH-2308:
--

[~lewismc] could you check my PR?

> Implement SSL Connection Test at TestNutchAPI
> -
>
> Key: NUTCH-2308
> URL: https://issues.apache.org/jira/browse/NUTCH-2308
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> Currently, testing of SSL is ignored at TestNutchAPI. We should complete the 
> implementation.



[jira] [Created] (NUTCH-2308) Implement SSL Connection Test at TestNutchAPI

2016-08-23 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2308:


 Summary: Implement SSL Connection Test at TestNutchAPI
 Key: NUTCH-2308
 URL: https://issues.apache.org/jira/browse/NUTCH-2308
 Project: Nutch
  Issue Type: Improvement
  Components: REST_api, web gui
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.4


Currently, testing of SSL is ignored at TestNutchAPI. We should complete the 
implementation.



[jira] [Created] (NUTCH-2307) Implement Missing NutchServer REST API Tests

2016-08-23 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2307:


 Summary: Implement Missing NutchServer REST API Tests
 Key: NUTCH-2307
 URL: https://issues.apache.org/jira/browse/NUTCH-2307
 Project: Nutch
  Issue Type: Improvement
  Components: REST_api, web gui
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.4


TestAPI.java was entirely commented out. The reason given was:

{quote}
CURRENTLY DISABLED. TESTS ARE FLAPPING FOR NO APPARENT REASON.
SHALL BE FIXED OR REPLACES BY NEW API IMPLEMENTATION
{quote}

So, we should implement the missing tests based on the new 
AbstractNutchAPITestBase.



[jira] [Resolved] (NUTCH-2301) Create Tests for Security Layer of NutchServer

2016-08-23 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI resolved NUTCH-2301.
--
Resolution: Fixed

> Create Tests for Security Layer of NutchServer
> --
>
> Key: NUTCH-2301
> URL: https://issues.apache.org/jira/browse/NUTCH-2301
> Project: Nutch
>  Issue Type: Sub-task
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> Create tests for the security layer of NutchServer.



[jira] [Commented] (NUTCH-2306) Id of Active Configuration Could Be Stored at NutchStatus and Exposed via REST API

2016-08-23 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432543#comment-15432543
 ] 

Furkan KAMACI commented on NUTCH-2306:
--

[~lewismc] I've created the PR. Could you apply this after 
https://github.com/apache/nutch/pull/144

> Id of Active Configuration Could Be Stored at NutchStatus and Exposed via 
> REST API
> --
>
> Key: NUTCH-2306
> URL: https://issues.apache.org/jira/browse/NUTCH-2306
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> NutchStatus holds information about configuration it uses. However, it should 
> also store the id of that configuration. Once NUTCH-2302 and NUTCH-2303 are 
> merged, we will be able to store active configuration id and expose this 
> information via REST API.



[jira] [Commented] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use

2016-08-23 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432541#comment-15432541
 ] 

Furkan KAMACI commented on NUTCH-2303:
--

[~lewismc] I've created the PR.

> NutchServer Could Be Able To Select a Configuration to Use
> --
>
> Key: NUTCH-2303
> URL: https://issues.apache.org/jira/browse/NUTCH-2303
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> RAMConfManager is intended to hold different configurations. However, 
> NutchServer currently uses the default configuration; it could allow setting 
> an active configuration id when starting up a NutchServer.



[jira] [Work started] (NUTCH-2301) Create Tests for Security Layer of NutchServer

2016-08-22 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-2301 started by Furkan KAMACI.

> Create Tests for Security Layer of NutchServer
> --
>
> Key: NUTCH-2301
> URL: https://issues.apache.org/jira/browse/NUTCH-2301
> Project: Nutch
>  Issue Type: Sub-task
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> Create tests for the security layer of NutchServer.



[jira] [Work started] (NUTCH-2302) RAMConfManager Could Be Constructed With Custom Configuration

2016-08-22 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-2302 started by Furkan KAMACI.

> RAMConfManager Could Be Constructed With Custom Configuration 
> --
>
> Key: NUTCH-2302
> URL: https://issues.apache.org/jira/browse/NUTCH-2302
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> RAMConfManager is intended to hold different configurations, each accessible 
> via a configuration id. However, it forces you to use a default configuration 
> with a default id when you construct it. As a result, classes that use 
> RAMConfManager cannot set a custom configuration, which leads to problems: 
> e.g. test resources cannot be used when testing NutchServer, because it uses 
> the default configuration forced by RAMConfManager.
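Below is a minimal Java sketch of the constructor-based approach the description argues for. The caller supplies the initial configuration and its id, and an active configuration id can be selected later, as NUTCH-2303 proposes. InMemoryConfManager and its methods are hypothetical and do not reflect RAMConfManager's actual API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory configuration manager; not Nutch's RAMConfManager API.
public class InMemoryConfManager {

  private final Map<String, Map<String, String>> configs = new HashMap<>();
  private String activeConfId;

  // The caller (e.g. a test) supplies the initial configuration and its id
  // instead of being forced to use a built-in default.
  public InMemoryConfManager(String confId, Map<String, String> conf) {
    configs.put(confId, new HashMap<>(conf));
    activeConfId = confId;
  }

  // Stores an additional configuration under the given id.
  public void create(String confId, Map<String, String> conf) {
    configs.put(confId, new HashMap<>(conf));
  }

  // Selects which stored configuration is active, e.g. at server startup.
  public void setActive(String confId) {
    if (!configs.containsKey(confId)) {
      throw new IllegalArgumentException("Unknown configuration id: " + confId);
    }
    activeConfId = confId;
  }

  // The id a status object could store and expose via a REST endpoint.
  public String getActiveConfId() {
    return activeConfId;
  }

  public Map<String, String> getActive() {
    return Collections.unmodifiableMap(configs.get(activeConfId));
  }

  public Set<String> list() {
    return Collections.unmodifiableSet(configs.keySet());
  }

  public static void main(String[] args) {
    Map<String, String> testConf = new HashMap<>();
    testConf.put("http.agent.name", "test-agent");
    InMemoryConfManager manager = new InMemoryConfManager("test", testConf);
    System.out.println(manager.getActiveConfId());                  // test
    System.out.println(manager.getActive().get("http.agent.name")); // test-agent
  }
}
```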



[jira] [Updated] (NUTCH-2306) Id of Active Configuration Could Be Stored at NutchStatus and Exposed via REST API

2016-08-22 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2306:
-
Description: NutchStatus holds information about configuration it uses. 
However, it should also store the id of that configuration. Once NUTCH-2302 and 
NUTCH-2303 are merged, we will be able to store active configuration id and 
expose this information via REST API.  (was: NutchStatus holds information 
about configuration it uses. However, it should also expose the id of that 
configuration. Once NUTCH-2302 and NUTCH-2303 are merged, we will be able to 
store used configuration id and expose this information via REST API.)

> Id of Active Configuration Could Be Stored at NutchStatus and Exposed via 
> REST API
> --
>
> Key: NUTCH-2306
> URL: https://issues.apache.org/jira/browse/NUTCH-2306
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> NutchStatus holds information about configuration it uses. However, it should 
> also store the id of that configuration. Once NUTCH-2302 and NUTCH-2303 are 
> merged, we will be able to store active configuration id and expose this 
> information via REST API.



[jira] [Updated] (NUTCH-2306) Id of Active Configuration Could Be Stored at NutchStatus and Exposed via REST API

2016-08-22 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2306:
-
Summary: Id of Active Configuration Could Be Stored at NutchStatus and 
Exposed via REST API  (was: Id of Active Configuration at NutchStatus Could Be 
Stored and Exposed)

> Id of Active Configuration Could Be Stored at NutchStatus and Exposed via 
> REST API
> --
>
> Key: NUTCH-2306
> URL: https://issues.apache.org/jira/browse/NUTCH-2306
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> NutchStatus holds information about the configuration it uses. However, it should 
> also expose the id of that configuration. Once NUTCH-2302 and NUTCH-2303 are 
> merged, we will be able to store used configuration id and expose this 
> information via REST API.





[jira] [Updated] (NUTCH-2306) Id of Active Configuration at NutchStatus Could Be Stored and Exposed

2016-08-22 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2306:
-
Summary: Id of Active Configuration at NutchStatus Could Be Stored and 
Exposed  (was: Id of Active Configuration at NutcStatus Could Be Stored and 
Exposed)

> Id of Active Configuration at NutchStatus Could Be Stored and Exposed
> -
>
> Key: NUTCH-2306
> URL: https://issues.apache.org/jira/browse/NUTCH-2306
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> NutchStatus holds information about the configuration it uses. However, it should 
> also expose the id of that configuration. Once NUTCH-2302 and NUTCH-2303 are 
> merged, we will be able to store used configuration id and expose this 
> information via REST API.





[jira] [Commented] (NUTCH-2306) Id of Active Configuration at NutcStatus Could Be Stored and Exposed

2016-08-22 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430254#comment-15430254
 ] 

Furkan KAMACI commented on NUTCH-2306:
--

[~lewismc] I've finished the mentioned implementation. Once NUTCH-2302 and 
NUTCH-2303 are merged I can create the PR.

> Id of Active Configuration at NutcStatus Could Be Stored and Exposed
> 
>
> Key: NUTCH-2306
> URL: https://issues.apache.org/jira/browse/NUTCH-2306
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> NutchStatus holds information about the configuration it uses. However, it should 
> also expose the id of that configuration. Once NUTCH-2302 and NUTCH-2303 are 
> merged, we will be able to store used configuration id and expose this 
> information via REST API.





[jira] [Updated] (NUTCH-2306) Id of Active Configuration at NutcStatus Could Be Stored and Exposed

2016-08-22 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2306:
-
Description: NutchStatus holds information about the configuration it uses. 
However, it should also expose the id of that configuration. Once NUTCH-2302 
and NUTCH-2303 are merged, we will be able to store the used configuration id and 
expose this information via REST API.  (was: NutchStatus holds information 
about the configuration it uses. However, it should also expose the id of that 
configuration. Once NUTCH-2302 is merged, we will be able to store the used 
configuration id and expose this information via REST API.)

> Id of Active Configuration at NutcStatus Could Be Stored and Exposed
> 
>
> Key: NUTCH-2306
> URL: https://issues.apache.org/jira/browse/NUTCH-2306
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> NutchStatus holds information about the configuration it uses. However, it should 
> also expose the id of that configuration. Once NUTCH-2302 and NUTCH-2303 are 
> merged, we will be able to store used configuration id and expose this 
> information via REST API.





[jira] [Updated] (NUTCH-2306) Id of Active Configuration at NutcStatus Could Be Stored and Exposed

2016-08-22 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2306:
-
Summary: Id of Active Configuration at NutcStatus Could Be Stored and 
Exposed  (was: Id of Used Configuration at NutcStatus Could Be Stored and 
Exposed)

> Id of Active Configuration at NutcStatus Could Be Stored and Exposed
> 
>
> Key: NUTCH-2306
> URL: https://issues.apache.org/jira/browse/NUTCH-2306
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> NutchStatus holds information about the configuration it uses. However, it should 
> also expose the id of that configuration. Once NUTCH-2302 is merged, we will 
> be able to store used configuration id and expose this information via REST 
> API.





[jira] [Created] (NUTCH-2306) Id of Used Configuration at NutcStatus Could Be Stored and Exposed

2016-08-22 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2306:


 Summary: Id of Used Configuration at NutcStatus Could Be Stored 
and Exposed
 Key: NUTCH-2306
 URL: https://issues.apache.org/jira/browse/NUTCH-2306
 Project: Nutch
  Issue Type: Improvement
  Components: REST_api, web gui
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.4


NutchStatus holds information about the configuration it uses. However, it should 
also expose the id of that configuration. Once NUTCH-2302 is merged, we will be 
able to store used configuration id and expose this information via REST API.
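A minimal sketch of what storing the active configuration id could look like, assuming a simplified status object; the class NutchStatusSketch and its fields are hypothetical and are not Nutch's actual NutchStatus API. It keeps the ids of all stored configurations and rejects an active id that is not among them.

{code:java}
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: a status object that records which stored
// configuration is active, next to the ids of all stored configurations.
class NutchStatusSketch {
  private final Set<String> configuration; // ids of all stored configurations
  private final String activeConfId;       // id of the configuration in use

  NutchStatusSketch(Set<String> configuration, String activeConfId) {
    if (!configuration.contains(activeConfId)) {
      throw new IllegalArgumentException("Unknown configuration id: " + activeConfId);
    }
    // Defensive, read-only copy so callers cannot mutate the reported status.
    this.configuration = Collections.unmodifiableSet(new LinkedHashSet<>(configuration));
    this.activeConfId = activeConfId;
  }

  Set<String> getConfiguration() {
    return configuration;
  }

  // New getter: a JSON serializer could expose this through the REST API.
  String getActiveConfId() {
    return activeConfId;
  }
}
{code}

Validating the id in the constructor means the status can never report an active configuration that the server does not actually hold.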





[jira] [Work started] (NUTCH-2306) Id of Used Configuration at NutcStatus Could Be Stored and Exposed

2016-08-22 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-2306 started by Furkan KAMACI.

> Id of Used Configuration at NutcStatus Could Be Stored and Exposed
> --
>
> Key: NUTCH-2306
> URL: https://issues.apache.org/jira/browse/NUTCH-2306
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> NutchStatus holds information about the configuration it uses. However, it should 
> also expose the id of that configuration. Once NUTCH-2302 is merged, we will 
> be able to store used configuration id and expose this information via REST 
> API.





[jira] [Issue Comment Deleted] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use

2016-08-20 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2303:
-
Comment: was deleted

(was: [~lewismc] I improved the Javadoc for all methods of the committed class and 
combined those commits into one. Could you check the PR again?)

> NutchServer Could Be Able To Select a Configuration to Use
> --
>
> Key: NUTCH-2303
> URL: https://issues.apache.org/jira/browse/NUTCH-2303
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> RAMConfManager is intended to hold different configurations. However, 
> NutchServer currently uses the default config; it could instead allow setting 
> an active configuration id when starting up a NutchServer.





[jira] [Commented] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use

2016-08-20 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429522#comment-15429522
 ] 

Furkan KAMACI commented on NUTCH-2303:
--

[~lewismc] I improved the Javadoc for all methods of the committed class and 
combined those commits into one. Could you check the PR again?

> NutchServer Could Be Able To Select a Configuration to Use
> --
>
> Key: NUTCH-2303
> URL: https://issues.apache.org/jira/browse/NUTCH-2303
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> RAMConfManager is intended to hold different configurations. However, 
> NutchServer currently uses the default config; it could instead allow setting 
> an active configuration id when starting up a NutchServer.





[jira] [Commented] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use

2016-08-20 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429402#comment-15429402
 ] 

Furkan KAMACI commented on NUTCH-2303:
--

[~lewismc] I've finished the implementation but it requires NUTCH-2302 to be 
applied.

> NutchServer Could Be Able To Select a Configuration to Use
> --
>
> Key: NUTCH-2303
> URL: https://issues.apache.org/jira/browse/NUTCH-2303
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> RAMConfManager is intended to hold different configurations. However, 
> NutchServer currently uses the default config; it could instead allow setting 
> an active configuration id when starting up a NutchServer.





[jira] [Created] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use

2016-08-20 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2303:


 Summary: NutchServer Could Be Able To Select a Configuration to Use
 Key: NUTCH-2303
 URL: https://issues.apache.org/jira/browse/NUTCH-2303
 Project: Nutch
  Issue Type: Improvement
  Components: REST_api, web gui
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.4


RAMConfManager is intended to hold different configurations. However, NutchServer 
currently uses the default config; it could instead allow setting an active 
configuration id when starting up a NutchServer.
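A minimal sketch of selecting a configuration at startup, assuming configurations are kept in a simple id-to-properties map as a stand-in for RAMConfManager; NutchServerSketch, DEFAULT_CONF_ID and the constructor signature are hypothetical and are not NutchServer's actual API. It falls back to the default id only when none is given and fails fast on an unknown id.

{code:java}
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: the server is told which stored configuration to use
// at startup instead of always using the default one.
class NutchServerSketch {
  static final String DEFAULT_CONF_ID = "default"; // hypothetical default id

  private final Map<String, Map<String, String>> confStore; // stand-in for RAMConfManager
  private final String activeConfId;

  NutchServerSketch(Map<String, Map<String, String>> confStore, String activeConfId) {
    this.confStore = Objects.requireNonNull(confStore, "confStore");
    // Use the default id only when the caller did not choose one.
    String id = (activeConfId == null || activeConfId.isEmpty()) ? DEFAULT_CONF_ID : activeConfId;
    if (!confStore.containsKey(id)) {
      // Fail fast at startup instead of silently running with another configuration.
      throw new IllegalArgumentException("Unknown configuration id: " + id);
    }
    this.activeConfId = id;
  }

  String getActiveConfId() {
    return activeConfId;
  }

  Map<String, String> getActiveConf() {
    return confStore.get(activeConfId);
  }
}
{code}

Failing at construction time keeps a misconfigured server from starting with a configuration other than the one that was requested.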





[jira] [Work started] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use

2016-08-20 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-2303 started by Furkan KAMACI.

> NutchServer Could Be Able To Select a Configuration to Use
> --
>
> Key: NUTCH-2303
> URL: https://issues.apache.org/jira/browse/NUTCH-2303
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> RAMConfManager is intended to hold different configurations. However, 
> NutchServer currently uses the default config; it could instead allow setting 
> an active configuration id when starting up a NutchServer.





[jira] [Updated] (NUTCH-2302) RAMConfManager Could Be Constructed With Custom Configuration

2016-08-20 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2302:
-
Summary: RAMConfManager Could Be Constructed With Custom Configuration   
(was: RAMConfManager Should Be Constructed With Custom Configuration )

> RAMConfManager Could Be Constructed With Custom Configuration 
> --
>
> Key: NUTCH-2302
> URL: https://issues.apache.org/jira/browse/NUTCH-2302
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> RAMConfManager is intended to hold different configurations which can be 
> accessed via a configuration id. However, it forces you to use a default 
> configuration with a default id when you construct it. When RAMConfManager is 
> used by any other classes they cannot set a custom configuration, which leads 
> to problems, e.g. test resources cannot be used when you test NutchServer 
> because it uses the default configuration forced by RAMConfManager.





[jira] [Created] (NUTCH-2302) RAMConfManager Should Be Constructed With Custom Configuration

2016-08-20 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2302:


 Summary: RAMConfManager Should Be Constructed With Custom 
Configuration 
 Key: NUTCH-2302
 URL: https://issues.apache.org/jira/browse/NUTCH-2302
 Project: Nutch
  Issue Type: Improvement
  Components: REST_api, web gui
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.4


RAMConfManager is intended to hold different configurations which can be 
accessed via a configuration id. However, it forces you to use a default 
configuration with a default id when you construct it. When RAMConfManager is 
used by any other classes they cannot set a custom configuration, which leads to 
problems, e.g. test resources cannot be used when you test NutchServer because it 
uses the default configuration forced by RAMConfManager.
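A minimal sketch of the proposed extra constructor, assuming configurations are modelled as Map<String, String> instead of Hadoop's Configuration to keep it self-contained; RAMConfManagerSketch, the "default" id and the method names are hypothetical and are not the actual RAMConfManager API. The second constructor lets a caller such as a NutchServer test inject its own configuration instead of having a default one forced on it.

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: configurations are modelled as Map<String, String>
// rather than Hadoop's Configuration to keep the example self-contained.
class RAMConfManagerSketch {
  static final String DEFAULT = "default"; // hypothetical default id

  private final Map<String, Map<String, String>> configs = new ConcurrentHashMap<>();

  // No-arg constructor: mirrors the old behaviour of forcing a default
  // configuration under the default id (an empty one in this sketch).
  RAMConfManagerSketch() {
    this(DEFAULT, new HashMap<>());
  }

  // Proposed constructor: the caller supplies the initial configuration and its id.
  RAMConfManagerSketch(String confId, Map<String, String> conf) {
    configs.put(confId, new HashMap<>(conf));
  }

  void setConf(String confId, Map<String, String> conf) {
    configs.put(confId, new HashMap<>(conf));
  }

  // Read-only view of a stored configuration, or null for an unknown id.
  Map<String, String> getConf(String confId) {
    Map<String, String> conf = configs.get(confId);
    return conf == null ? null : Collections.unmodifiableMap(conf);
  }

  Set<String> list() {
    return Collections.unmodifiableSet(configs.keySet());
  }
}
{code}

Usage in a test would be along the lines of `new RAMConfManagerSketch("test", testProperties)`, so the server under test never sees the forced default.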





[jira] [Created] (NUTCH-2301) Create Tests for Security Layer of NutchServer

2016-08-19 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2301:


 Summary: Create Tests for Security Layer of NutchServer
 Key: NUTCH-2301
 URL: https://issues.apache.org/jira/browse/NUTCH-2301
 Project: Nutch
  Issue Type: Sub-task
  Components: REST_api, web gui
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.4


Create tests for security layer of NutchServer.





[jira] [Commented] (NUTCH-1756) Security layer for NutchServer

2016-08-19 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428900#comment-15428900
 ] 

Furkan KAMACI commented on NUTCH-1756:
--

[~lewismc] I've addressed all your comments at 
https://github.com/apache/nutch/pull/142. Could you check them?

> Security layer for NutchServer
> --
>
> Key: NUTCH-1756
> URL: https://issues.apache.org/jira/browse/NUTCH-1756
> Project: Nutch
>  Issue Type: Improvement
>  Components: REST_api, web gui
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
>Priority: Critical
>  Labels: gsoc2016
> Fix For: 2.5
>
>
> It will be beneficial to have a security layer for NutchServer once we make 
> improvements upon it. I hope that GSoC goes ahead this year so we can tackle 
> such issues.
> This issue should implement a standard security layer for REST API calls. It 
> should also add/expose this functionality through the WebApp.





[jira] [Created] (NUTCH-2294) Authorization Support for REST API

2016-07-18 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2294:


 Summary: Authorization Support for REST API
 Key: NUTCH-2294
 URL: https://issues.apache.org/jira/browse/NUTCH-2294
 Project: Nutch
  Issue Type: Sub-task
  Components: REST_api, web gui
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.4


Add authorization for Nutch REST API.
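A minimal sketch of an authorization check that would run after authentication, assuming a hypothetical two-role model (USER, ADMIN) in which read-only methods are open to any known user and state-changing methods require ADMIN. AuthorizationSketch and its rules are illustrative only and are not a design agreed in this issue.

{code:java}
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: once a user has been authenticated, decide whether
// their roles allow the requested HTTP method. Roles and rules are illustrative.
class AuthorizationSketch {
  enum Role { USER, ADMIN }

  private final Map<String, Set<Role>> rolesByUser;

  AuthorizationSketch(Map<String, Set<Role>> rolesByUser) {
    this.rolesByUser = rolesByUser;
  }

  boolean isAllowed(String user, String httpMethod) {
    Set<Role> roles = rolesByUser.getOrDefault(user, Collections.emptySet());
    switch (httpMethod.toUpperCase(Locale.ROOT)) {
      case "GET":
      case "HEAD":
        // Read-only calls: any user with at least one role may use them.
        return !roles.isEmpty();
      default:
        // Calls that change state (e.g. starting a crawl job) require ADMIN.
        return roles.contains(Role.ADMIN);
    }
  }
}
{code}

Keeping authorization separate from authentication lets the same rules apply whichever scheme (Basic, Digest) identified the user.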





[jira] [Updated] (NUTCH-2289) SSL Support for REST API

2016-06-26 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2289:
-
Summary: SSL Support for REST API  (was: SSL Authentication Support for 
REST API)

> SSL Support for REST API
> 
>
> Key: NUTCH-2289
> URL: https://issues.apache.org/jira/browse/NUTCH-2289
> Project: Nutch
>  Issue Type: Sub-task
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.5
>
>
> Add SSL Authentication for Nutch REST API.





[jira] [Created] (NUTCH-2289) SSL Authentication Support for REST API

2016-06-26 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2289:


 Summary: SSL Authentication Support for REST API
 Key: NUTCH-2289
 URL: https://issues.apache.org/jira/browse/NUTCH-2289
 Project: Nutch
  Issue Type: Sub-task
  Components: REST_api, web gui
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.5


Add SSL Authentication for Nutch REST API.





[jira] [Created] (NUTCH-2288) Upgrade Restlet to 2.3.7

2016-06-26 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2288:


 Summary: Upgrade Restlet to 2.3.7
 Key: NUTCH-2288
 URL: https://issues.apache.org/jira/browse/NUTCH-2288
 Project: Nutch
  Issue Type: Improvement
  Components: REST_api, web gui
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.5


Currently we use Restlet 2.2.3. We should upgrade Restlet to 2.3.7. Changes can 
be seen here: 
https://restlet.com/technical-resources/restlet-framework/misc/2.3/changes





[jira] [Commented] (NUTCH-2022) Investigate better documentation for the Nutch REST API's

2016-06-26 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350145#comment-15350145
 ] 

Furkan KAMACI commented on NUTCH-2022:
--

[~lewismc] we have NUTCH-1800 and NUTCH-2243. How does NUTCH-2022 differ from 
them? If there is no difference, it seems that it can be closed as a duplicate.

> Investigate better documentation for the Nutch REST API's
> -
>
> Key: NUTCH-2022
> URL: https://issues.apache.org/jira/browse/NUTCH-2022
> Project: Nutch
>  Issue Type: Wish
>  Components: REST_api
>Affects Versions: 2.3, 1.10
>Reporter: Lewis John McGibbney
>
> Over on Apache Tika we use [Miredot|http://www.miredot.com/] for better 
> representation of the Tika REST API.
> Based on recent development on both 1.X and 2.x REST API's, it would be nice 
> to have a better interface for people to see.
> An example of Miredot REST API docs can be seen on [Tika REST API 
> docs|http://tika.apache.org/1.8/miredot/index.html]





[jira] [Assigned] (NUTCH-2022) Investigate better documentation for the Nutch REST API's

2016-06-26 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI reassigned NUTCH-2022:


Assignee: Furkan KAMACI

> Investigate better documentation for the Nutch REST API's
> -
>
> Key: NUTCH-2022
> URL: https://issues.apache.org/jira/browse/NUTCH-2022
> Project: Nutch
>  Issue Type: Wish
>  Components: REST_api
>Affects Versions: 2.3, 1.10
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
>
> Over on Apache Tika we use [Miredot|http://www.miredot.com/] for better 
> representation of the Tika REST API.
> Based on recent development on both 1.X and 2.x REST API's, it would be nice 
> to have a better interface for people to see.
> An example of Miredot REST API docs can be seen on [Tika REST API 
> docs|http://tika.apache.org/1.8/miredot/index.html]





[jira] [Commented] (NUTCH-2199) Documentation for Nutch 2.X REST API

2016-06-21 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342644#comment-15342644
 ] 

Furkan KAMACI commented on NUTCH-2199:
--

This issue can be closed because it is a duplicate of NUTCH-2243.

> Documentation for Nutch 2.X REST API
> 
>
> Key: NUTCH-2199
> URL: https://issues.apache.org/jira/browse/NUTCH-2199
> Project: Nutch
>  Issue Type: New Feature
>  Components: documentation, REST_api
>Affects Versions: 2.3.1
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
>Priority: Minor
> Fix For: 2.5
>
>
> The work done on NUTCH-1800 needs to be ported to the 2.X branch. This is 
> trivial; I thought I had already done it, but obviously not. 





[jira] [Created] (NUTCH-2285) Digest Authentication Support for REST API

2016-06-19 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2285:


 Summary: Digest Authentication Support for REST API
 Key: NUTCH-2285
 URL: https://issues.apache.org/jira/browse/NUTCH-2285
 Project: Nutch
  Issue Type: Sub-task
  Components: REST_api, web gui
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.5
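The issue has no description. As an illustration of what Digest Authentication involves, here is a minimal sketch of the RFC 2617 response computation (algorithm MD5, qop=auth): the client sends this value and the server recomputes it from its stored password and compares. DigestAuthSketch is a hypothetical helper; nonce generation, nonce-count tracking and replay protection are omitted, and in practice the REST framework's own digest support would more likely be used.

{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: computes the HTTP Digest "response" value
// (RFC 2617, algorithm=MD5, qop=auth).
final class DigestAuthSketch {
  private DigestAuthSketch() {}

  static String md5Hex(String s) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      byte[] hash = md.digest(s.getBytes(StandardCharsets.UTF_8));
      StringBuilder sb = new StringBuilder(hash.length * 2);
      for (byte b : hash) {
        sb.append(String.format("%02x", b)); // lowercase hex, two digits per byte
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e); // every JRE must provide MD5
    }
  }

  static String response(String user, String realm, String password,
                         String method, String uri,
                         String nonce, String nc, String cnonce) {
    String ha1 = md5Hex(user + ":" + realm + ":" + password); // HA1 = MD5(user:realm:password)
    String ha2 = md5Hex(method + ":" + uri);                  // HA2 = MD5(method:uri)
    // response = MD5(HA1:nonce:nc:cnonce:qop:HA2) with qop=auth
    return md5Hex(ha1 + ":" + nonce + ":" + nc + ":" + cnonce + ":auth:" + ha2);
  }
}
{code}

Fed with the worked example from RFC 2617 (user Mufasa, realm testrealm@host.com, password Circle Of Life, GET /dir/index.html, nonce dcd98b7102dd2f0e8b11d0f600bfb0c093, nc 00000001, cnonce 0a4f113b), it should produce 6629fae49393a05397450978507c4ef1. RFC 7616 later obsoleted RFC 2617 and added SHA-256 as an algorithm.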








[jira] [Created] (NUTCH-2284) Basic Authentication Support for REST API

2016-06-19 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2284:


 Summary: Basic Authentication Support for REST API
 Key: NUTCH-2284
 URL: https://issues.apache.org/jira/browse/NUTCH-2284
 Project: Nutch
  Issue Type: New Feature
  Components: REST_api, web gui
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
 Fix For: 2.5


Add Basic Authentication for Nutch REST API.
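A minimal sketch of checking an HTTP Basic Authorization header (RFC 7617: "Basic " followed by the base64 encoding of user:password), assuming an in-memory user map; BasicAuthSketch is hypothetical, and the REST layer itself would more likely rely on the framework's built-in authenticator. Basic credentials are only base64-encoded, not encrypted, so this is meant to be paired with SSL (NUTCH-2289).

{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Map;

// Hypothetical helper: validates an HTTP "Authorization" header value
// that uses the Basic scheme (RFC 7617).
class BasicAuthSketch {
  private final Map<String, String> users; // username -> password (plain text for brevity only)

  BasicAuthSketch(Map<String, String> users) {
    this.users = users;
  }

  boolean isAuthorized(String authorizationHeader) {
    String prefix = "Basic ";
    // The scheme name is case-insensitive.
    if (authorizationHeader == null
        || !authorizationHeader.regionMatches(true, 0, prefix, 0, prefix.length())) {
      return false;
    }
    byte[] decoded;
    try {
      decoded = Base64.getDecoder().decode(authorizationHeader.substring(prefix.length()).trim());
    } catch (IllegalArgumentException e) {
      return false; // not valid base64
    }
    String credentials = new String(decoded, StandardCharsets.UTF_8);
    int colon = credentials.indexOf(':'); // user ids must not contain ':'
    if (colon < 0) {
      return false;
    }
    String expected = users.get(credentials.substring(0, colon));
    if (expected == null) {
      return false;
    }
    // MessageDigest.isEqual compares without returning at the first differing byte.
    return MessageDigest.isEqual(expected.getBytes(StandardCharsets.UTF_8),
        credentials.substring(colon + 1).getBytes(StandardCharsets.UTF_8));
  }
}
{code}

For the RFC example credentials Aladdin / open sesame, the header value would be `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`.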





[jira] [Updated] (NUTCH-2284) Basic Authentication Support for REST API

2016-06-19 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2284:
-
Issue Type: Sub-task  (was: New Feature)
Parent: NUTCH-1756

> Basic Authentication Support for REST API
> -
>
> Key: NUTCH-2284
> URL: https://issues.apache.org/jira/browse/NUTCH-2284
> Project: Nutch
>  Issue Type: Sub-task
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.5
>
>
> Add Basic Authentication for Nutch REST API.





[jira] [Commented] (NUTCH-1800) Documentation for Nutch 1.X REST API

2016-06-19 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338608#comment-15338608
 ] 

Furkan KAMACI commented on NUTCH-1800:
--

[~lewismc] when I switch to the master branch and try to generate the REST API 
documentation, I get an error:

{code}
furkan@kamaci:~/projects/gsoc2016/nutch$ ant -lib ivy restdocs
Buildfile: /home/furkan/projects/gsoc2016/nutch/build.xml
Trying to override old definition of task javac
  [taskdef] Could not load definitions from resource org/sonar/ant/antlib.xml. 
It could not be found.

restdocs:
[ivy:makepom] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
[ivy:makepom] :: loading settings :: url = 
jar:file:/home/furkan/projects/gsoc2016/nutch/ivy/ivy-2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
[artifact:mvn] + Error stacktraces are turned on.
[artifact:mvn] [INFO] Scanning for projects...
[artifact:mvn] [INFO] 

[artifact:mvn] [INFO] Building Apache Nutch
[artifact:mvn] [INFO]task-segment: [test]
[artifact:mvn] [INFO] 

[artifact:mvn] [INFO] [resources:resources]
[artifact:mvn] [WARNING] Using platform encoding (UTF-8 actually) to copy 
filtered resources, i.e. build is platform dependent!
[artifact:mvn] [INFO] skip non existing resourceDirectory 
/home/furkan/projects/gsoc2016/nutch/src/main/resources
[artifact:mvn] Downloading: 
http://repo1.maven.org/maven2/org/apache/mrunit/mrunit/1.1.0/mrunit-1.1.0.jar
[artifact:mvn] [INFO] 

[artifact:mvn] [ERROR] BUILD ERROR
[artifact:mvn] [INFO] 

[artifact:mvn] [INFO] Failed to resolve artifact.
[artifact:mvn] 
[artifact:mvn] Missing:
[artifact:mvn] --
[artifact:mvn] 1) org.apache.mrunit:mrunit:jar:1.1.0
[artifact:mvn] 
[artifact:mvn]   Try downloading the file manually from the project website.
[artifact:mvn] 
[artifact:mvn]   Then, install it using the command: 
[artifact:mvn]   mvn install:install-file -DgroupId=org.apache.mrunit 
-DartifactId=mrunit -Dversion=1.1.0 -Dpackaging=jar -Dfile=/path/to/file
[artifact:mvn] 
[artifact:mvn]   Alternatively, if you host your own repository you can deploy 
the file there: 
[artifact:mvn]   mvn deploy:deploy-file -DgroupId=org.apache.mrunit 
-DartifactId=mrunit -Dversion=1.1.0 -Dpackaging=jar -Dfile=/path/to/file 
-Durl=[url] -DrepositoryId=[id]
[artifact:mvn] 
[artifact:mvn]   Path to dependency: 
[artifact:mvn]  1) org.apache.nutch:nutch:jar:1.12-SNAPSHOT
[artifact:mvn]  2) org.apache.mrunit:mrunit:jar:1.1.0
[artifact:mvn] 
[artifact:mvn] --
[artifact:mvn] 1 required artifact is missing.
[artifact:mvn] 
[artifact:mvn] for artifact: 
[artifact:mvn]   org.apache.nutch:nutch:jar:1.12-SNAPSHOT
[artifact:mvn] 
[artifact:mvn] from the specified remote repositories:
[artifact:mvn]   central (http://repo1.maven.org/maven2)
[artifact:mvn] 
[artifact:mvn] 
[artifact:mvn] 
[artifact:mvn] [INFO] 

[artifact:mvn] [INFO] Trace
[artifact:mvn] org.apache.maven.lifecycle.LifecycleExecutionException: Missing:
[artifact:mvn] --
[artifact:mvn] 1) org.apache.mrunit:mrunit:jar:1.1.0
[artifact:mvn] 
[artifact:mvn]   Try downloading the file manually from the project website.
[artifact:mvn] 
[artifact:mvn]   Then, install it using the command: 
[artifact:mvn]   mvn install:install-file -DgroupId=org.apache.mrunit 
-DartifactId=mrunit -Dversion=1.1.0 -Dpackaging=jar -Dfile=/path/to/file
[artifact:mvn] 
[artifact:mvn]   Alternatively, if you host your own repository you can deploy 
the file there: 
[artifact:mvn]   mvn deploy:deploy-file -DgroupId=org.apache.mrunit 
-DartifactId=mrunit -Dversion=1.1.0 -Dpackaging=jar -Dfile=/path/to/file 
-Durl=[url] -DrepositoryId=[id]
[artifact:mvn] 
[artifact:mvn]   Path to dependency: 
[artifact:mvn]  1) org.apache.nutch:nutch:jar:1.12-SNAPSHOT
[artifact:mvn]  2) org.apache.mrunit:mrunit:jar:1.1.0
[artifact:mvn] 
[artifact:mvn] --
[artifact:mvn] 1 required artifact is missing.
[artifact:mvn] 
[artifact:mvn] for artifact: 
[artifact:mvn]   org.apache.nutch:nutch:jar:1.12-SNAPSHOT
[artifact:mvn] 
[artifact:mvn] from the specified remote repositories:
[artifact:mvn]   central (http://repo1.maven.org/maven2)
[artifact:mvn] 
[artifact:mvn] 
[artifact:mvn]  at 
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:576)
[artifact:mvn]  at 
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalWithLifecycle(DefaultLifecycleExecutor.java:500)
[artifact:mvn]  at 
org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:479)
[artifact:mvn]  at 

[jira] [Commented] (NUTCH-2267) Solr indexer fails at the end of the job with a java error message

2016-06-01 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310866#comment-15310866
 ] 

Furkan KAMACI commented on NUTCH-2267:
--

We support Solr 5.4.1 
(https://github.com/apache/nutch/blob/master/src/plugin/indexer-solr/ivy.xm), 
while the reporter runs Solr 6, so this is the reason for the error described 
here. [~lewismc] the issue should be closed for that reason.

> Solr indexer fails at the end of the job with a java error message
> --
>
> Key: NUTCH-2267
> URL: https://issues.apache.org/jira/browse/NUTCH-2267
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.12
> Environment: hadoop v2.7.2  solr6 in cloud configuration with 
> zookeeper 3.4.6. I use the master branch from github currently on commit 
> da252eb7b3d2d7b70   ( NUTCH - 2263 mingram and maxgram support for Unigram 
> Cosine Similarity Model is provided. )
>Reporter: kaveh minooie
> Fix For: 1.13
>
>
> This is what I was getting at first:
> 16/05/23 13:52:27 INFO mapreduce.Job:  map 100% reduce 100%
> 16/05/23 13:52:27 INFO mapreduce.Job: Task Id : 
> attempt_1462499602101_0119_r_00_0, Status : FAILED
> Error: Bad return type
> Exception Details:
>   Location:
> org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;Lorg/apache/http/conn/ClientConnectionManager;)Lorg/apache/http/impl/client/CloseableHttpClient;
>  @58: areturn
>   Reason:
> Type 'org/apache/http/impl/client/DefaultHttpClient' (current frame, 
> stack[0]) is not assignable to 
> 'org/apache/http/impl/client/CloseableHttpClient' (from method signature)
>   Current Frame:
> bci: @58
> flags: { }
> locals: { 'org/apache/solr/common/params/SolrParams', 
> 'org/apache/http/conn/ClientConnectionManager', 
> 'org/apache/solr/common/params/ModifiableSolrParams', 
> 'org/apache/http/impl/client/DefaultHttpClient' }
> stack: { 'org/apache/http/impl/client/DefaultHttpClient' }
>   Bytecode:
> 0x000: bb00 0359 2ab7 0004 4db2 0005 b900 0601
> 0x010: 0099 001e b200 05bb 0007 59b7 0008 1209
> 0x020: b600 0a2c b600 0bb6 000c b900 0d02 002b
> 0x030: b800 104e 2d2c b800 0f2d b0
>   Stackmap Table:
> append_frame(@47,Object[#143])
> 16/05/23 13:52:28 INFO mapreduce.Job:  map 100% reduce 0% 
> As you can see, the failed reducer gets re-spawned. Then I found this issue: 
> https://issues.apache.org/jira/browse/SOLR-7657 and updated my Hadoop 
> config file. After that, the indexer seems to be able to finish (it seems I 
> got the documents in Solr), but I still get the error message at the end 
> of the job:
> 16/05/23 16:39:26 INFO mapreduce.Job:  map 100% reduce 99%
> 16/05/23 16:39:44 INFO mapreduce.Job:  map 100% reduce 100%
> 16/05/23 16:39:57 INFO mapreduce.Job: Job job_1464045047943_0001 completed 
> successfully
> 16/05/23 16:39:58 INFO mapreduce.Job: Counters: 53
>   File System Counters
>   FILE: Number of bytes read=42700154855
>   FILE: Number of bytes written=70210771807
>   FILE: Number of read operations=0
>   FILE: Number of large read operations=0
>   FILE: Number of write operations=0
>   HDFS: Number of bytes read=8699202825
>   HDFS: Number of bytes written=0
>   HDFS: Number of read operations=537
>   HDFS: Number of large read operations=0
>   HDFS: Number of write operations=0
>   Job Counters 
>   Launched map tasks=134
>   Launched reduce tasks=1
>   Data-local map tasks=107
>   Rack-local map tasks=27
>   Total time spent by all maps in occupied slots (ms)=49377664
>   Total time spent by all reduces in occupied slots (ms)=32765064
>   Total time spent by all map tasks (ms)=3086104
>   Total time spent by all reduce tasks (ms)=1365211
>   Total vcore-milliseconds taken by all map tasks=3086104
>   Total vcore-milliseconds taken by all reduce tasks=1365211
>   Total megabyte-milliseconds taken by all map tasks=12640681984
>   Total megabyte-milliseconds taken by all reduce tasks=8387856384
>   Map-Reduce Framework
>   Map input records=25305474
>   Map output records=25305474
>   Map output bytes=27422869763
>   Map output materialized bytes=27489888004
>   Input split bytes=15225
>   Combine input records=0
>   Combine output records=0
>   Reduce input groups=16061459
>   Reduce shuffle bytes=27489888004
>   Reduce input records=25305474
>   Reduce output records=230
>   Spilled 

[jira] [Commented] (NUTCH-2271) Solr indexer Failed

2016-06-01 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310862#comment-15310862
 ] 

Furkan KAMACI commented on NUTCH-2271:
--

[~lewismc] OK, I found it in the ivy.xml of the related plugin: 
https://github.com/apache/nutch/blob/master/src/plugin/indexer-solr/ivy.xml. We 
support Solr 5.4.1, so this is the reason for the error reported here. The issue 
should be closed for that reason.
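For illustration only, here is a minimal Java sketch of a pre-flight check that would surface this kind of mismatch before a job runs. It assumes, hypothetically, that the SolrJ client and the Solr server should share a major version. `SolrVersionCheck`, `majorVersion` and `sameMajorVersion` are made-up names that are not part of Nutch or SolrJ. The version strings come from this thread: 5.4.1 from ivy.xml and 6.0.0 from the reported environment.

```java
// Hypothetical pre-flight check, not part of Nutch or SolrJ.
// ASSUMPTION: client and server are treated as compatible only when they
// share a major version; this rule is illustrative, not a documented policy.
public final class SolrVersionCheck {

    // Returns the leading numeric component of a dotted version string,
    // e.g. "5.4.1" -> 5. Rejects empty or non-numeric input.
    static int majorVersion(String version) {
        if (version == null || version.isEmpty()) {
            throw new IllegalArgumentException("version must not be empty");
        }
        String major = version.split("\\.")[0];
        try {
            return Integer.parseInt(major);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("not a dotted version: " + version, e);
        }
    }

    // True when the client and server versions share the same major number.
    static boolean sameMajorVersion(String clientVersion, String serverVersion) {
        return majorVersion(clientVersion) == majorVersion(serverVersion);
    }

    public static void main(String[] args) {
        String solrjVersion = "5.4.1";  // version declared in indexer-solr/ivy.xml, per this comment
        String serverVersion = "6.0.0"; // Solr version from the reported environment
        if (!sameMajorVersion(solrjVersion, serverVersion)) {
            System.out.println("Warning: SolrJ " + solrjVersion
                + " may not be compatible with Solr " + serverVersion);
        }
    }
}
```

A real check would need the server's actual version at runtime. This sketch only compares two strings you already know.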

> Solr indexer Failed 
> 
>
> Key: NUTCH-2271
> URL: https://issues.apache.org/jira/browse/NUTCH-2271
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.12
> Environment: Hadoop 2.7.2 , Solr 6.0.0 , Nutch 1.12 on Single node 
>Reporter: narendra
>Assignee: Furkan KAMACI
>
> When I run this command
>   bin/nutch solrindex http://localhost:8983/solr/#/devel1 crawl_Test1/crawldb 
> -linkdb crawl_Test1/linkdb  crawl_Test1/segments/*
> 16/05/31 22:21:47 WARN segment.SegmentChecker: The input path at * is not a 
> segment... skipping
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: starting at 2016-05-31 
> 22:21:47
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: deleting gone documents: 
> false
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL filtering: false
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL normalizing: false
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugins: looking in: 
> /tmp/hadoop-unjar8621976524622577403/classes/plugins
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugin Auto-activation mode: 
> [true]
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Plugins:
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Regex URL Filter 
> (urlfilter-regex)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Html Parse Plug-in 
> (parse-html)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   HTTP Framework 
> (lib-http)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   the nutch core 
> extension points (nutch-extensionpoints)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Basic Indexing Filter 
> (index-basic)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Anchor Indexing Filter 
> (index-anchor)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Tika Parser Plug-in 
> (parse-tika)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Basic URL Normalizer 
> (urlnormalizer-basic)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Regex URL Filter 
> Framework (lib-regex-filter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Regex URL Normalizer 
> (urlnormalizer-regex)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   CyberNeko HTML Parser 
> (lib-nekohtml)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   OPIC Scoring Plug-in 
> (scoring-opic)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Pass-through URL 
> Normalizer (urlnormalizer-pass)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Http Protocol Plug-in 
> (protocol-http)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   SolrIndexWriter 
> (indexer-solr)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Extension-Points:
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Content Parser 
> (org.apache.nutch.parse.Parser)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch URL Filter 
> (org.apache.nutch.net.URLFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   HTML Parse Filter 
> (org.apache.nutch.parse.HtmlParseFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Scoring 
> (org.apache.nutch.scoring.ScoringFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch URL Normalizer 
> (org.apache.nutch.net.URLNormalizer)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Protocol 
> (org.apache.nutch.protocol.Protocol)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch URL Ignore 
> Exemption Filter (org.apache.nutch.net.URLExemptionFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Index Writer 
> (org.apache.nutch.indexer.IndexWriter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Segment Merge 
> Filter (org.apache.nutch.segment.SegmentMergeFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Indexing Filter 
> (org.apache.nutch.indexer.IndexingFilter)
> 16/05/31 22:21:47 INFO indexer.IndexWriters: Adding 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Active IndexWriters :
> SOLRIndexWriter
>   solr.server.url : URL of the SOLR instance
>   solr.zookeeper.hosts : URL of the Zookeeper quorum
>   solr.commit.size : buffer size when sending to SOLR (default 1000)
>   solr.mapping.file : name of the mapping file for fields (default 
> solrindex-mapping.xml)
>   

[jira] [Commented] (NUTCH-2268) SolrIndexerJob: java.lang.RuntimeException

2016-06-01 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309915#comment-15309915
 ] 

Furkan KAMACI commented on NUTCH-2268:
--

[~kadarinaren...@gmail.com] When you tried with Solr 4.10.3, did you change the 
versions of the other environment components or not?

> SolrIndexerJob: java.lang.RuntimeException
> --
>
> Key: NUTCH-2268
> URL: https://issues.apache.org/jira/browse/NUTCH-2268
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 2.3.1
> Environment: I am using 
> Hbase V:hbase-0.98.19-hadoop2
> Solr V : 6.0.0
> Nutch : 2.3.1
> java : 8
>Reporter: narendra
>  Labels: indexing
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Could you please help me with this error: 
> SolrIndexerJob: java.lang.RuntimeException: job 
> failed:name=apache-nutch-2.3.1.jar   
> when I run this command: 
> local/bin/nutch solrindex http://localhost:8983/solr/ -all
> I tried with Solr 4.10.3 but I get the same error. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (NUTCH-2268) SolrIndexerJob: java.lang.RuntimeException

2016-06-01 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309915#comment-15309915
 ] 

Furkan KAMACI edited comment on NUTCH-2268 at 6/1/16 8:19 AM:
--

[~kadarinaren...@gmail.com] when you tried with Solr 4.10.3, did you change the 
versions of the other environment components or not?


was (Author: kamaci):
[~kadarinaren...@gmail.com] When you tried with Solr 4.10.3, did you change the 
versions of the other environment components or not?

> SolrIndexerJob: java.lang.RuntimeException
> --
>
> Key: NUTCH-2268
> URL: https://issues.apache.org/jira/browse/NUTCH-2268
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 2.3.1
> Environment: I am using 
> Hbase V:hbase-0.98.19-hadoop2
> Solr V : 6.0.0
> Nutch : 2.3.1
> java : 8
>Reporter: narendra
>  Labels: indexing
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Could you please help me with this error: 
> SolrIndexerJob: java.lang.RuntimeException: job 
> failed:name=apache-nutch-2.3.1.jar   
> when I run this command: 
> local/bin/nutch solrindex http://localhost:8983/solr/ -all
> I tried with Solr 4.10.3 but I get the same error. 





[jira] [Commented] (NUTCH-2270) Solr indexer Failed i

2016-06-01 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309907#comment-15309907
 ] 

Furkan KAMACI commented on NUTCH-2270:
--

This issue is a duplicate of NUTCH-2271 and should be closed.

> Solr indexer Failed i
> -
>
> Key: NUTCH-2270
> URL: https://issues.apache.org/jira/browse/NUTCH-2270
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.12
> Environment: Hadoop 2.7.2 , Solr 6.0.0 , Nutch 1.12 on Single node 
>Reporter: narendra
>
> When I run this command
>  bin/nutch solrindex http://localhost:8983/solr/#/gettingstarted 
> crawl_Test1/crawldb -linkdb crawl_Test1/linkdb  crawl_Test1/segments/*
> 16/05/31 22:21:47 WARN segment.SegmentChecker: The input path at * is not a 
> segment... skipping
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: starting at 2016-05-31 
> 22:21:47
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: deleting gone documents: 
> false
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL filtering: false
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL normalizing: false
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugins: looking in: 
> /tmp/hadoop-unjar8621976524622577403/classes/plugins
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugin Auto-activation mode: 
> [true]
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Plugins:
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Regex URL Filter 
> (urlfilter-regex)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Html Parse Plug-in 
> (parse-html)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   HTTP Framework 
> (lib-http)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   the nutch core 
> extension points (nutch-extensionpoints)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Basic Indexing Filter 
> (index-basic)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Anchor Indexing Filter 
> (index-anchor)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Tika Parser Plug-in 
> (parse-tika)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Basic URL Normalizer 
> (urlnormalizer-basic)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Regex URL Filter 
> Framework (lib-regex-filter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Regex URL Normalizer 
> (urlnormalizer-regex)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   CyberNeko HTML Parser 
> (lib-nekohtml)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   OPIC Scoring Plug-in 
> (scoring-opic)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Pass-through URL 
> Normalizer (urlnormalizer-pass)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Http Protocol Plug-in 
> (protocol-http)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   SolrIndexWriter 
> (indexer-solr)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Extension-Points:
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Content Parser 
> (org.apache.nutch.parse.Parser)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch URL Filter 
> (org.apache.nutch.net.URLFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   HTML Parse Filter 
> (org.apache.nutch.parse.HtmlParseFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Scoring 
> (org.apache.nutch.scoring.ScoringFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch URL Normalizer 
> (org.apache.nutch.net.URLNormalizer)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Protocol 
> (org.apache.nutch.protocol.Protocol)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch URL Ignore 
> Exemption Filter (org.apache.nutch.net.URLExemptionFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Index Writer 
> (org.apache.nutch.indexer.IndexWriter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Segment Merge 
> Filter (org.apache.nutch.segment.SegmentMergeFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Indexing Filter 
> (org.apache.nutch.indexer.IndexingFilter)
> 16/05/31 22:21:47 INFO indexer.IndexWriters: Adding 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Active IndexWriters :
> SOLRIndexWriter
>   solr.server.url : URL of the SOLR instance
>   solr.zookeeper.hosts : URL of the Zookeeper quorum
>   solr.commit.size : buffer size when sending to SOLR (default 1000)
>   solr.mapping.file : name of the mapping file for fields (default 
> solrindex-mapping.xml)
>   solr.auth : use authentication (default false)
>   solr.auth.username : username for authentication
>   solr.auth.password : password for authentication
> 16/05/31 22:21:47 INFO indexer.IndexerMapReduce: 

[jira] [Commented] (NUTCH-2271) Solr indexer Failed

2016-06-01 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309899#comment-15309899
 ] 

Furkan KAMACI commented on NUTCH-2271:
--

This seems to be related to SOLR-7948. [~lewismc] do we support Solr 6 in this 
version of Nutch?

> Solr indexer Failed 
> 
>
> Key: NUTCH-2271
> URL: https://issues.apache.org/jira/browse/NUTCH-2271
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.12
> Environment: Hadoop 2.7.2 , Solr 6.0.0 , Nutch 1.12 on Single node 
>Reporter: narendra
>Assignee: Furkan KAMACI
>
> When I run this command
>   bin/nutch solrindex http://localhost:8983/solr/#/devel1 crawl_Test1/crawldb 
> -linkdb crawl_Test1/linkdb  crawl_Test1/segments/*
> 16/05/31 22:21:47 WARN segment.SegmentChecker: The input path at * is not a 
> segment... skipping
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: starting at 2016-05-31 
> 22:21:47
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: deleting gone documents: 
> false
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL filtering: false
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL normalizing: false
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugins: looking in: 
> /tmp/hadoop-unjar8621976524622577403/classes/plugins
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugin Auto-activation mode: 
> [true]
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Plugins:
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Regex URL Filter 
> (urlfilter-regex)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Html Parse Plug-in 
> (parse-html)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   HTTP Framework 
> (lib-http)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   the nutch core 
> extension points (nutch-extensionpoints)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Basic Indexing Filter 
> (index-basic)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Anchor Indexing Filter 
> (index-anchor)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Tika Parser Plug-in 
> (parse-tika)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Basic URL Normalizer 
> (urlnormalizer-basic)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Regex URL Filter 
> Framework (lib-regex-filter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Regex URL Normalizer 
> (urlnormalizer-regex)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   CyberNeko HTML Parser 
> (lib-nekohtml)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   OPIC Scoring Plug-in 
> (scoring-opic)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Pass-through URL 
> Normalizer (urlnormalizer-pass)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Http Protocol Plug-in 
> (protocol-http)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   SolrIndexWriter 
> (indexer-solr)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Extension-Points:
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Content Parser 
> (org.apache.nutch.parse.Parser)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch URL Filter 
> (org.apache.nutch.net.URLFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   HTML Parse Filter 
> (org.apache.nutch.parse.HtmlParseFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Scoring 
> (org.apache.nutch.scoring.ScoringFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch URL Normalizer 
> (org.apache.nutch.net.URLNormalizer)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Protocol 
> (org.apache.nutch.protocol.Protocol)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch URL Ignore 
> Exemption Filter (org.apache.nutch.net.URLExemptionFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Index Writer 
> (org.apache.nutch.indexer.IndexWriter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Segment Merge 
> Filter (org.apache.nutch.segment.SegmentMergeFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Indexing Filter 
> (org.apache.nutch.indexer.IndexingFilter)
> 16/05/31 22:21:47 INFO indexer.IndexWriters: Adding 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Active IndexWriters :
> SOLRIndexWriter
>   solr.server.url : URL of the SOLR instance
>   solr.zookeeper.hosts : URL of the Zookeeper quorum
>   solr.commit.size : buffer size when sending to SOLR (default 1000)
>   solr.mapping.file : name of the mapping file for fields (default 
> solrindex-mapping.xml)
>   solr.auth : use authentication (default false)
>   solr.auth.username : username for authentication
>   solr.auth.password : password for 

[jira] [Assigned] (NUTCH-2271) Solr indexer Failed

2016-06-01 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI reassigned NUTCH-2271:


Assignee: Furkan KAMACI

> Solr indexer Failed 
> 
>
> Key: NUTCH-2271
> URL: https://issues.apache.org/jira/browse/NUTCH-2271
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.12
> Environment: Hadoop 2.7.2 , Solr 6.0.0 , Nutch 1.12 on Single node 
>Reporter: narendra
>Assignee: Furkan KAMACI
>
> When I run this command
>   bin/nutch solrindex http://localhost:8983/solr/#/devel1 crawl_Test1/crawldb 
> -linkdb crawl_Test1/linkdb  crawl_Test1/segments/*
> 16/05/31 22:21:47 WARN segment.SegmentChecker: The input path at * is not a 
> segment... skipping
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: starting at 2016-05-31 
> 22:21:47
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: deleting gone documents: 
> false
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL filtering: false
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL normalizing: false
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugins: looking in: 
> /tmp/hadoop-unjar8621976524622577403/classes/plugins
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugin Auto-activation mode: 
> [true]
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Plugins:
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Regex URL Filter 
> (urlfilter-regex)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Html Parse Plug-in 
> (parse-html)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   HTTP Framework 
> (lib-http)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   the nutch core 
> extension points (nutch-extensionpoints)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Basic Indexing Filter 
> (index-basic)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Anchor Indexing Filter 
> (index-anchor)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Tika Parser Plug-in 
> (parse-tika)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Basic URL Normalizer 
> (urlnormalizer-basic)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Regex URL Filter 
> Framework (lib-regex-filter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Regex URL Normalizer 
> (urlnormalizer-regex)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   CyberNeko HTML Parser 
> (lib-nekohtml)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   OPIC Scoring Plug-in 
> (scoring-opic)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Pass-through URL 
> Normalizer (urlnormalizer-pass)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Http Protocol Plug-in 
> (protocol-http)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   SolrIndexWriter 
> (indexer-solr)
> 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Extension-Points:
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Content Parser 
> (org.apache.nutch.parse.Parser)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch URL Filter 
> (org.apache.nutch.net.URLFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   HTML Parse Filter 
> (org.apache.nutch.parse.HtmlParseFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Scoring 
> (org.apache.nutch.scoring.ScoringFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch URL Normalizer 
> (org.apache.nutch.net.URLNormalizer)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Protocol 
> (org.apache.nutch.protocol.Protocol)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch URL Ignore 
> Exemption Filter (org.apache.nutch.net.URLExemptionFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Index Writer 
> (org.apache.nutch.indexer.IndexWriter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Segment Merge 
> Filter (org.apache.nutch.segment.SegmentMergeFilter)
> 16/05/31 22:21:47 INFO plugin.PluginRepository:   Nutch Indexing Filter 
> (org.apache.nutch.indexer.IndexingFilter)
> 16/05/31 22:21:47 INFO indexer.IndexWriters: Adding 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter
> 16/05/31 22:21:47 INFO indexer.IndexingJob: Active IndexWriters :
> SOLRIndexWriter
>   solr.server.url : URL of the SOLR instance
>   solr.zookeeper.hosts : URL of the Zookeeper quorum
>   solr.commit.size : buffer size when sending to SOLR (default 1000)
>   solr.mapping.file : name of the mapping file for fields (default 
> solrindex-mapping.xml)
>   solr.auth : use authentication (default false)
>   solr.auth.username : username for authentication
>   solr.auth.password : password for authentication
> 16/05/31 22:21:47 INFO indexer.IndexerMapReduce: IndexerMapReduce: crawldb: 
> 

[jira] [Updated] (NUTCH-1800) Documentation for Nutch 1.X REST API

2016-05-24 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-1800:
-
Description: 
This issue should build on NUTCH-1769 with full Java documentation for all 
classes in the following packages

org.apache.nutch.api.*

I am assigning this one to [~fjodor.vershinin] as he is doing an excellent job 
on the REST API. His UML graphic in [0] and commentary show that he has a good 
understanding of the REST API and its functionality.

Thank you, [~fjodor.vershinin], great work.

[0] https://wiki.apache.org/nutch/NutchRESTAPI#UML_Graphic

  was:
This issue should build on NUTCH-1769 with full Java documentation for all 
classes in the following packages

org.apache.nutch.api.*

I am assigning this one to [~fjodor.vershinin] as he is doing an excellent job 
on the REST API. His UML graphic in [0] and commantary shows that he has a goo 
dunderstanding of the REST API and its functionality.

Thank you [~fjodor.vershinin] great work.

[0] https://wiki.apache.org/nutch/NutchRESTAPI#UML_Graphic


> Documentation for Nutch 1.X REST API
> 
>
> Key: NUTCH-1800
> URL: https://issues.apache.org/jira/browse/NUTCH-1800
> Project: Nutch
>  Issue Type: New Feature
>  Components: documentation, REST_api
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.11
>
> Attachments: NUTCH-1800.patch
>
>
> This issue should build on NUTCH-1769 with full Java documentation for all 
> classes in the following packages
> org.apache.nutch.api.*
> I am assigning this one to [~fjodor.vershinin] as he is doing an excellent 
> job on the REST API. His UML graphic in [0] and commentary show that he has 
> a good understanding of the REST API and its functionality.
> Thank you, [~fjodor.vershinin], great work.
> [0] https://wiki.apache.org/nutch/NutchRESTAPI#UML_Graphic





[jira] [Comment Edited] (NUTCH-2266) Fix dead link in build.xml for javadoc

2016-05-23 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297121#comment-15297121
 ] 

Furkan KAMACI edited comment on NUTCH-2266 at 5/23/16 9:41 PM:
---

The javadoc link for Lucene has been removed, and javadoc links for solr-solrj, 
lucene.core and lucene.analyzers-common have been created.


was (Author: kamaci):
Javadoc links for solr-solrj, lucene.core and lucene.analyzers-common have been 
created.

> Fix dead link in build.xml for javadoc
> --
>
> Key: NUTCH-2266
> URL: https://issues.apache.org/jira/browse/NUTCH-2266
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
>Priority: Minor
> Fix For: 2.5
>
>
> build.xml has a dead link for javadoc.link.lucene and should be fixed.





[jira] [Commented] (NUTCH-2266) Fix dead link in build.xml for javadoc

2016-05-23 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297121#comment-15297121
 ] 

Furkan KAMACI commented on NUTCH-2266:
--

Javadoc links for solr-solrj, lucene.core and lucene.analyzers-common have been 
created.

> Fix dead link in build.xml for javadoc
> --
>
> Key: NUTCH-2266
> URL: https://issues.apache.org/jira/browse/NUTCH-2266
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
>Priority: Minor
> Fix For: 2.5
>
>
> build.xml has a dead link for javadoc.link.lucene and should be fixed.





[jira] [Created] (NUTCH-2266) Fix dead link in build.xml for javadoc

2016-05-23 Thread Furkan KAMACI (JIRA)
Furkan KAMACI created NUTCH-2266:


 Summary: Fix dead link in build.xml for javadoc
 Key: NUTCH-2266
 URL: https://issues.apache.org/jira/browse/NUTCH-2266
 Project: Nutch
  Issue Type: Bug
  Components: build
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
Priority: Minor
 Fix For: 2.5


build.xml has a dead link for javadoc.link.lucene and should be fixed.





[jira] [Commented] (NUTCH-2089) Move Nutch to compile on JDK 8

2016-05-23 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296997#comment-15296997
 ] 

Furkan KAMACI commented on NUTCH-2089:
--

I changed javadoc.link.java too and attached the output. The result seems to be the same.

> Move Nutch to compile on JDK 8
> --
>
> Key: NUTCH-2089
> URL: https://issues.apache.org/jira/browse/NUTCH-2089
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
> Fix For: 2.5
>
> Attachments: java8output.txt, java8output.txt
>
>
> Public support updates for JDK 1.7 stopped in April of this year.
> https://www.java.com/en/download/faq/java_7.xml
> In our next release we should shift support to JDK 1.8.
>  





[jira] [Updated] (NUTCH-2089) Move Nutch to compile on JDK 8

2016-05-23 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2089:
-
Attachment: java8output.txt

> Move Nutch to compile on JDK 8
> --
>
> Key: NUTCH-2089
> URL: https://issues.apache.org/jira/browse/NUTCH-2089
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
> Fix For: 2.5
>
> Attachments: java8output.txt, java8output.txt
>
>
> Public support updates for JDK 1.7 stopped in April of this year.
> https://www.java.com/en/download/faq/java_7.xml
> In our next release we should shift support to JDK 1.8.
>  





[jira] [Commented] (NUTCH-2089) Move Nutch to compile on JDK 8

2016-05-23 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296983#comment-15296983
 ] 

Furkan KAMACI commented on NUTCH-2089:
--

I've changed javac.version to 1.8 in build.xml (my Java version is 1.8.0_77), 
and debug is enabled in build.xml. I've attached the output. [~lewismc] how 
can I see the warnings?

> Move Nutch to compile on JDK 8
> --
>
> Key: NUTCH-2089
> URL: https://issues.apache.org/jira/browse/NUTCH-2089
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
> Fix For: 2.5
>
> Attachments: java8output.txt
>
>
> Public support updates for JDK 1.7 stopped in April of this year.
> https://www.java.com/en/download/faq/java_7.xml
> In our next release we should shift support to JDK 1.8.
>  





[jira] [Updated] (NUTCH-2089) Move Nutch to compile on JDK 8

2016-05-23 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI updated NUTCH-2089:
-
Attachment: java8output.txt

> Move Nutch to compile on JDK 8
> --
>
> Key: NUTCH-2089
> URL: https://issues.apache.org/jira/browse/NUTCH-2089
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
> Fix For: 2.5
>
> Attachments: java8output.txt
>
>
> Public support updates for JDK 1.7 stopped in April of this year.
> https://www.java.com/en/download/faq/java_7.xml
> In our next release we should shift support to JDK 1.8.
>  





[jira] [Assigned] (NUTCH-2089) Move Nutch to compile on JDK 8

2016-05-23 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI reassigned NUTCH-2089:


Assignee: Furkan KAMACI

> Move Nutch to compile on JDK 8
> --
>
> Key: NUTCH-2089
> URL: https://issues.apache.org/jira/browse/NUTCH-2089
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
> Fix For: 2.5
>
>
> Public support updates for JDK 1.7 stopped in April of this year.
> https://www.java.com/en/download/faq/java_7.xml
> In our next release we should shift support to JDK 1.8.
>  





[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-05-22 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295670#comment-15295670
 ] 

Furkan KAMACI commented on NUTCH-:
--

[~abenjell] [~lewismc] I couldn't reproduce the issue. Could you guide me?

> re-fetch deletes all  metadata except _csh_ and _rs_
> 
>
> Key: NUTCH-
> URL: https://issues.apache.org/jira/browse/NUTCH-
> Project: Nutch
>  Issue Type: Bug
>  Components: crawldb
>Affects Versions: 2.3.1
> Environment: Centos 6, mongodb 2.6 and mongodb 3.0 and 
> hbase-0.98.8-hadoop2
>Reporter: Adnane B.
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
> Attachments: TestReFetch.java, index.html
>
>
> This problem happens the second time I crawl a page
> {code}
> bin/nutch inject urls/
> bin/nutch generate -topN 1000
> bin/nutch fetch  -all
> bin/nutch parse -force   -all
> bin/nutch updatedb  -all
> {code}
> Second time (re-fetch):
> {code}
> bin/nutch generate -topN 1000 --> batchid changes for all existing pages
> bin/nutch fetch  -all   -->  *** metadata is deleted for all pages already 
> crawled ***
> bin/nutch parse -force   -all
> bin/nutch updatedb  -all
> {code}
> I reproduced it with mongodb 2.6, mongodb 3.0, and hbase-0.98.8-hadoop2.
> It happens only if the page has not changed.
> To reproduce it easily, please add the following to nutch-site.xml:
> {code}
> <property>
>   <name>db.fetch.interval.default</name>
>   <value>60</value>
>   <description>The default number of seconds between re-fetches of a page (1 
> minute)</description>
> </property>
> {code}
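Here is a hypothetical Java sketch that contrasts the reported symptom with the behaviour one would expect. Plain `Map`s stand in for a stored row's metadata, so this is not Nutch's WebPage API. `replace` mimics the report, where only the keys written by the re-fetch (such as `_csh_` and `_rs_`) survive. `merge` keeps earlier keys and lets freshly fetched values win. The class name, `custom_key`, and all values are placeholders. This is not a proposed fix for the actual code path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the NUTCH-2222 symptom. Plain maps stand in for a
// stored row's metadata; this is NOT Nutch's WebPage API.
public final class RefetchMetadata {

    // Reported behaviour: the metadata written by the re-fetch replaces the
    // stored map, so only the fetch's own keys (e.g. _csh_, _rs_) survive.
    static Map<String, String> replace(Map<String, String> stored,
                                       Map<String, String> fetched) {
        return new HashMap<>(fetched);
    }

    // Expected behaviour: start from the stored keys, then let freshly
    // fetched values overwrite keys present in both. Inputs are not mutated.
    static Map<String, String> merge(Map<String, String> stored,
                                     Map<String, String> fetched) {
        Map<String, String> result = new HashMap<>(stored);
        result.putAll(fetched);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new HashMap<>();
        stored.put("custom_key", "value from an earlier crawl"); // placeholder key
        stored.put("_csh_", "old");                              // placeholder value
        Map<String, String> fetched = new HashMap<>();
        fetched.put("_csh_", "new");
        fetched.put("_rs_", "new");
        // HashMap iteration order is unspecified, so key order may vary.
        System.out.println("replace keeps: " + replace(stored, fetched).keySet());
        System.out.println("merge keeps:   " + merge(stored, fetched).keySet());
    }
}
```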





[jira] [Assigned] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2016-05-21 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI reassigned NUTCH-:


Assignee: Furkan KAMACI  (was: Lewis John McGibbney)






[jira] [Assigned] (NUTCH-2199) Documentation for Nutch 2.X REST API

2016-05-21 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI reassigned NUTCH-2199:


Assignee: Furkan KAMACI

> Documentation for Nutch 2.X REST API
> 
>
> Key: NUTCH-2199
> URL: https://issues.apache.org/jira/browse/NUTCH-2199
> Project: Nutch
>  Issue Type: New Feature
>  Components: documentation, REST_api
>Affects Versions: 2.3.1
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
>Priority: Minor
> Fix For: 2.5
>
>
> The work done on NUTCH-1800 needs to be ported to the 2.X branch. This is 
> trivial; I thought I had already done it, but obviously not.





[jira] [Assigned] (NUTCH-2264) Check Forbidden API's at Build

2016-05-21 Thread Furkan KAMACI (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Furkan KAMACI reassigned NUTCH-2264:


Assignee: Furkan KAMACI

> Check Forbidden API's at Build
> --
>
> Key: NUTCH-2264
> URL: https://issues.apache.org/jira/browse/NUTCH-2264
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
>Priority: Minor
>
> We should avoid [forbidden 
> calls|https://github.com/policeman-tools/forbidden-apis/wiki] and add a check 
> for them to the Ant build.
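The forbidden-apis tool linked above scans compiled classes for method calls listed in signature files. A typical target is a JDK method whose result silently depends on the JVM's default locale or charset, such as the no-argument `String#toLowerCase()` or `String#getBytes()`; calls of this kind are what the bundled `jdk-unsafe` signatures are meant to catch. The Java sketch below (the class and method names are hypothetical) shows why such a call is risky and the explicit-argument replacements a build check would push code toward. It does not configure the Ant task itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/**
 * Hypothetical sketch: locale/charset-dependent String calls and
 * explicit-argument replacements that avoid relying on JVM defaults.
 */
public class ForbiddenApiSketch {

    /** Result depends on the locale passed in (s.toLowerCase() would use the JVM default). */
    public static String lowerWith(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    /** Locale-independent replacement for s.toLowerCase(). */
    public static String lowerSafe(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    /** Charset-independent replacement for s.getBytes(). */
    public static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Under a Turkish locale, 'I' lowercases to dotless i (U+0131), not 'i'
        System.out.println(lowerWith("TITLE", turkish).equals("t\u0131tle")); // true
        System.out.println(lowerSafe("TITLE"));                               // title
        System.out.println(utf8Bytes("\u00e9").length);                       // 2
    }
}
```

On a server whose default locale happens to be Turkish, a bare `toLowerCase()` on a header or field name would produce the dotless form, which is the kind of environment-dependent bug a build-time check is meant to rule out.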




