[jira] [Commented] (NUTCH-2848) Consider use of StringUtil#isEmpty
[ https://issues.apache.org/jira/browse/NUTCH-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280412#comment-17280412 ]

Furkan Kamaci commented on NUTCH-2848:
--

You are right! However, the second method can be error-prone when the given string is null:
{code:java}
public static boolean isEmpty(String str) {
  // Throws NullPointerException when str is null.
  return str.length() == 0;
}
{code}
On the other hand, we may need a util-class method that checks that a given String is non-null and has a length, as follows:
{code:java}
public static boolean hasLength(String str) {
  return (str != null && str.length() > 0);
}
{code}
We may need to check these files for it:
{noformat}
grep -lr ".length() > 0" .
./src/test/org/apache/nutch/util/TestSuffixStringMatcher.java
./src/test/org/apache/nutch/util/TestPrefixStringMatcher.java
./src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java
./src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/Http.java
./src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestMSWordParser.java
./src/plugin/parse-tika/src/test/org/apache/nutch/parse/tika/TestOOParser.java
./src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/DOMBuilder.java
./src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/DOMContentUtils.java
./src/plugin/protocol-selenium/src/java/org/apache/nutch/protocol/selenium/HttpResponse.java
./src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/HttpResponse.java
./src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java
./src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
./src/plugin/parse-swf/src/java/org/apache/nutch/parse/swf/SWFParser.java
./src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMBuilder.java
./src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMContentUtils.java
./src/plugin/protocol-htmlunit/src/java/org/apache/nutch/protocol/htmlunit/HttpResponse.java
./src/plugin/index-replace/src/java/org/apache/nutch/indexer/replace/FieldReplacer.java
./src/plugin/index-replace/src/java/org/apache/nutch/indexer/replace/ReplaceIndexer.java
./src/plugin/urlnormalizer-ajax/src/java/org/apache/nutch/net/urlnormalizer/ajax/AjaxURLNormalizer.java
./src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
./src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection/SubcollectionIndexingFilter.java
./src/java/org/apache/nutch/tools/DmozParser.java
./src/java/org/apache/nutch/util/TrieStringMatcher.java
./src/java/org/apache/nutch/util/TableUtil.java
./src/java/org/apache/nutch/plugin/PluginManifestParser.java
./src/java/org/apache/nutch/crawl/TextProfileSignature.java
./src/java/org/apache/nutch/crawl/Injector.java
./src/java/org/apache/nutch/hostdb/HostDatum.java
./src/java/org/apache/nutch/metadata/Metadata.java
{noformat}
since there may be other forms that correspond to the hasLength() method, for example:
{code:java}
if ((null != data) && (data.trim().length() > 0)) {
  throw new org.xml.sax.SAXException("Warning: can't output text before document element! Ignoring...");
}
{code}
[https://github.com/apache/nutch/blob/master/src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/DOMBuilder.java#L158]

> Consider use of StringUtil#isEmpty
> --
>
> Key: NUTCH-2848
> URL: https://issues.apache.org/jira/browse/NUTCH-2848
> Project: Nutch
> Issue Type: Improvement
> Components: util
> Reporter: Lewis John McGibbney
> Priority: Minor
> Fix For: 1.19
>
>
> We should consider 'standardizing' the use of
> [StringUtil#isEmpty()|https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/util/StringUtil.java#L133-L138]
> across the codebase.
> {code:java}
> /**
>  * Checks if a string is empty (ie is null or empty).
>  */
> public static boolean isEmpty(String str) {
>   return (str == null) || (str.equals(""));
> }
> {code}
> So far the impact is as follows
> {code:bash}
> grep -lr ".equals(\"\")" .
> ./plugin/urlnormalizer-protocol/src/java/org/apache/nutch/net/urlnormalizer/protocol/ProtocolURLNormalizer.java
> ./plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java
> ./plugin/urlnormalizer-host/src/java/org/apache/nutch/net/urlnormalizer/host/HostURLNormalizer.java
> ./plugin/parsefilter-regex/src/java/org/apache/nutch/parsefilter/regex/RegexParseFilter.java
> ./plugin/feed/src/java/org/apache/nutch/parse/feed/FeedParser.java
> ./plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/Train.java
> ./plugin/language-identifier/src/test/org/apache/nutch/analysis/lang/TestHTMLLanguageParser.java
> ./plugin/urlnormalizer-slash/src/java/org/apache/nutch/net/urlnormalizer/slash/SlashURLNormalizer.java
>
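The null-safety concern raised in this thread can be sketched as follows. This is only an illustration: the class name `StringChecks` and the method `hasText` are hypothetical, `isEmpty` mirrors the `StringUtil#isEmpty` body quoted in the issue, and `hasLength` mirrors the helper proposed in the comment.

```java
/** Hypothetical helper class for illustration; not part of Nutch. */
final class StringChecks {

  private StringChecks() {
  }

  /** Same contract as the quoted StringUtil#isEmpty: true for null or "". */
  static boolean isEmpty(String str) {
    return str == null || str.isEmpty();
  }

  /** The helper proposed in the comment: non-null and at least one char. */
  static boolean hasLength(String str) {
    return str != null && str.length() > 0;
  }

  /**
   * Hypothetical null-safe form of the
   * (null != data) && (data.trim().length() > 0) pattern.
   */
  static boolean hasText(String str) {
    return str != null && !str.trim().isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(isEmpty(null));   // true, no NullPointerException
    System.out.println(hasLength(""));   // false
    System.out.println(hasText("   "));  // false, whitespace only
    System.out.println(hasText(" a "));  // true
  }
}
```

Note that `hasLength(" ")` is true while `hasText(" ")` is false, so call sites that trim before checking cannot simply be replaced with `!isEmpty(...)`.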
[jira] [Commented] (NUTCH-2171) Upgrade Nutch Trunk to Java 1.8
[ https://issues.apache.org/jira/browse/NUTCH-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832026#comment-15832026 ]

Furkan KAMACI commented on NUTCH-2171:
--

[~lewismc] I can fix the Javadocs as was done in NUTCH-2089. Apart from the anonymous classes, we can add lambdas over the next releases once this improvement is in.

> Upgrade Nutch Trunk to Java 1.8
> ---
>
> Key: NUTCH-2171
> URL: https://issues.apache.org/jira/browse/NUTCH-2171
> Project: Nutch
> Issue Type: Task
> Reporter: Lewis John McGibbney
>
> Lambda expressions are fantastic. I tried to undertake a small exercise which
> would indicate how many we could implement however this was a fruitless
> effort. A patch is going to be a better approach. This task involves
> upgrading various properties in default.properties as well as a systemic
> source code analysis with the aim of implementing Java 8 goodies throughout.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (NUTCH-2348) Close GZIPInputStream
[ https://issues.apache.org/jira/browse/NUTCH-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Furkan KAMACI closed NUTCH-2348.

Resolution: Won't Fix

> Close GZIPInputStream
> -
>
> Key: NUTCH-2348
> URL: https://issues.apache.org/jira/browse/NUTCH-2348
> Project: Nutch
> Issue Type: Bug
> Components: tool
> Affects Versions: 2.3.1
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> GZIPInputStream is not closed; it should be closed in a finally block.
[jira] [Commented] (NUTCH-2171) Upgrade Nutch Trunk to Java 1.8
[ https://issues.apache.org/jira/browse/NUTCH-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831815#comment-15831815 ]

Furkan KAMACI commented on NUTCH-2171:
--

[~lewismc] I've analysed the code and there are 9 anonymous classes. Converting them does not seem important and would not bring any real gain, but I can send a PR. I've also analysed the code for explicit type arguments that could be replaced with the diamond operator (a Java 7 feature): there are 193 of them. There are also 9 usages that do not use the try-with-resources feature of Java 7. I can send such a PR, and then we can change default.properties and let developers use Java 8 features in upcoming contributions to the Nutch source code?

> Upgrade Nutch Trunk to Java 1.8
> ---
>
> Key: NUTCH-2171
> URL: https://issues.apache.org/jira/browse/NUTCH-2171
> Project: Nutch
> Issue Type: Task
> Reporter: Lewis John McGibbney
>
> Lambda expressions are fantastic. I tried to undertake a small exercise which
> would indicate how many we could implement however this was a fruitless
> effort. A patch is going to be a better approach. This task involves
> upgrading various properties in default.properties as well as a systemic
> source code analysis with the aim of implementing Java 8 goodies throughout.
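To make two of the changes counted in this comment concrete, here is a small before/after sketch of the diamond operator and of a lambda replacing an anonymous class. The class `Java8Cleanup` and its methods are invented for illustration and are not taken from the Nutch code base.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical example class; not Nutch code. */
class Java8Cleanup {

  /** Pre-Java 7 style: explicit type argument and an anonymous class. */
  static List<String> sortByLengthOld(List<String> input) {
    List<String> copy = new ArrayList<String>(input);
    Collections.sort(copy, new Comparator<String>() {
      @Override
      public int compare(String a, String b) {
        return Integer.compare(a.length(), b.length());
      }
    });
    return copy;
  }

  /** Same behaviour with the diamond operator (Java 7) and a lambda (Java 8). */
  static List<String> sortByLengthNew(List<String> input) {
    List<String> copy = new ArrayList<>(input);
    copy.sort((a, b) -> Integer.compare(a.length(), b.length()));
    return copy;
  }
}
```

Both methods return the same result for the same input, which is why such a conversion is a readability change rather than a functional one.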
[jira] [Commented] (NUTCH-2346) Check Types at Object Equality
[ https://issues.apache.org/jira/browse/NUTCH-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831546#comment-15831546 ]

Furkan KAMACI commented on NUTCH-2346:
--

[~lewismc] When checking the equality of objects, we should check both that the other object is not null and that it belongs to the same class. GeneratorJob.java considers neither the null case nor the class; I've added both checks. Metadata.java, on the other hand, checks for null but tests class equality through exception handling, which is not the proper way.

> Check Types at Object Equality
> --
>
> Key: NUTCH-2346
> URL: https://issues.apache.org/jira/browse/NUTCH-2346
> Project: Nutch
> Issue Type: Bug
> Components: generator, metadata
> Affects Versions: 2.3.1
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
> Priority: Minor
> Fix For: 2.4
>
>
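A minimal sketch of the equality check described in this comment, with an explicit null check and class check instead of relying on a ClassCastException. The class `UrlKey` and its fields are hypothetical and only stand in for the Nutch classes mentioned.

```java
import java.util.Objects;

/** Hypothetical value class for illustration; not Nutch code. */
final class UrlKey {

  private final String url;
  private final int score;

  UrlKey(String url, int score) {
    this.url = url;
    this.score = score;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    // Explicit null and class checks; no cast is attempted before them.
    if (other == null || getClass() != other.getClass()) {
      return false;
    }
    UrlKey that = (UrlKey) other;
    return score == that.score && Objects.equals(url, that.url);
  }

  @Override
  public int hashCode() {
    // Kept consistent with equals: same fields.
    return Objects.hash(url, score);
  }
}
```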
[jira] [Created] (NUTCH-2352) Log with Generic Class Name at Nutch 1.x
Furkan KAMACI created NUTCH-2352:

Summary: Log with Generic Class Name at Nutch 1.x
Key: NUTCH-2352
URL: https://issues.apache.org/jira/browse/NUTCH-2352
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.12
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
Priority: Minor
Fix For: 1.13

Mistakes are common when reference code that declares a logger is copied to create a new class. We can obtain the class generically to avoid them, as follows:
{code:java}
private static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
{code}
(cf. SOLR-8324)
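The following sketch shows why the `MethodHandles.lookup().lookupClass()` idiom survives copy and paste. To stay self-contained it uses `java.util.logging` instead of the SLF4J `LoggerFactory` from the issue, which is a substitution made here; the class name `FetchQueueExample` is hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.util.logging.Logger;

/** Hypothetical class for illustration; not Nutch code. */
class FetchQueueExample {

  // lookupClass() returns the class in which lookup() was called, so this
  // line can be copied into another class without editing a class literal.
  private static final Logger LOG =
      Logger.getLogger(MethodHandles.lookup().lookupClass().getName());

  /** Exposes the logger name so that the effect can be checked. */
  static String loggerName() {
    return LOG.getName();
  }

  public static void main(String[] args) {
    LOG.info("logger name: " + loggerName());
  }
}
```

The logger name always matches the declaring class, whatever that class is called.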
[jira] [Commented] (NUTCH-2351) Log with Generic Class Name at Nutch 2.x
[ https://issues.apache.org/jira/browse/NUTCH-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826734#comment-15826734 ]

Furkan KAMACI commented on NUTCH-2351:
--

[~wastl-nagel] If this is OK, I can send the PR for Nutch 1.x too.

> Log with Generic Class Name at Nutch 2.x
>
>
> Key: NUTCH-2351
> URL: https://issues.apache.org/jira/browse/NUTCH-2351
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 2.3.1
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
> Priority: Minor
> Fix For: 2.4
>
>
> Mistakes are common when reference code that declares a logger is copied to
> create a new class. We can obtain the class generically to avoid them, as
> follows:
> {code:java}
> private static final Logger LOG =
> LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
> {code}
[jira] [Updated] (NUTCH-2351) Log with Generic Class Name
[ https://issues.apache.org/jira/browse/NUTCH-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Furkan KAMACI updated NUTCH-2351:
-
Description:
Mistakes are common when reference code that declares a logger is copied to create a new class. We can obtain the class generically to avoid them, as follows:
{code:java}
private static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
{code}

> Log with Generic Class Name
> ---
>
> Key: NUTCH-2351
> URL: https://issues.apache.org/jira/browse/NUTCH-2351
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 2.3.1
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
> Priority: Minor
> Fix For: 2.4
>
>
> Mistakes are common when reference code that declares a logger is copied to
> create a new class. We can obtain the class generically to avoid them, as
> follows:
> {code:java}
> private static final Logger LOG =
> LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
> {code}
[jira] [Created] (NUTCH-2351) Log with Generic Class Name
Furkan KAMACI created NUTCH-2351:

Summary: Log with Generic Class Name
Key: NUTCH-2351
URL: https://issues.apache.org/jira/browse/NUTCH-2351
Project: Nutch
Issue Type: Improvement
Affects Versions: 2.3.1
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
Priority: Minor
Fix For: 2.4
[jira] [Commented] (NUTCH-2345) FetchItemQueue logs are logged with wrong class name
[ https://issues.apache.org/jira/browse/NUTCH-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826467#comment-15826467 ]

Furkan KAMACI commented on NUTCH-2345:
--

[~wastl-nagel] I'll provide the patch as soon as possible.

> FetchItemQueue logs are logged with wrong class name
>
>
> Key: NUTCH-2345
> URL: https://issues.apache.org/jira/browse/NUTCH-2345
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.11, 1.12
> Environment: Any
> Reporter: Monika Gupta
> Assignee: Furkan KAMACI
> Priority: Minor
> Fix For: 1.13
>
>
> I ran bin/nutch fetch and notice that the log statements of class
> FetchItemQueue.java are logged in logs/hadoop.log with wrong file name as
> FetchItemQueues.java
> Refer the execution log:
> 2017-01-06 15:31:25,562 INFO fetcher.FetchItemQueues - maxThreads= 1
> 2017-01-06 15:31:28,565 INFO fetcher.FetchItemQueues - inProgress= 0
> Issue is in the logger for class FetchItemQueue.java.
> Currently it is-
> private static final Logger LOG =
> LoggerFactory.getLogger(FetchItemQueues.class);
> Correction: It should be-
> private static final Logger LOG =
> LoggerFactory.getLogger(FetchItemQueue.class);
[jira] [Commented] (NUTCH-2350) Add Missing activeConfId Field to NutchStatus Object
[ https://issues.apache.org/jira/browse/NUTCH-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826452#comment-15826452 ]

Furkan KAMACI commented on NUTCH-2350:
--

This is unrelated to NUTCH-2344. The Nutch Web GUI tries to convert the NutchStatus object from the api package into its own NutchStatus object (which lives in the Web GUI package). This is the related code from NutchClientImpl:
{code:java}
@Override
public NutchStatus getNutchStatus() {
  return nutchResource.path("/admin").type(APPLICATION_JSON)
      .get(NutchStatus.class);
}
{code}
The class returned by the Nutch REST API and the class expected by the Web GUI have the same name but are different classes. I've added the necessary field to the Web GUI package object.

> Add Missing activeConfId Field to NutchStatus Object
>
>
> Key: NUTCH-2350
> URL: https://issues.apache.org/jira/browse/NUTCH-2350
> Project: Nutch
> Issue Type: Bug
> Components: web gui
> Affects Versions: 2.3.1
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
[jira] [Commented] (NUTCH-2345) FetchItemQueue logs are logged with wrong class name
[ https://issues.apache.org/jira/browse/NUTCH-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824389#comment-15824389 ]

Furkan KAMACI commented on NUTCH-2345:
--

[~lewismc] Such mistakes are common when reference code is copied to create a new class. This is generic code for obtaining the class, and it is what Solr uses now:
{code:java}
private static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
{code}
Shall I switch all the loggers to that convention?

> FetchItemQueue logs are logged with wrong class name
>
>
> Key: NUTCH-2345
> URL: https://issues.apache.org/jira/browse/NUTCH-2345
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.11, 1.12
> Environment: Any
> Reporter: Monika Gupta
> Assignee: Furkan KAMACI
> Priority: Minor
> Fix For: 1.13
>
>
> I ran bin/nutch fetch and notice that the log statements of class
> FetchItemQueue.java are logged in logs/hadoop.log with wrong file name as
> FetchItemQueues.java
> Refer the execution log:
> 2017-01-06 15:31:25,562 INFO fetcher.FetchItemQueues - maxThreads= 1
> 2017-01-06 15:31:28,565 INFO fetcher.FetchItemQueues - inProgress= 0
> Issue is in the logger for class FetchItemQueue.java.
> Currently it is-
> private static final Logger LOG =
> LoggerFactory.getLogger(FetchItemQueues.class);
> Correction: It should be-
> private static final Logger LOG =
> LoggerFactory.getLogger(FetchItemQueue.class);
[jira] [Updated] (NUTCH-2350) Add Missing activeConfId Field to NutchStatus Object
[ https://issues.apache.org/jira/browse/NUTCH-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Furkan KAMACI updated NUTCH-2350:
-
Summary: Add Missing activeConfId Field to NutchStatus Object (was: Add Missing Fields to NutchStatus Object)

> Add Missing activeConfId Field to NutchStatus Object
>
>
> Key: NUTCH-2350
> URL: https://issues.apache.org/jira/browse/NUTCH-2350
> Project: Nutch
> Issue Type: Bug
> Components: web gui
> Affects Versions: 2.3.1
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
[jira] [Updated] (NUTCH-2350) Add Missing Fields to NutchStatus Object
[ https://issues.apache.org/jira/browse/NUTCH-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Furkan KAMACI updated NUTCH-2350:
-
Fix Version/s: (was: 1.13)
2.4

> Add Missing Fields to NutchStatus Object
>
>
> Key: NUTCH-2350
> URL: https://issues.apache.org/jira/browse/NUTCH-2350
> Project: Nutch
> Issue Type: Bug
> Components: web gui
> Affects Versions: 2.3.1
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
[jira] [Created] (NUTCH-2350) Add Missing Fields to NutchStatus Object
Furkan KAMACI created NUTCH-2350:

Summary: Add Missing Fields to NutchStatus Object
Key: NUTCH-2350
URL: https://issues.apache.org/jira/browse/NUTCH-2350
Project: Nutch
Issue Type: Bug
Components: web gui
Affects Versions: 1.12
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
Fix For: 1.13
[jira] [Commented] (NUTCH-2344) Authentication Support for Web GUI
[ https://issues.apache.org/jira/browse/NUTCH-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822860#comment-15822860 ]

Furkan KAMACI commented on NUTCH-2344:
--

[~lewismc] This seems to be related to a mismatch in activeConfId at the NutchStatus of the REST API. I think you would get that error without applying the patch too. Could you check without applying the patch, to see whether you still get that error? I'll provide a fix for it.

> Authentication Support for Web GUI
> --
>
> Key: NUTCH-2344
> URL: https://issues.apache.org/jira/browse/NUTCH-2344
> Project: Nutch
> Issue Type: New Feature
> Components: web gui
> Affects Versions: 2.3.1
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
> Fix For: 2.4
>
> Attachments: Firefox_Screenshot_2017-01-13T19-10-49.499Z.png
>
>
> We should implement authentication support for the Web GUI.
[jira] [Commented] (NUTCH-2199) Documentation for Nutch 2.X REST API
[ https://issues.apache.org/jira/browse/NUTCH-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812766#comment-15812766 ]

Furkan KAMACI commented on NUTCH-2199:
--

[~lewismc] Can we close this issue, since it is a duplicate of NUTCH-2243?

> Documentation for Nutch 2.X REST API
>
>
> Key: NUTCH-2199
> URL: https://issues.apache.org/jira/browse/NUTCH-2199
> Project: Nutch
> Issue Type: New Feature
> Components: documentation, REST_api
> Affects Versions: 2.3.1
> Reporter: Lewis John McGibbney
> Assignee: Furkan KAMACI
> Priority: Minor
> Fix For: 2.5
>
>
> The work done on NUTCH-1800 needs to be ported to 2.X branch. This is
> trivial, I thought I had already done it but obviously not.
[jira] [Created] (NUTCH-2348) Close GZIPInputStream
Furkan KAMACI created NUTCH-2348:

Summary: Close GZIPInputStream
Key: NUTCH-2348
URL: https://issues.apache.org/jira/browse/NUTCH-2348
Project: Nutch
Issue Type: Bug
Components: tool
Affects Versions: 2.3.1
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
Fix For: 2.4

GZIPInputStream is not closed; it should be closed in a finally block.
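One way to guarantee that a `GZIPInputStream` is closed is try-with-resources, which closes the stream as a `finally` block would, even when reading fails. The sketch below is only an illustration of that pattern; the class `GzipRoundTrip` and its methods are hypothetical and do not correspond to the Nutch tool named in the issue.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical example class; not Nutch code. */
class GzipRoundTrip {

  /** Compresses text; closing the stream writes the GZIP trailer. */
  static byte[] gzip(String text) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Decompresses data; the GZIPInputStream is closed on every exit path. */
  static String gunzip(byte[] data) {
    try (GZIPInputStream in =
        new GZIPInputStream(new ByteArrayInputStream(data))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[4096];
      int read;
      while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```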
[jira] [Updated] (NUTCH-2347) Use Logger Instead of Printing Throwable
[ https://issues.apache.org/jira/browse/NUTCH-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Furkan KAMACI updated NUTCH-2347:
-
Priority: Minor (was: Major)

> Use Logger Instead of Printing Throwable
>
>
> Key: NUTCH-2347
> URL: https://issues.apache.org/jira/browse/NUTCH-2347
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 2.3.1
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
> Priority: Minor
> Fix For: 2.4
>
>
> Loggers should be used instead of printing Throwable.
[jira] [Created] (NUTCH-2347) Use Logger Instead of Printing Throwable
Furkan KAMACI created NUTCH-2347:

Summary: Use Logger Instead of Printing Throwable
Key: NUTCH-2347
URL: https://issues.apache.org/jira/browse/NUTCH-2347
Project: Nutch
Issue Type: Improvement
Affects Versions: 2.3.1
Reporter: Furkan KAMACI
Assignee: Furkan KAMACI
Fix For: 2.4

Loggers should be used instead of printing Throwable.
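As an illustration of this suggestion, the sketch below passes the exception to a logger instead of calling `printStackTrace()`. It uses `java.util.logging` only to remain self-contained (Nutch itself uses SLF4J, as the other issues in this archive show); the class `ThrowableLogging` and the method `parsePort` are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical example class; not Nutch code. */
class ThrowableLogging {

  private static final Logger LOG =
      Logger.getLogger(ThrowableLogging.class.getName());

  /** Parses a port number, logging (not printing) any parse failure. */
  static int parsePort(String value, int fallback) {
    try {
      return Integer.parseInt(value);
    } catch (NumberFormatException e) {
      // Instead of e.printStackTrace(): the logger records the message,
      // the level and the full stack trace through the configured handlers.
      LOG.log(Level.WARNING,
          "Invalid port '" + value + "', using " + fallback, e);
      return fallback;
    }
  }
}
```

Routing the Throwable through the logger keeps the stack trace in the same log file, with the same level filtering, as the rest of the application's output.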
[jira] [Commented] (NUTCH-1915) Error in Nutch 2.X WebApp stalls progress bar
[ https://issues.apache.org/jira/browse/NUTCH-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812283#comment-15812283 ] Furkan KAMACI commented on NUTCH-1915: -- [~lewismc] Could you increase *DEFAULT_TIMEOUT_SEC* which has default value of *60* at _RemoteCommandExecutor.java_ and try it again? > Error in Nutch 2.X WebApp stalls progress bar > - > > Key: NUTCH-1915 > URL: https://issues.apache.org/jira/browse/NUTCH-1915 > Project: Nutch > Issue Type: Bug > Components: web gui >Affects Versions: 2.3 > Environment: Nutch 2.3-SNAPSHOT HEAD > HBase 0.94.14 > Gora 0.5 >Reporter: Lewis John McGibbney > Fix For: 2.5 > > > When I define a crawl within the Nutch 2.X webapp on the above stack I > sometimes get the following stack trace > {code} > 2015-01-12 14:48:25,943 INFO fetcher.FetcherJob - fetching > http://www.darpa.mil/Our_Work/I2O/Personnel/Mr__Steve_Jameson.aspx (queue > crawl delay=5000ms) > 2015-01-12 14:48:26,563 ERROR impl.RemoteCommandExecutor - Remote command > failed > java.util.concurrent.TimeoutException > at java.util.concurrent.FutureTask.get(FutureTask.java:201) > at > org.apache.nutch.webui.client.impl.RemoteCommandExecutor.executeRemoteJob(RemoteCommandExecutor.java:61) > at > org.apache.nutch.webui.client.impl.CrawlingCycle.executeCrawlCycle(CrawlingCycle.java:58) > at > org.apache.nutch.webui.service.impl.CrawlServiceImpl.startCrawl(CrawlServiceImpl.java:69) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317) > at > org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190) > at > 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) > at > org.springframework.aop.interceptor.AsyncExecutionInterceptor$1.call(AsyncExecutionInterceptor.java:97) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 2015-01-12 14:48:26,563 INFO impl.CrawlingCycle - Executed remote command > data: FETCH status: FAILED > 2015-01-12 14:48:27,088 INFO fetcher.FetcherJob - 10/10 spinwaiting/active, > 71 pages, 1 errors, 1.4 1 pages/s, 275 146 kb/s, 193 URLs in 4 queues > {code} > Right now I don't know what this relates to but I know that it stalls the > task execution progress bar within the > [CrawlsPage|https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/webui/pages/crawls/CrawlsPage.html] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
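The stack trace above shows a `TimeoutException` raised from `FutureTask.get`, which is what a timed `Future.get` throws when the job outlives its timeout. The sketch below reproduces that mechanism in isolation; it is not the `RemoteCommandExecutor` source, and the class `TimeoutDemo`, the method `runWithTimeout` and the returned status strings are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical example class; not Nutch code. */
class TimeoutDemo {

  /** Runs the job and reports FAILED when it does not finish in time. */
  static String runWithTimeout(Callable<String> job, long timeoutMillis) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<String> future = executor.submit(job);
      try {
        // Waits at most timeoutMillis for the result.
        return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        future.cancel(true); // the job is still running: interrupt it
        return "FAILED";
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return "FAILED";
      } catch (ExecutionException e) {
        return "FAILED";
      }
    } finally {
      executor.shutdownNow();
    }
  }
}
```

With a larger timeout the same slow job completes normally, which is the effect that raising a timeout constant such as DEFAULT_TIMEOUT_SEC is meant to test.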
[jira] [Commented] (NUTCH-2205) Nutch solrdedup error in solrcloud for larger docs
[ https://issues.apache.org/jira/browse/NUTCH-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812265#comment-15812265 ] Furkan KAMACI commented on NUTCH-2205: -- [~VictorHu] Do you still get that error? Because logs says: bq.No live SolrServers available and it seems that your cluster was down as [~markus17] pointed. > Nutch solrdedup error in solrcloud for larger docs > --- > > Key: NUTCH-2205 > URL: https://issues.apache.org/jira/browse/NUTCH-2205 > Project: Nutch > Issue Type: Bug > Components: indexer >Affects Versions: 2.3 > Environment: CentOS 6.5,Jdk 1.7.0_75,omcat 8.0.9 ,Hadoop > 2.5.2,Zookeeper 3.4.6 ,Hbase 0.98.8 ,Solr 4.8.1 ,Nutch 2.3.1 >Reporter: VictorHu > Fix For: 2.5 > > > When the number of solr docs larger than 9000,the solrdedup of the nutch is > broken.This is log: > http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2 > 16/01/25 17:02:38 INFO solr.SolrDeleteDuplicates: SolrDeleteDuplicates: > starting... > 16/01/25 17:02:38 INFO solr.SolrDeleteDuplicates: SolrDeleteDuplicates: Solr > url: http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2 > 16/01/25 17:02:39 INFO client.RMProxy: Connecting to ResourceManager at > master.Itble/10.192.1.100:8032 > 16/01/25 17:02:43 INFO mapreduce.JobSubmitter: number of splits:1 > 16/01/25 17:02:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: > job_1453104806095_0162 > 16/01/25 17:02:44 INFO impl.YarnClientImpl: Submitted application > application_1453104806095_0162 > 16/01/25 17:02:44 INFO mapreduce.Job: The url to track the job: > http://master.Itble:8088/proxy/application_1453104806095_0162/ > 16/01/25 17:02:44 INFO mapreduce.Job: Running job: job_1453104806095_0162 > 16/01/25 17:02:54 INFO mapreduce.Job: Job job_1453104806095_0162 running in > uber mode : false > 16/01/25 17:02:54 INFO mapreduce.Job: map 0% reduce 0% > 16/01/25 17:03:02 INFO mapreduce.Job: Task Id : > attempt_1453104806095_0162_m_00_0, Status : FAILED > Error: 
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: > org.apache.solr.client.solrj.SolrServerException: No live SolrServers > available to handle this > request:[http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2, > http://10.192.1.101:8080/solr/myEnterpriseCollection_shard1_replica2, > http://10.192.1.103:8080/solr/myEnterpriseCollection_shard2_replica1] > at > org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554) > at > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210) > at > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206) > at > org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91) > at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) > at > org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.createRecordReader(SolrDeleteDuplicates.java:291) > at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.(MapTask.java:492) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) > 16/01/25 17:03:12 INFO mapreduce.Job: Task Id : > attempt_1453104806095_0162_m_00_1, Status : FAILED > Error: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: > org.apache.solr.client.solrj.SolrServerException: No live SolrServers > available to handle this > request:[http://10.192.1.100:8080/solr/myEnterpriseCollection_shard2_replica2, > http://10.192.1.101:8080/solr/myEnterpriseCollection_shard1_replica2, > 
http://10.192.1.103:8080/solr/myEnterpriseCollection_shard2_replica1, > http://10.192.1.102:8080/solr/myEnterpriseCollection_shard1_replica1] > at > org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:554) > at > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210) > at > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206) > at > org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91) > at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) > at >
[jira] [Commented] (NUTCH-2257) apache-nutch-2.3.1-src.tar.gz can not be built
[ https://issues.apache.org/jira/browse/NUTCH-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812256#comment-15812256 ] Furkan KAMACI commented on NUTCH-2257: -- [~kamir1604] I don't get such an error when I build 2.3.1 tag. Does that problem still exist? > apache-nutch-2.3.1-src.tar.gz can not be built > -- > > Key: NUTCH-2257 > URL: https://issues.apache.org/jira/browse/NUTCH-2257 > Project: Nutch > Issue Type: Bug > Components: build >Affects Versions: 2.3.1 > Environment: jdk1.7.0_67 > nutch 2.3.1 > Hadoop 2.6.0 > HBase 1.0.0 >Reporter: Mirko Kaempf > > The build fails for: >apache-nutch-2.3.1-src.tar.gz > but after replacing src folder from > apache-nutch-2.3.1-src.zip > the build works fine. > Error messages: > compile: > [echo] Compiling plugin: indexer-solr > [javac] Compiling 1 source file to > /opt/examples/apache-nutch-2.3.1/build/indexer-solr/classes > [javac] > /opt/examples/apache-nutch-2.3.1/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrUtils.java:24: > error: cannot find symbol > [javac] if (job.getBoolean(SolrConstants.USE_AUTH, false)) { > [javac]^ > [javac] symbol: variable SolrConstants > [javac] location: class SolrUtils > [javac] > /opt/examples/apache-nutch-2.3.1/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrUtils.java:25: > error: cannot find symbol > [javac] String username = job.get(SolrConstants.USERNAME); > [javac] ^ > [javac] symbol: variable SolrConstants > [javac] location: class SolrUtils > [javac] > /opt/examples/apache-nutch-2.3.1/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrUtils.java:35: > error: cannot find symbol > [javac] .get(SolrConstants.PASSWORD))); > [javac]^ > [javac] symbol: variable SolrConstants > [javac] location: class SolrUtils > [javac] > /opt/examples/apache-nutch-2.3.1/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrUtils.java:43: > error: cannot find symbol > [javac] return new 
HttpSolrServer(job.get(SolrConstants.SERVER_URL), > client); > [javac] ^ > [javac] symbol: variable SolrConstants > [javac] location: class SolrUtils > [javac] 4 errors -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NUTCH-2275) MD5Signature by default doesn't take in account parse
[ https://issues.apache.org/jira/browse/NUTCH-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812209#comment-15812209 ] Furkan KAMACI commented on NUTCH-2275: -- [~fre93] does that problem still exist? > MD5Signature by default doesn't take in account parse > - > > Key: NUTCH-2275 > URL: https://issues.apache.org/jira/browse/NUTCH-2275 > Project: Nutch > Issue Type: Bug > Components: parser >Affects Versions: 1.11 >Reporter: Francesco Capponi > > I'm testing Apache Nutch with the feed's plugin. I've noticed that for each > page it generates the same digest/signature, therefore the dedup cleans > everything up from the database. > I'm wondering why the class MD5Signature is the default one instead of > TextMD5Signature. > Anyhow now I've modified a little bit the MD5Signature to let it work with > the feed plugin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NUTCH-2313) Error in Nutch 2.X WebApp Inject
[ https://issues.apache.org/jira/browse/NUTCH-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812081#comment-15812081 ]

Furkan KAMACI commented on NUTCH-2313:
--------------------------------------

[~kherox] Could you increase *DEFAULT_TIMEOUT_SEC*, which has a default value of *60* in _RemoteCommandExecutor.java_, and try again?

> Error in Nutch 2.X WebApp Inject
> --------------------------------
>
> Key: NUTCH-2313
> URL: https://issues.apache.org/jira/browse/NUTCH-2313
> Project: Nutch
> Issue Type: Bug
> Components: web gui
> Affects Versions: 2.3
> Environment: Nutch 2.3.1
>   Hbase 2
>   Hadoop 2.4
> Reporter: kakou Denis
> Fix For: 2.5
>
> When I define a crawl in the web UI, I get this output.
> 16/09/01 20:58:51 INFO resource.PropertiesFactory: Loading properties files from jar:file:/tmp/hadoop-unjar2700707934366020491/lib/wicket-extensions-6.13.0.jar!/org/apache/wicket/extensions/Initializer.properties with loader org.apache.wicket.resource.IsoPropertiesFilePropertiesLoader@37dc175b
> 16/09/01 20:59:10 WARN RequestCycleExtra:
> 16/09/01 20:59:10 WARN RequestCycleExtra: Handling the following exception
> org.apache.wicket.core.request.mapper.StalePageException
> 16/09/01 20:59:10 WARN RequestCycleExtra:
> 16/09/01 20:59:10 WARN render.WebPageRenderer: The Buffered response should be handled by BufferedResponseRequestHandler
> 16/09/01 20:59:22 ERROR impl.RemoteCommandExecutor: Remote command failed
> java.util.concurrent.TimeoutException
>   at java.util.concurrent.FutureTask.get(FutureTask.java:205)
>   at org.apache.nutch.webui.client.impl.RemoteCommandExecutor.executeRemoteJob(RemoteCommandExecutor.java:61)
>   at org.apache.nutch.webui.client.impl.CrawlingCycle.executeCrawlCycle(CrawlingCycle.java:58)
>   at org.apache.nutch.webui.service.impl.CrawlServiceImpl.startCrawl(CrawlServiceImpl.java:69)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>   at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190)
>   at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
>   at org.springframework.aop.interceptor.AsyncExecutionInterceptor$1.call(AsyncExecutionInterceptor.java:97)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> 16/09/01 20:59:22 INFO impl.CrawlingCycle: Executed remote command data: INJECT status: FAILED
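The stack trace above shows a TimeoutException thrown from FutureTask.get, which is what a bounded wait on a Future does when the job outlives the timeout. The sketch below reproduces that behaviour with plain java.util.concurrent. It is not the code of RemoteCommandExecutor: the class TimeoutSketch, the method runWithTimeout, and the millisecond values are assumptions chosen so the example runs quickly.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutSketch {

    /**
     * Runs a simulated job that takes taskMillis and waits at most
     * timeoutMillis for it. Returns "FINISHED" or "FAILED".
     */
    public static String runWithTimeout(long taskMillis, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = executor.submit(() -> {
                Thread.sleep(taskMillis); // stands in for a long-running remote job
                return "FINISHED";
            });
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // The job did not finish within the allowed time.
                future.cancel(true);
                return "FAILED";
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Timeout shorter than the job: the wait gives up.
        System.out.println(runWithTimeout(500, 50));
        // Timeout longer than the job: the result is returned.
        System.out.println(runWithTimeout(10, 2000));
    }
}
```

Raising the timeout only helps when the job is slow but would otherwise succeed, so it is a diagnostic step rather than a guaranteed fix.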
[jira] [Commented] (NUTCH-2342) Inlinks are not being indexed as part of index-links plugin
[ https://issues.apache.org/jira/browse/NUTCH-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812058#comment-15812058 ] Furkan KAMACI commented on NUTCH-2342: -- [~19manish90] Do you have logs for your problem? > Inlinks are not being indexed as part of index-links plugin > --- > > Key: NUTCH-2342 > URL: https://issues.apache.org/jira/browse/NUTCH-2342 > Project: Nutch > Issue Type: Bug > Components: indexer, linkdb >Affects Versions: 1.12 > Environment: We are using linux machines for DEV and UAT. >Reporter: Manish Bassi > > I have used index-links plugin along with other plugins to index both the > inlinks and outlinks for a given page. But only the outlinks are getting > indexed and not the inlinks. > Due to this issue, even the anchor plugin is not working as expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NUTCH-2345) FetchItemQueue logs are logged with wrong class name
[ https://issues.apache.org/jira/browse/NUTCH-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15811983#comment-15811983 ]

Furkan KAMACI commented on NUTCH-2345:
--------------------------------------

Thanks for reporting it [~Mgupta]! I've created the PR.

> FetchItemQueue logs are logged with wrong class name
> ----------------------------------------------------
>
> Key: NUTCH-2345
> URL: https://issues.apache.org/jira/browse/NUTCH-2345
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.11, 1.12
> Environment: Any
> Reporter: Monika Gupta
> Assignee: Furkan KAMACI
> Priority: Minor
> Fix For: 1.13
>
> I ran bin/nutch fetch and noticed that the log statements of class
> FetchItemQueue.java are logged in logs/hadoop.log under the wrong class
> name, FetchItemQueues.
> Refer to the execution log:
> 2017-01-06 15:31:25,562 INFO fetcher.FetchItemQueues - maxThreads= 1
> 2017-01-06 15:31:28,565 INFO fetcher.FetchItemQueues - inProgress= 0
> The issue is in the logger for class FetchItemQueue.java.
> Currently it is:
> private static final Logger LOG = LoggerFactory.getLogger(FetchItemQueues.class);
> Correction: it should be:
> private static final Logger LOG = LoggerFactory.getLogger(FetchItemQueue.class);
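The effect described in the issue can be shown in a few lines: a logger takes its name from the class literal passed to the factory, so a copy-pasted literal attributes messages to the wrong class. The sketch below uses java.util.logging as a stand-in for SLF4J so that it runs without extra dependencies. The nested classes only mirror the names in the report and are not the Nutch classes.

```java
import java.util.logging.Logger;

public class LoggerNameSketch {

    /** Stand-in for the neighbouring class whose name was used by mistake. */
    public static class FetchItemQueues {
    }

    /** Stand-in for the class that owns the log statements. */
    public static class FetchItemQueue {
        // Buggy: logger named after the other class.
        public static final Logger WRONG =
                Logger.getLogger(FetchItemQueues.class.getName());

        // Fixed: logger named after the enclosing class.
        public static final Logger RIGHT =
                Logger.getLogger(FetchItemQueue.class.getName());
    }

    public static void main(String[] args) {
        // The logger name is what ends up in the log line's category column.
        System.out.println(FetchItemQueue.WRONG.getName());
        System.out.println(FetchItemQueue.RIGHT.getName());
    }
}
```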
[jira] [Assigned] (NUTCH-2345) FetchItemQueue logs are logged with wrong class name
[ https://issues.apache.org/jira/browse/NUTCH-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI reassigned NUTCH-2345: Assignee: Furkan KAMACI > FetchItemQueue logs are logged with wrong class name > > > Key: NUTCH-2345 > URL: https://issues.apache.org/jira/browse/NUTCH-2345 > Project: Nutch > Issue Type: Bug > Components: fetcher >Affects Versions: 1.11, 1.12 > Environment: Any >Reporter: Monika Gupta >Assignee: Furkan KAMACI >Priority: Minor > Fix For: 1.13 > > > I ran bin/nutch fetch and notice that the log statements of class > FetchItemQueue.java are logged in logs/hadoop.log with wrong file name as > FetchItemQueues.java > Refer the execution log: > 2017-01-06 15:31:25,562 INFO fetcher.FetchItemQueues - maxThreads= 1 > 2017-01-06 15:31:28,565 INFO fetcher.FetchItemQueues - inProgress= 0 > Issue is in the logger for class FetchItemQueue.java. > Currently it is- > private static final Logger LOG = > LoggerFactory.getLogger(FetchItemQueues.class); > Correction: It should be- > private static final Logger LOG = > LoggerFactory.getLogger(FetchItemQueue.class); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (NUTCH-2346) Check Types at Object Equality
Furkan KAMACI created NUTCH-2346:
---------------------------------

    Summary: Check Types at Object Equality
    Key: NUTCH-2346
    URL: https://issues.apache.org/jira/browse/NUTCH-2346
    Project: Nutch
    Issue Type: Bug
    Components: generator, metadata
    Affects Versions: 2.3.1
    Reporter: Furkan KAMACI
    Assignee: Furkan KAMACI
    Priority: Minor
    Fix For: 2.4
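The issue above has no description, so the following is only a guess at the kind of problem its summary points to: an equals method that casts its argument without first checking its type. The class EqualsSketch.Entry is hypothetical and is not taken from the Nutch generator or metadata code.

```java
import java.util.Objects;

public class EqualsSketch {

    /** Hypothetical value class with a type-safe equals. */
    public static final class Entry {
        private final String key;

        public Entry(String key) {
            this.key = key;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            // Without this check, the cast below would throw
            // ClassCastException for arguments of another type.
            if (!(other instanceof Entry)) {
                return false; // also covers null
            }
            return Objects.equals(key, ((Entry) other).key);
        }

        @Override
        public int hashCode() {
            // Kept consistent with equals.
            return Objects.hashCode(key);
        }
    }

    public static void main(String[] args) {
        Entry a = new Entry("url");
        System.out.println(a.equals(new Entry("url")));
        System.out.println(a.equals("url"));
        System.out.println(a.equals(null));
    }
}
```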
[jira] [Created] (NUTCH-2344) Authentication Support for Web GUI
Furkan KAMACI created NUTCH-2344:
---------------------------------

    Summary: Authentication Support for Web GUI
    Key: NUTCH-2344
    URL: https://issues.apache.org/jira/browse/NUTCH-2344
    Project: Nutch
    Issue Type: New Feature
    Components: web gui
    Affects Versions: 2.3.1
    Reporter: Furkan KAMACI
    Assignee: Furkan KAMACI
    Fix For: 2.4

We should implement authentication support for the Web GUI.
[jira] [Commented] (NUTCH-2268) SolrIndexerJob: java.lang.RuntimeException
[ https://issues.apache.org/jira/browse/NUTCH-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15809455#comment-15809455 ]

Furkan KAMACI commented on NUTCH-2268:
--------------------------------------

Which error do you get?

> SolrIndexerJob: java.lang.RuntimeException
> ------------------------------------------
>
> Key: NUTCH-2268
> URL: https://issues.apache.org/jira/browse/NUTCH-2268
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Affects Versions: 2.3.1
> Environment: I am using
>   Hbase V: hbase-0.98.19-hadoop2
>   Solr V: 6.0.0
>   Nutch: 2.3.1
>   java: 8
> Reporter: narendra
> Labels: indexing
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> Could you please help me with this error:
> SolrIndexerJob: java.lang.RuntimeException: job failed:name=apache-nutch-2.3.1.jar
> when I run this command:
> local/bin/nutch solrindex http://localhost:8983/solr/ -all
> I also tried with Solr 4.10.3 but got the same error.
[jira] [Commented] (NUTCH-2226) SOLR mismatch in deploy mode
[ https://issues.apache.org/jira/browse/NUTCH-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15773457#comment-15773457 ]

Furkan KAMACI commented on NUTCH-2226:
--------------------------------------

[~markus17] This issue will be fixed by NUTCH-2267 and can be closed as a duplicate. You can check this conversation: https://mail-archives.apache.org/mod_mbox/nutch-user/201612.mbox/%3c738483596.8830478.1482188268...@mail.yahoo.com%3e

> SOLR mismatch in deploy mode
> ----------------------------
>
> Key: NUTCH-2226
> URL: https://issues.apache.org/jira/browse/NUTCH-2226
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Reporter: Steven W
> Labels: solr
>
> I receive this error when indexing to SolrCloud in deploy mode on Hadoop
> 2.7.0:
> Type 'org/apache/http/impl/client/DefaultHttpClient' (current frame, stack[0]) is not assignable to 'org/apache/http/impl/client/CloseableHttpClient'
> I'm assuming there's a version mismatch somewhere in the deploy JAR, but I
> don't know where to look. This is related to NUTCH-2197.
[jira] [Commented] (NUTCH-1220) Upgrade Solr deps
[ https://issues.apache.org/jira/browse/NUTCH-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15773453#comment-15773453 ]

Furkan KAMACI commented on NUTCH-1220:
--------------------------------------

[~lewismc] this issue seems to be old and can be closed.

> Upgrade Solr deps
> -----------------
>
> Key: NUTCH-1220
> URL: https://issues.apache.org/jira/browse/NUTCH-1220
> Project: Nutch
> Issue Type: Task
> Components: build, indexer
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Priority: Minor
> Attachments: NUTCH-1633-trunk.patch
>
> SLF4J needs to be part of the upgrade to Solr 3.5, but that breaks something
> else. Likely Hadoop has a different SLF4J version?
> {code}
> Exception in thread "main" java.lang.NoSuchMethodError: org.slf4j.spi.LocationAwareLogger.log(Lorg/slf4j/Marker;Ljava/lang/String;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V
>   at org.apache.commons.logging.impl.SLF4JLocationAwareLog.debug(SLF4JLocationAwareLog.java:133)
>   at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:136)
>   at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:180)
>   at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159)
>   at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216)
>   at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:409)
>   at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:395)
>   at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1418)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1319)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:109)
>   at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:544)
>   at org.apache.hadoop.mapred.FileInputFormat.addInputPath(FileInputFormat.java:339)
>   at org.apache.nutch.util.domain.DomainStatistics.run(DomainStatistics.java:108)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.nutch.util.domain.DomainStatistics.main(DomainStatistics.java:215)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> {code}
[jira] [Commented] (NUTCH-2268) SolrIndexerJob: java.lang.RuntimeException
[ https://issues.apache.org/jira/browse/NUTCH-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15773441#comment-15773441 ]

Furkan KAMACI commented on NUTCH-2268:
--------------------------------------

[~lewismc] we can close this issue.

> SolrIndexerJob: java.lang.RuntimeException
> ------------------------------------------
>
> Key: NUTCH-2268
> URL: https://issues.apache.org/jira/browse/NUTCH-2268
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Affects Versions: 2.3.1
> Environment: I am using
>   Hbase V: hbase-0.98.19-hadoop2
>   Solr V: 6.0.0
>   Nutch: 2.3.1
>   java: 8
> Reporter: narendra
> Labels: indexing
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> Could you please help me with this error:
> SolrIndexerJob: java.lang.RuntimeException: job failed:name=apache-nutch-2.3.1.jar
> when I run this command:
> local/bin/nutch solrindex http://localhost:8983/solr/ -all
> I also tried with Solr 4.10.3 but got the same error.
[jira] [Commented] (NUTCH-2089) Move Nutch 2.x to compile on JDK 8
[ https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15461848#comment-15461848 ]

Furkan KAMACI commented on NUTCH-2089:
--------------------------------------

[~lewismc] I've created a PR which includes fixes for all errors and some warnings, plus javadoc improvements. Javadoc can now be generated for Nutch 2.x without any errors.

> Move Nutch 2.x to compile on JDK 8
> ----------------------------------
>
> Key: NUTCH-2089
> URL: https://issues.apache.org/jira/browse/NUTCH-2089
> Project: Nutch
> Issue Type: Bug
> Components: build
> Reporter: Lewis John McGibbney
> Assignee: Furkan KAMACI
> Fix For: 2.4
> Attachments: java8output.txt, java8output.txt
>
> Public support updates for JDK 1.7 stopped in April of this year.
> https://www.java.com/en/download/faq/java_7.xml
> In our next release we should shift support to JDK 1.8.
[jira] [Created] (NUTCH-2314) Use indexer-elastic2 Plugin for javadoc and eclipse Targets
Furkan KAMACI created NUTCH-2314:
---------------------------------

    Summary: Use indexer-elastic2 Plugin for javadoc and eclipse Targets
    Key: NUTCH-2314
    URL: https://issues.apache.org/jira/browse/NUTCH-2314
    Project: Nutch
    Issue Type: Bug
    Components: plugin
    Reporter: Furkan KAMACI
    Assignee: Furkan KAMACI
    Fix For: 2.4

The indexer-elastic2 plugin is used in the deploy and clean tasks of plugin/build.xml. However, the javadoc and eclipse tasks in build.xml use the indexer-elastic plugin instead of indexer-elastic2, which causes an error.
[jira] [Commented] (NUTCH-2308) Implement SSL Connection Test at TestNutchAPI
[ https://issues.apache.org/jira/browse/NUTCH-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449410#comment-15449410 ] Furkan KAMACI commented on NUTCH-2308: -- [~lewismc] I've updated the PR. > Implement SSL Connection Test at TestNutchAPI > - > > Key: NUTCH-2308 > URL: https://issues.apache.org/jira/browse/NUTCH-2308 > Project: Nutch > Issue Type: Improvement > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > Currently, testing of SSL is ignored at TestNutchAPI. We should complete the > implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NUTCH-2264) Check Forbidden APIs at Build
[ https://issues.apache.org/jira/browse/NUTCH-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446237#comment-15446237 ]

Furkan KAMACI commented on NUTCH-2264:
--------------------------------------

[~lewismc] I've created a precommit task which depends on the runtime and test tasks and is responsible for checking forbidden APIs. I can bind it to runtime or test (or both) if you want. I've also fixed all the errors reported by forbiddenapis.

> Check Forbidden APIs at Build
> -----------------------------
>
> Key: NUTCH-2264
> URL: https://issues.apache.org/jira/browse/NUTCH-2264
> Project: Nutch
> Issue Type: Task
> Affects Versions: 2.3.1
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
>
> We should avoid [forbidden calls|https://github.com/policeman-tools/forbidden-apis/wiki]
> and check for them in the ant build.
[jira] [Updated] (NUTCH-2264) Check Forbidden APIs at Build
[ https://issues.apache.org/jira/browse/NUTCH-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated NUTCH-2264: - Priority: Major (was: Minor) > Check Forbidden APIs at Build > - > > Key: NUTCH-2264 > URL: https://issues.apache.org/jira/browse/NUTCH-2264 > Project: Nutch > Issue Type: Task >Affects Versions: 2.3.1 >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > > We should avoid [forbidden > calls|https://github.com/policeman-tools/forbidden-apis/wiki] and check in > the ant build for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (NUTCH-2264) Check Forbidden APIs at Build
[ https://issues.apache.org/jira/browse/NUTCH-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated NUTCH-2264: - Summary: Check Forbidden APIs at Build (was: Check Forbidden API's at Build) > Check Forbidden APIs at Build > - > > Key: NUTCH-2264 > URL: https://issues.apache.org/jira/browse/NUTCH-2264 > Project: Nutch > Issue Type: Task >Affects Versions: 2.3.1 >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI >Priority: Minor > > We should avoid [forbidden > calls|https://github.com/policeman-tools/forbidden-apis/wiki] and check in > the ant build for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (NUTCH-2307) Implement Missing NutchServer REST API Tests
[ https://issues.apache.org/jira/browse/NUTCH-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on NUTCH-2307 started by Furkan KAMACI.

> Implement Missing NutchServer REST API Tests
> --------------------------------------------
>
> Key: NUTCH-2307
> URL: https://issues.apache.org/jira/browse/NUTCH-2307
> Project: Nutch
> Issue Type: Improvement
> Components: REST_api, web gui
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
> Fix For: 2.4
>
> TestAPI.java was entirely commented out. The reason given was:
> {quote}
> CURRENTLY DISABLED. TESTS ARE FLAPPING FOR NO APPARENT REASON.
> SHALL BE FIXED OR REPLACES BY NEW API IMPLEMENTATION
> {quote}
> So, we should implement those missing tests based on the new
> AbstractNutchAPITestBase.
[jira] [Work started] (NUTCH-2264) Check Forbidden API's at Build
[ https://issues.apache.org/jira/browse/NUTCH-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2264 started by Furkan KAMACI. > Check Forbidden API's at Build > -- > > Key: NUTCH-2264 > URL: https://issues.apache.org/jira/browse/NUTCH-2264 > Project: Nutch > Issue Type: Task >Affects Versions: 2.3.1 >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI >Priority: Minor > > We should avoid [forbidden > calls|https://github.com/policeman-tools/forbidden-apis/wiki] and check in > the ant build for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (NUTCH-2122) Implement Javadoc package-info.java for webui packages
[ https://issues.apache.org/jira/browse/NUTCH-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI reassigned NUTCH-2122: Assignee: Furkan KAMACI > Implement Javadoc package-info.java for webui packages > -- > > Key: NUTCH-2122 > URL: https://issues.apache.org/jira/browse/NUTCH-2122 > Project: Nutch > Issue Type: Improvement > Components: nutch server >Affects Versions: 1.10 >Reporter: Lewis John McGibbney >Assignee: Furkan KAMACI >Priority: Trivial > Fix For: 1.13 > > > [~sujenshah] I noticed that the Javadoc does not contain package.html > displaying package level introductory Javadoc as every other package does. > http://nutch.apache.org/apidocs/apidocs-1.10/index.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NUTCH-2308) Implement SSL Connection Test at TestNutchAPI
[ https://issues.apache.org/jira/browse/NUTCH-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440208#comment-15440208 ] Furkan KAMACI commented on NUTCH-2308: -- [~lewismc] Could you check it again? > Implement SSL Connection Test at TestNutchAPI > - > > Key: NUTCH-2308 > URL: https://issues.apache.org/jira/browse/NUTCH-2308 > Project: Nutch > Issue Type: Improvement > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > Currently, testing of SSL is ignored at TestNutchAPI. We should complete the > implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NUTCH-2308) Implement SSL Connection Test at TestNutchAPI
[ https://issues.apache.org/jira/browse/NUTCH-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439409#comment-15439409 ] Furkan KAMACI commented on NUTCH-2308: -- [~lewismc] could you check my PR? > Implement SSL Connection Test at TestNutchAPI > - > > Key: NUTCH-2308 > URL: https://issues.apache.org/jira/browse/NUTCH-2308 > Project: Nutch > Issue Type: Improvement > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > Currently, testing of SSL is ignored at TestNutchAPI. We should complete the > implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (NUTCH-2308) Implement SSL Connection Test at TestNutchAPI
Furkan KAMACI created NUTCH-2308: Summary: Implement SSL Connection Test at TestNutchAPI Key: NUTCH-2308 URL: https://issues.apache.org/jira/browse/NUTCH-2308 Project: Nutch Issue Type: Improvement Components: REST_api, web gui Reporter: Furkan KAMACI Assignee: Furkan KAMACI Fix For: 2.4 Currently, testing of SSL is ignored at TestNutchAPI. We should complete the implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (NUTCH-2307) Implement Missing NutchServer REST API Tests
Furkan KAMACI created NUTCH-2307:
---------------------------------

    Summary: Implement Missing NutchServer REST API Tests
    Key: NUTCH-2307
    URL: https://issues.apache.org/jira/browse/NUTCH-2307
    Project: Nutch
    Issue Type: Improvement
    Components: REST_api, web gui
    Reporter: Furkan KAMACI
    Assignee: Furkan KAMACI
    Fix For: 2.4

TestAPI.java was entirely commented out. The reason given was:
{quote}
CURRENTLY DISABLED. TESTS ARE FLAPPING FOR NO APPARENT REASON.
SHALL BE FIXED OR REPLACES BY NEW API IMPLEMENTATION
{quote}
So, we should implement those missing tests based on the new AbstractNutchAPITestBase.
[jira] [Resolved] (NUTCH-2301) Create Tests for Security Layer of NutchServer
[ https://issues.apache.org/jira/browse/NUTCH-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI resolved NUTCH-2301. -- Resolution: Fixed > Create Tests for Security Layer of NutchServer > -- > > Key: NUTCH-2301 > URL: https://issues.apache.org/jira/browse/NUTCH-2301 > Project: Nutch > Issue Type: Sub-task > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > Create tests for security layer of NutchServer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NUTCH-2306) Id of Active Configuration Could Be Stored at NutchStatus and Exposed via REST API
[ https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432543#comment-15432543 ]

Furkan KAMACI commented on NUTCH-2306:
--------------------------------------

[~lewismc] I've created the PR. Could you apply this after https://github.com/apache/nutch/pull/144?

> Id of Active Configuration Could Be Stored at NutchStatus and Exposed via REST API
> ----------------------------------------------------------------------------------
>
> Key: NUTCH-2306
> URL: https://issues.apache.org/jira/browse/NUTCH-2306
> Project: Nutch
> Issue Type: Improvement
> Components: REST_api, web gui
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
> Fix For: 2.4
>
> NutchStatus holds information about the configuration it uses. However, it
> should also store the id of that configuration. Once NUTCH-2302 and
> NUTCH-2303 are merged, we will be able to store the active configuration id
> and expose this information via the REST API.
[jira] [Commented] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use
[ https://issues.apache.org/jira/browse/NUTCH-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432541#comment-15432541 ]

Furkan KAMACI commented on NUTCH-2303:
--------------------------------------

[~lewismc] I've created the PR.

> NutchServer Could Be Able To Select a Configuration to Use
> ----------------------------------------------------------
>
> Key: NUTCH-2303
> URL: https://issues.apache.org/jira/browse/NUTCH-2303
> Project: Nutch
> Issue Type: Improvement
> Components: REST_api, web gui
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
> Fix For: 2.4
>
> RAMConfManager is intended to hold different configurations. However,
> NutchServer currently uses the default config; it should be possible to set
> an active configuration id when starting up a NutchServer.
[jira] [Work started] (NUTCH-2301) Create Tests for Security Layer of NutchServer
[ https://issues.apache.org/jira/browse/NUTCH-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2301 started by Furkan KAMACI. > Create Tests for Security Layer of NutchServer > -- > > Key: NUTCH-2301 > URL: https://issues.apache.org/jira/browse/NUTCH-2301 > Project: Nutch > Issue Type: Sub-task > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > Create tests for security layer of NutchServer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (NUTCH-2302) RAMConfManager Could Be Constructed With Custom Configuration
[ https://issues.apache.org/jira/browse/NUTCH-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on NUTCH-2302 started by Furkan KAMACI.

> RAMConfManager Could Be Constructed With Custom Configuration
> -------------------------------------------------------------
>
> Key: NUTCH-2302
> URL: https://issues.apache.org/jira/browse/NUTCH-2302
> Project: Nutch
> Issue Type: Improvement
> Components: REST_api, web gui
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
> Fix For: 2.4
>
> RAMConfManager is intended to hold different configurations which can be
> accessed via a configuration id. However, it forces you to use a default
> configuration with a default id when you construct it. Classes that use
> RAMConfManager cannot set a custom configuration, and this causes problems.
> For example, test resources cannot be used when you test NutchServer,
> because it uses the default configuration forced by RAMConfManager.
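The design point in the issue above is that a configuration manager keyed by id should accept an initial configuration from its caller instead of always creating a default one. The sketch below illustrates that idea with a plain map of properties. It is not the RAMConfManager API: the class ConfManagerSketch, its constructors, and the method names are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConfManagerSketch {

    public static final String DEFAULT_ID = "default";

    // Configurations held in memory, addressed by configuration id.
    private final Map<String, Map<String, String>> configs = new ConcurrentHashMap<>();

    /** Keeps the old behaviour: an empty default configuration. */
    public ConfManagerSketch() {
        this(DEFAULT_ID, Collections.<String, String>emptyMap());
    }

    /** Lets the caller (for example a test) supply its own configuration. */
    public ConfManagerSketch(String id, Map<String, String> initial) {
        configs.put(id, new HashMap<>(initial));
    }

    /** Returns the configuration for the id, or null if it is unknown. */
    public Map<String, String> get(String id) {
        return configs.get(id);
    }

    public static void main(String[] args) {
        Map<String, String> testConf = new HashMap<>();
        testConf.put("storage.data.store.class", "in-memory-store-for-tests");

        ConfManagerSketch manager = new ConfManagerSketch("test", testConf);
        System.out.println(manager.get("test"));
        System.out.println(manager.get(DEFAULT_ID));
    }
}
```

With a constructor like this, a server under test could be started against the "test" id rather than the default one, which is the use case the issue describes.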
[jira] [Updated] (NUTCH-2306) Id of Active Configuration Could Be Stored at NutchStatus and Exposed via REST API
[ https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Furkan KAMACI updated NUTCH-2306:
---------------------------------
    Description: NutchStatus holds information about configuration it uses. However, it should also store the id of that configuration. Once NUTCH-2302 and NUTCH-2303 are merged, we will be able to store active configuration id and expose this information via REST API. (was: NutchStatus holds information about configuration it uses. However, it should also expose the id of that configuration. Once NUTCH-2302 and NUTCH-2303 are merged, we will be able to store used configuration id and expose this information via REST API.)

> Id of Active Configuration Could Be Stored at NutchStatus and Exposed via REST API
> ----------------------------------------------------------------------------------
>
> Key: NUTCH-2306
> URL: https://issues.apache.org/jira/browse/NUTCH-2306
> Project: Nutch
> Issue Type: Improvement
> Components: REST_api, web gui
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
> Fix For: 2.4
>
> NutchStatus holds information about configuration it uses. However, it should
> also store the id of that configuration. Once NUTCH-2302 and NUTCH-2303 are
> merged, we will be able to store active configuration id and expose this
> information via REST API.
[jira] [Updated] (NUTCH-2306) Id of Active Configuration Could Be Stored at NutchStatus and Exposed via REST API
[ https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated NUTCH-2306: - Summary: Id of Active Configuration Could Be Stored at NutchStatus and Exposed via REST API (was: Id of Active Configuration at NutchStatus Could Be Stored and Exposed) > Id of Active Configuration Could Be Stored at NutchStatus and Exposed via > REST API > -- > > Key: NUTCH-2306 > URL: https://issues.apache.org/jira/browse/NUTCH-2306 > Project: Nutch > Issue Type: Improvement > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > NutchStatus holds information about configuration it uses. However, it should > also expose the id of that configuration. Once NUTCH-2302 and NUTCH-2303 are > merged, we will be able to store used configuration id and expose this > information via REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (NUTCH-2306) Id of Active Configuration at NutchStatus Could Be Stored and Exposed
[ https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated NUTCH-2306: - Summary: Id of Active Configuration at NutchStatus Could Be Stored and Exposed (was: Id of Active Configuration at NutcStatus Could Be Stored and Exposed) > Id of Active Configuration at NutchStatus Could Be Stored and Exposed > - > > Key: NUTCH-2306 > URL: https://issues.apache.org/jira/browse/NUTCH-2306 > Project: Nutch > Issue Type: Improvement > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > NutchStatus holds information about configuration it uses. However, it should > also expose the id of that configuration. Once NUTCH-2302 and NUTCH-2303 are > merged, we will be able to store used configuration id and expose this > information via REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NUTCH-2306) Id of Active Configuration at NutcStatus Could Be Stored and Exposed
[ https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15430254#comment-15430254 ] Furkan KAMACI commented on NUTCH-2306: -- [~lewismc] I've finished the mentioned implementation. Once NUTCH-2302 and NUTCH-2303 are merged I can create the PR. > Id of Active Configuration at NutcStatus Could Be Stored and Exposed > > > Key: NUTCH-2306 > URL: https://issues.apache.org/jira/browse/NUTCH-2306 > Project: Nutch > Issue Type: Improvement > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > NutchStatus holds information about configuration it uses. However, it should > also expose the id of that configuration. Once NUTCH-2302 and NUTCH-2303 are > merged, we will be able to store used configuration id and expose this > information via REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (NUTCH-2306) Id of Active Configuration at NutcStatus Could Be Stored and Exposed
[ https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated NUTCH-2306: - Description: NutchStatus holds information about configuration it uses. However, it should also expose the id of that configuration. Once NUTCH-2302 and NUTCH-2303 are merged, we will be able to store used configuration id and expose this information via REST API. (was: NutchStatus holds information about configuration it uses. However, it should also expose the id of that configuration. Once NUTCH-2302 is merged, we will be able to store used configuration id and expose this information via REST API.) > Id of Active Configuration at NutcStatus Could Be Stored and Exposed > > > Key: NUTCH-2306 > URL: https://issues.apache.org/jira/browse/NUTCH-2306 > Project: Nutch > Issue Type: Improvement > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > NutchStatus holds information about configuration it uses. However, it should > also expose the id of that configuration. Once NUTCH-2302 and NUTCH-2303 are > merged, we will be able to store used configuration id and expose this > information via REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (NUTCH-2306) Id of Active Configuration at NutcStatus Could Be Stored and Exposed
[ https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated NUTCH-2306: - Summary: Id of Active Configuration at NutcStatus Could Be Stored and Exposed (was: Id of Used Configuration at NutcStatus Could Be Stored and Exposed) > Id of Active Configuration at NutcStatus Could Be Stored and Exposed > > > Key: NUTCH-2306 > URL: https://issues.apache.org/jira/browse/NUTCH-2306 > Project: Nutch > Issue Type: Improvement > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > NutchStatus holds information about the configuration it uses. However, it > should also expose the id of that configuration. Once NUTCH-2302 is merged, > we will be able to store the id of the configuration in use and expose it > via the REST API.
[jira] [Created] (NUTCH-2306) Id of Used Configuration at NutcStatus Could Be Stored and Exposed
Furkan KAMACI created NUTCH-2306: Summary: Id of Used Configuration at NutcStatus Could Be Stored and Exposed Key: NUTCH-2306 URL: https://issues.apache.org/jira/browse/NUTCH-2306 Project: Nutch Issue Type: Improvement Components: REST_api, web gui Reporter: Furkan KAMACI Assignee: Furkan KAMACI Fix For: 2.4 NutchStatus holds information about the configuration it uses. However, it should also expose the id of that configuration. Once NUTCH-2302 is merged, we will be able to store the id of the configuration in use and expose it via the REST API.
[jira] [Work started] (NUTCH-2306) Id of Used Configuration at NutcStatus Could Be Stored and Exposed
[ https://issues.apache.org/jira/browse/NUTCH-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2306 started by Furkan KAMACI. > Id of Used Configuration at NutcStatus Could Be Stored and Exposed > -- > > Key: NUTCH-2306 > URL: https://issues.apache.org/jira/browse/NUTCH-2306 > Project: Nutch > Issue Type: Improvement > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > NutchStatus holds information about the configuration it uses. However, it > should also expose the id of that configuration. Once NUTCH-2302 is merged, > we will be able to store the id of the configuration in use and expose it > via the REST API.
[jira] [Issue Comment Deleted] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use
[ https://issues.apache.org/jira/browse/NUTCH-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated NUTCH-2303: - Comment: was deleted (was: [~lewismc] I improved the Javadoc for all methods of the committed class and combined those commits into one. Could you check the PR again?) > NutchServer Could Be Able To Select a Configuration to Use > -- > > Key: NUTCH-2303 > URL: https://issues.apache.org/jira/browse/NUTCH-2303 > Project: Nutch > Issue Type: Improvement > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > RAMConfManager is intended to hold different configurations. However, > NutchServer currently uses the default configuration; it should be possible > to set an active configuration id when starting up a NutchServer.
[jira] [Commented] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use
[ https://issues.apache.org/jira/browse/NUTCH-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429522#comment-15429522 ] Furkan KAMACI commented on NUTCH-2303: -- [~lewismc] I improved the Javadoc for all methods of the committed class and combined those commits into one. Could you check the PR again? > NutchServer Could Be Able To Select a Configuration to Use > -- > > Key: NUTCH-2303 > URL: https://issues.apache.org/jira/browse/NUTCH-2303 > Project: Nutch > Issue Type: Improvement > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > RAMConfManager is intended to hold different configurations. However, > NutchServer currently uses the default configuration; it should be possible > to set an active configuration id when starting up a NutchServer.
[jira] [Commented] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use
[ https://issues.apache.org/jira/browse/NUTCH-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429402#comment-15429402 ] Furkan KAMACI commented on NUTCH-2303: -- [~lewismc] I've finished the implementation, but it requires NUTCH-2302 to be applied first. > NutchServer Could Be Able To Select a Configuration to Use > -- > > Key: NUTCH-2303 > URL: https://issues.apache.org/jira/browse/NUTCH-2303 > Project: Nutch > Issue Type: Improvement > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > RAMConfManager is intended to hold different configurations. However, > NutchServer currently uses the default configuration; it should be possible > to set an active configuration id when starting up a NutchServer.
[jira] [Created] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use
Furkan KAMACI created NUTCH-2303: Summary: NutchServer Could Be Able To Select a Configuration to Use Key: NUTCH-2303 URL: https://issues.apache.org/jira/browse/NUTCH-2303 Project: Nutch Issue Type: Improvement Components: REST_api, web gui Reporter: Furkan KAMACI Assignee: Furkan KAMACI Fix For: 2.4 RAMConfManager is intended to hold different configurations. However, NutchServer currently uses the default configuration; it should be possible to set an active configuration id when starting up a NutchServer.
[jira] [Work started] (NUTCH-2303) NutchServer Could Be Able To Select a Configuration to Use
[ https://issues.apache.org/jira/browse/NUTCH-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2303 started by Furkan KAMACI. > NutchServer Could Be Able To Select a Configuration to Use > -- > > Key: NUTCH-2303 > URL: https://issues.apache.org/jira/browse/NUTCH-2303 > Project: Nutch > Issue Type: Improvement > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > RAMConfManager is intended to hold different configurations. However, > NutchServer currently uses the default configuration; it should be possible > to set an active configuration id when starting up a NutchServer.
[jira] [Updated] (NUTCH-2302) RAMConfManager Could Be Constructed With Custom Configuration
[ https://issues.apache.org/jira/browse/NUTCH-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated NUTCH-2302: - Summary: RAMConfManager Could Be Constructed With Custom Configuration (was: RAMConfManager Should Be Constructed With Custom Configuration ) > RAMConfManager Could Be Constructed With Custom Configuration > -- > > Key: NUTCH-2302 > URL: https://issues.apache.org/jira/browse/NUTCH-2302 > Project: Nutch > Issue Type: Improvement > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > RAMConfManager is intended to hold different configurations, each accessible > via a configuration id. However, it forces you to use a default > configuration with a default id when you construct it. Classes that use > RAMConfManager cannot set a custom configuration, which leads to problems: > for example, test resources cannot be used when testing NutchServer, because > it uses the default configuration forced by RAMConfManager.
[jira] [Created] (NUTCH-2302) RAMConfManager Should Be Constructed With Custom Configuration
Furkan KAMACI created NUTCH-2302: Summary: RAMConfManager Should Be Constructed With Custom Configuration Key: NUTCH-2302 URL: https://issues.apache.org/jira/browse/NUTCH-2302 Project: Nutch Issue Type: Improvement Components: REST_api, web gui Reporter: Furkan KAMACI Assignee: Furkan KAMACI Fix For: 2.4 RAMConfManager is intended to hold different configurations, each accessible via a configuration id. However, it forces you to use a default configuration with a default id when you construct it. Classes that use RAMConfManager cannot set a custom configuration, which leads to problems: for example, test resources cannot be used when testing NutchServer, because it uses the default configuration forced by RAMConfManager.
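The constraint described in NUTCH-2302 can be illustrated with a minimal sketch. Note that ConfManagerSketch is a hypothetical stand-in and not the real RAMConfManager API: the class name, the DEFAULT_ID constant, and both constructors are assumptions made for illustration, and plain maps stand in for the actual configuration objects. It only shows the idea of adding a constructor that accepts a caller-supplied id and configuration next to the default one.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for a configuration manager; NOT the real
 * RAMConfManager API. Plain maps replace the actual configuration objects.
 */
class ConfManagerSketch {
    // Assumed name for the default configuration id.
    static final String DEFAULT_ID = "default";

    // Maps a configuration id to its configuration properties.
    private final Map<String, Map<String, String>> configs = new HashMap<>();

    /** Behaviour the issue complains about: always a default config under a default id. */
    ConfManagerSketch() {
        this(DEFAULT_ID, new HashMap<>());
    }

    /** Behaviour the issue asks for: the caller supplies the id and the configuration. */
    ConfManagerSketch(String confId, Map<String, String> conf) {
        // Defensive copy so later changes to the caller's map do not leak in.
        configs.put(confId, new HashMap<>(conf));
    }

    /** Returns the configuration stored under confId, or null if there is none. */
    Map<String, String> get(String confId) {
        return configs.get(confId);
    }
}
```

With a constructor of this shape, a test could build the manager from its own test resources, for example new ConfManagerSketch("test", testConf), instead of being tied to the default configuration.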
[jira] [Created] (NUTCH-2301) Create Tests for Security Layer of NutchServer
Furkan KAMACI created NUTCH-2301: Summary: Create Tests for Security Layer of NutchServer Key: NUTCH-2301 URL: https://issues.apache.org/jira/browse/NUTCH-2301 Project: Nutch Issue Type: Sub-task Components: REST_api, web gui Reporter: Furkan KAMACI Assignee: Furkan KAMACI Fix For: 2.4 Create tests for security layer of NutchServer.
[jira] [Commented] (NUTCH-1756) Security layer for NutchServer
[ https://issues.apache.org/jira/browse/NUTCH-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428900#comment-15428900 ] Furkan KAMACI commented on NUTCH-1756: -- [~lewismc] I've addressed all your comments at https://github.com/apache/nutch/pull/142. Could you check them? > Security layer for NutchServer > -- > > Key: NUTCH-1756 > URL: https://issues.apache.org/jira/browse/NUTCH-1756 > Project: Nutch > Issue Type: Improvement > Components: REST_api, web gui >Reporter: Lewis John McGibbney >Assignee: Furkan KAMACI >Priority: Critical > Labels: gsoc2016 > Fix For: 2.5 > > > It will be beneficial to have a security layer for NutchServer once we make > improvements upon it. I hope that GSoC goes ahead this year so we can tackle > such issues. > This issue should implement a standard security layer for REST API calls. It > should also add/expose this functionality through the WebApp.
[jira] [Created] (NUTCH-2294) Authorization Support for REST API
Furkan KAMACI created NUTCH-2294: Summary: Authorization Support for REST API Key: NUTCH-2294 URL: https://issues.apache.org/jira/browse/NUTCH-2294 Project: Nutch Issue Type: Sub-task Components: REST_api, web gui Reporter: Furkan KAMACI Assignee: Furkan KAMACI Fix For: 2.4 Add authorization for Nutch REST API.
[jira] [Updated] (NUTCH-2289) SSL Support for REST API
[ https://issues.apache.org/jira/browse/NUTCH-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated NUTCH-2289: - Summary: SSL Support for REST API (was: SSL Authentication Support for REST API) > SSL Support for REST API > > > Key: NUTCH-2289 > URL: https://issues.apache.org/jira/browse/NUTCH-2289 > Project: Nutch > Issue Type: Sub-task > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.5 > > > Add SSL Authentication for Nutch REST API.
[jira] [Created] (NUTCH-2289) SSL Authentication Support for REST API
Furkan KAMACI created NUTCH-2289: Summary: SSL Authentication Support for REST API Key: NUTCH-2289 URL: https://issues.apache.org/jira/browse/NUTCH-2289 Project: Nutch Issue Type: Sub-task Components: REST_api, web gui Reporter: Furkan KAMACI Assignee: Furkan KAMACI Fix For: 2.5 Add SSL Authentication for Nutch REST API.
[jira] [Created] (NUTCH-2288) Upgrade Restlet to 2.3.7
Furkan KAMACI created NUTCH-2288: Summary: Upgrade Restlet to 2.3.7 Key: NUTCH-2288 URL: https://issues.apache.org/jira/browse/NUTCH-2288 Project: Nutch Issue Type: Improvement Components: REST_api, web gui Reporter: Furkan KAMACI Assignee: Furkan KAMACI Fix For: 2.5 Currently we use Restlet 2.2.3. We should upgrade Restlet to 2.3.7. The changes are listed here: https://restlet.com/technical-resources/restlet-framework/misc/2.3/changes
[jira] [Commented] (NUTCH-2022) Investigate better documentation for the Nutch REST API's
[ https://issues.apache.org/jira/browse/NUTCH-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350145#comment-15350145 ] Furkan KAMACI commented on NUTCH-2022: -- [~lewismc] we have NUTCH-1800 and NUTCH-2243. How does NUTCH-2022 differ from them? If there is no difference, it seems it can be closed as a duplicate. > Investigate better documentation for the Nutch REST API's > - > > Key: NUTCH-2022 > URL: https://issues.apache.org/jira/browse/NUTCH-2022 > Project: Nutch > Issue Type: Wish > Components: REST_api >Affects Versions: 2.3, 1.10 >Reporter: Lewis John McGibbney > > Over on Apache Tika we use [Miredot|http://www.miredot.com/] for better > representation of the Tika REST API. > Based on recent development on both 1.X and 2.x REST API's, it would be nice > to have a better interface for people to see. > An example of Miredot REST API docs can be seen on [Tika REST API > docs|http://tika.apache.org/1.8/miredot/index.html]
[jira] [Assigned] (NUTCH-2022) Investigate better documentation for the Nutch REST API's
[ https://issues.apache.org/jira/browse/NUTCH-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI reassigned NUTCH-2022: Assignee: Furkan KAMACI > Investigate better documentation for the Nutch REST API's > - > > Key: NUTCH-2022 > URL: https://issues.apache.org/jira/browse/NUTCH-2022 > Project: Nutch > Issue Type: Wish > Components: REST_api >Affects Versions: 2.3, 1.10 >Reporter: Lewis John McGibbney >Assignee: Furkan KAMACI > > Over on Apache Tika we use [Miredot|http://www.miredot.com/] for better > representation of the Tika REST API. > Based on recent development on both 1.X and 2.x REST API's, it would be nice > to have a better interface for people to see. > An example of Miredot REST API docs can be seen on [Tika REST API > docs|http://tika.apache.org/1.8/miredot/index.html]
[jira] [Commented] (NUTCH-2199) Documentation for Nutch 2.X REST API
[ https://issues.apache.org/jira/browse/NUTCH-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342644#comment-15342644 ] Furkan KAMACI commented on NUTCH-2199: -- This issue can be closed because it is a duplicate of NUTCH-2243. > Documentation for Nutch 2.X REST API > > > Key: NUTCH-2199 > URL: https://issues.apache.org/jira/browse/NUTCH-2199 > Project: Nutch > Issue Type: New Feature > Components: documentation, REST_api >Affects Versions: 2.3.1 >Reporter: Lewis John McGibbney >Assignee: Furkan KAMACI >Priority: Minor > Fix For: 2.5 > > > The work done on NUTCH-1800 needs to be ported to 2.X branch. This is > trivial, I thought I had already done it but obviously not.
[jira] [Created] (NUTCH-2285) Digest Authentication Support for REST API
Furkan KAMACI created NUTCH-2285: Summary: Digest Authentication Support for REST API Key: NUTCH-2285 URL: https://issues.apache.org/jira/browse/NUTCH-2285 Project: Nutch Issue Type: Sub-task Components: REST_api, web gui Reporter: Furkan KAMACI Assignee: Furkan KAMACI Fix For: 2.5
[jira] [Created] (NUTCH-2284) Basic Authentication Support for REST API
Furkan KAMACI created NUTCH-2284: Summary: Basic Authentication Support for REST API Key: NUTCH-2284 URL: https://issues.apache.org/jira/browse/NUTCH-2284 Project: Nutch Issue Type: New Feature Components: REST_api, web gui Reporter: Furkan KAMACI Assignee: Furkan KAMACI Fix For: 2.5 Add Basic Authentication for Nutch REST API.
[jira] [Updated] (NUTCH-2284) Basic Authentication Support for REST API
[ https://issues.apache.org/jira/browse/NUTCH-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated NUTCH-2284: - Issue Type: Sub-task (was: New Feature) Parent: NUTCH-1756 > Basic Authentication Support for REST API > - > > Key: NUTCH-2284 > URL: https://issues.apache.org/jira/browse/NUTCH-2284 > Project: Nutch > Issue Type: Sub-task > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.5 > > > Add Basic Authentication for Nutch REST API.
[jira] [Commented] (NUTCH-1800) Documentation for Nutch 1.X REST API
[ https://issues.apache.org/jira/browse/NUTCH-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338608#comment-15338608 ] Furkan KAMACI commented on NUTCH-1800: -- [~lewismc] when I switch to master branch and try to generate the REST API I get error: {code} furkan@kamaci:~/projects/gsoc2016/nutch$ ant -lib ivy restdocs Buildfile: /home/furkan/projects/gsoc2016/nutch/build.xml Trying to override old definition of task javac [taskdef] Could not load definitions from resource org/sonar/ant/antlib.xml. It could not be found. restdocs: [ivy:makepom] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ :: [ivy:makepom] :: loading settings :: url = jar:file:/home/furkan/projects/gsoc2016/nutch/ivy/ivy-2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml [artifact:mvn] + Error stacktraces are turned on. [artifact:mvn] [INFO] Scanning for projects... [artifact:mvn] [INFO] [artifact:mvn] [INFO] Building Apache Nutch [artifact:mvn] [INFO]task-segment: [test] [artifact:mvn] [INFO] [artifact:mvn] [INFO] [resources:resources] [artifact:mvn] [WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent! [artifact:mvn] [INFO] skip non existing resourceDirectory /home/furkan/projects/gsoc2016/nutch/src/main/resources [artifact:mvn] Downloading: http://repo1.maven.org/maven2/org/apache/mrunit/mrunit/1.1.0/mrunit-1.1.0.jar [artifact:mvn] [INFO] [artifact:mvn] [ERROR] BUILD ERROR [artifact:mvn] [INFO] [artifact:mvn] [INFO] Failed to resolve artifact. [artifact:mvn] [artifact:mvn] Missing: [artifact:mvn] -- [artifact:mvn] 1) org.apache.mrunit:mrunit:jar:1.1.0 [artifact:mvn] [artifact:mvn] Try downloading the file manually from the project website. 
[artifact:mvn] [artifact:mvn] Then, install it using the command: [artifact:mvn] mvn install:install-file -DgroupId=org.apache.mrunit -DartifactId=mrunit -Dversion=1.1.0 -Dpackaging=jar -Dfile=/path/to/file [artifact:mvn] [artifact:mvn] Alternatively, if you host your own repository you can deploy the file there: [artifact:mvn] mvn deploy:deploy-file -DgroupId=org.apache.mrunit -DartifactId=mrunit -Dversion=1.1.0 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id] [artifact:mvn] [artifact:mvn] Path to dependency: [artifact:mvn] 1) org.apache.nutch:nutch:jar:1.12-SNAPSHOT [artifact:mvn] 2) org.apache.mrunit:mrunit:jar:1.1.0 [artifact:mvn] [artifact:mvn] -- [artifact:mvn] 1 required artifact is missing. [artifact:mvn] [artifact:mvn] for artifact: [artifact:mvn] org.apache.nutch:nutch:jar:1.12-SNAPSHOT [artifact:mvn] [artifact:mvn] from the specified remote repositories: [artifact:mvn] central (http://repo1.maven.org/maven2) [artifact:mvn] [artifact:mvn] [artifact:mvn] [artifact:mvn] [INFO] [artifact:mvn] [INFO] Trace [artifact:mvn] org.apache.maven.lifecycle.LifecycleExecutionException: Missing: [artifact:mvn] -- [artifact:mvn] 1) org.apache.mrunit:mrunit:jar:1.1.0 [artifact:mvn] [artifact:mvn] Try downloading the file manually from the project website. 
[artifact:mvn] [artifact:mvn] Then, install it using the command: [artifact:mvn] mvn install:install-file -DgroupId=org.apache.mrunit -DartifactId=mrunit -Dversion=1.1.0 -Dpackaging=jar -Dfile=/path/to/file [artifact:mvn] [artifact:mvn] Alternatively, if you host your own repository you can deploy the file there: [artifact:mvn] mvn deploy:deploy-file -DgroupId=org.apache.mrunit -DartifactId=mrunit -Dversion=1.1.0 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id] [artifact:mvn] [artifact:mvn] Path to dependency: [artifact:mvn] 1) org.apache.nutch:nutch:jar:1.12-SNAPSHOT [artifact:mvn] 2) org.apache.mrunit:mrunit:jar:1.1.0 [artifact:mvn] [artifact:mvn] -- [artifact:mvn] 1 required artifact is missing. [artifact:mvn] [artifact:mvn] for artifact: [artifact:mvn] org.apache.nutch:nutch:jar:1.12-SNAPSHOT [artifact:mvn] [artifact:mvn] from the specified remote repositories: [artifact:mvn] central (http://repo1.maven.org/maven2) [artifact:mvn] [artifact:mvn] [artifact:mvn] at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:576) [artifact:mvn] at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalWithLifecycle(DefaultLifecycleExecutor.java:500) [artifact:mvn] at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:479) [artifact:mvn] at
[jira] [Commented] (NUTCH-2267) Solr indexer fails at the end of the job with a java error message
[ https://issues.apache.org/jira/browse/NUTCH-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310866#comment-15310866 ] Furkan KAMACI commented on NUTCH-2267: -- We support Solr 5.4.1 (https://github.com/apache/nutch/blob/master/src/plugin/indexer-solr/ivy.xm), which is the reason for the error described here. [~lewismc] The issue should be closed for that reason. > Solr indexer fails at the end of the job with a java error message > -- > > Key: NUTCH-2267 > URL: https://issues.apache.org/jira/browse/NUTCH-2267 > Project: Nutch > Issue Type: Bug > Components: indexer >Affects Versions: 1.12 > Environment: hadoop v2.7.2 solr6 in cloud configuration with > zookeeper 3.4.6. I use the master branch from github currently on commit > da252eb7b3d2d7b70 ( NUTCH - 2263 mingram and maxgram support for Unigram > Cosine Similarity Model is provided. ) >Reporter: kaveh minooie > Fix For: 1.13 > > > this is what I was getting first: > 16/05/23 13:52:27 INFO mapreduce.Job: map 100% reduce 100% > 16/05/23 13:52:27 INFO mapreduce.Job: Task Id : > attempt_1462499602101_0119_r_00_0, Status : FAILED > Error: Bad return type > Exception Details: > Location: > org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;Lorg/apache/http/conn/ClientConnectionManager;)Lorg/apache/http/impl/client/CloseableHttpClient; > @58: areturn > Reason: > Type 'org/apache/http/impl/client/DefaultHttpClient' (current frame, > stack[0]) is not assignable to > 'org/apache/http/impl/client/CloseableHttpClient' (from method signature) > Current Frame: > bci: @58 > flags: { } > locals: { 'org/apache/solr/common/params/SolrParams', > 'org/apache/http/conn/ClientConnectionManager', > 'org/apache/solr/common/params/ModifiableSolrParams', > 'org/apache/http/impl/client/DefaultHttpClient' } > stack: { 'org/apache/http/impl/client/DefaultHttpClient' } > Bytecode: > 0x000: bb00 0359 2ab7 0004 4db2 0005 b900 0601 > 0x010: 0099 001e b200
05bb 0007 59b7 0008 1209 > 0x020: b600 0a2c b600 0bb6 000c b900 0d02 002b > 0x030: b800 104e 2d2c b800 0f2d b0 > Stackmap Table: > append_frame(@47,Object[#143]) > 16/05/23 13:52:28 INFO mapreduce.Job: map 100% reduce 0% > as you can see the failed reducer gets re-spawned. then I found this issue: > https://issues.apache.org/jira/browse/SOLR-7657 and I updated my hadoop > config file. after that, the indexer seems to be able to finish ( I got the > document in the solr, it seems ) but I still get the error message at the end > of the job: > 16/05/23 16:39:26 INFO mapreduce.Job: map 100% reduce 99% > 16/05/23 16:39:44 INFO mapreduce.Job: map 100% reduce 100% > 16/05/23 16:39:57 INFO mapreduce.Job: Job job_1464045047943_0001 completed > successfully > 16/05/23 16:39:58 INFO mapreduce.Job: Counters: 53 > File System Counters > FILE: Number of bytes read=42700154855 > FILE: Number of bytes written=70210771807 > FILE: Number of read operations=0 > FILE: Number of large read operations=0 > FILE: Number of write operations=0 > HDFS: Number of bytes read=8699202825 > HDFS: Number of bytes written=0 > HDFS: Number of read operations=537 > HDFS: Number of large read operations=0 > HDFS: Number of write operations=0 > Job Counters > Launched map tasks=134 > Launched reduce tasks=1 > Data-local map tasks=107 > Rack-local map tasks=27 > Total time spent by all maps in occupied slots (ms)=49377664 > Total time spent by all reduces in occupied slots (ms)=32765064 > Total time spent by all map tasks (ms)=3086104 > Total time spent by all reduce tasks (ms)=1365211 > Total vcore-milliseconds taken by all map tasks=3086104 > Total vcore-milliseconds taken by all reduce tasks=1365211 > Total megabyte-milliseconds taken by all map tasks=12640681984 > Total megabyte-milliseconds taken by all reduce tasks=8387856384 > Map-Reduce Framework > Map input records=25305474 > Map output records=25305474 > Map output bytes=27422869763 > Map output materialized bytes=27489888004 > Input split 
bytes=15225 > Combine input records=0 > Combine output records=0 > Reduce input groups=16061459 > Reduce shuffle bytes=27489888004 > Reduce input records=25305474 > Reduce output records=230 > Spilled
[jira] [Commented] (NUTCH-2271) Solr indexer Failed
[ https://issues.apache.org/jira/browse/NUTCH-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310862#comment-15310862 ] Furkan KAMACI commented on NUTCH-2271: -- [~lewismc] OK, I found it in the ivy.xml of the related plugin: https://github.com/apache/nutch/blob/master/src/plugin/indexer-solr/ivy.xml. We support Solr 5.4.1, which is the reason for the error reported here. The issue should be closed for that reason. > Solr indexer Failed > > > Key: NUTCH-2271 > URL: https://issues.apache.org/jira/browse/NUTCH-2271 > Project: Nutch > Issue Type: Bug > Components: indexer >Affects Versions: 1.12 > Environment: Hadoop 2.7.2 , Solr 6.0.0 , Nutch 1.12 on Single node >Reporter: narendra >Assignee: Furkan KAMACI > > When I run this command > bin/nutch solrindex http://localhost:8983/solr/#/devel1 crawl_Test1/crawldb > -linkdb crawl_Test1/linkdb crawl_Test1/segments/* > 16/05/31 22:21:47 WARN segment.SegmentChecker: The input path at * is not a > segment... skipping > 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: starting at 2016-05-31 > 22:21:47 > 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: deleting gone documents: > false > 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL filtering: false > 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL normalizing: false > 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugins: looking in: > /tmp/hadoop-unjar8621976524622577403/classes/plugins > 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugin Auto-activation mode: > [true] > 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Plugins: > 16/05/31 22:21:47 INFO plugin.PluginRepository: Regex URL Filter > (urlfilter-regex) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Html Parse Plug-in > (parse-html) > 16/05/31 22:21:47 INFO plugin.PluginRepository: HTTP Framework > (lib-http) > 16/05/31 22:21:47 INFO plugin.PluginRepository: the nutch core > extension points (nutch-extensionpoints) > 16/05/31 22:21:47 INFO
plugin.PluginRepository: Basic Indexing Filter > (index-basic) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Anchor Indexing Filter > (index-anchor) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Tika Parser Plug-in > (parse-tika) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Basic URL Normalizer > (urlnormalizer-basic) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Regex URL Filter > Framework (lib-regex-filter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Regex URL Normalizer > (urlnormalizer-regex) > 16/05/31 22:21:47 INFO plugin.PluginRepository: CyberNeko HTML Parser > (lib-nekohtml) > 16/05/31 22:21:47 INFO plugin.PluginRepository: OPIC Scoring Plug-in > (scoring-opic) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Pass-through URL > Normalizer (urlnormalizer-pass) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Http Protocol Plug-in > (protocol-http) > 16/05/31 22:21:47 INFO plugin.PluginRepository: SolrIndexWriter > (indexer-solr) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Extension-Points: > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Content Parser > (org.apache.nutch.parse.Parser) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch URL Filter > (org.apache.nutch.net.URLFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: HTML Parse Filter > (org.apache.nutch.parse.HtmlParseFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Scoring > (org.apache.nutch.scoring.ScoringFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch URL Normalizer > (org.apache.nutch.net.URLNormalizer) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Protocol > (org.apache.nutch.protocol.Protocol) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch URL Ignore > Exemption Filter (org.apache.nutch.net.URLExemptionFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Index Writer > (org.apache.nutch.indexer.IndexWriter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Segment 
Merge > Filter (org.apache.nutch.segment.SegmentMergeFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Indexing Filter > (org.apache.nutch.indexer.IndexingFilter) > 16/05/31 22:21:47 INFO indexer.IndexWriters: Adding > org.apache.nutch.indexwriter.solr.SolrIndexWriter > 16/05/31 22:21:47 INFO indexer.IndexingJob: Active IndexWriters : > SOLRIndexWriter > solr.server.url : URL of the SOLR instance > solr.zookeeper.hosts : URL of the Zookeeper quorum > solr.commit.size : buffer size when sending to SOLR (default 1000) > solr.mapping.file : name of the mapping file for fields (default > solrindex-mapping.xml) >
[jira] [Commented] (NUTCH-2268) SolrIndexerJob: java.lang.RuntimeException
[ https://issues.apache.org/jira/browse/NUTCH-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309915#comment-15309915 ] Furkan KAMACI commented on NUTCH-2268: -- [~kadarinaren...@gmail.com] When you tried with Solr 4.10.3, did you change the versions of the other components in your environment? > SolrIndexerJob: java.lang.RuntimeException > -- > > Key: NUTCH-2268 > URL: https://issues.apache.org/jira/browse/NUTCH-2268 > Project: Nutch > Issue Type: Bug > Components: indexer >Affects Versions: 2.3.1 > Environment: I am using > Hbase V:hbase-0.98.19-hadoop2 > Solr V : 6.0.0 > Nutch : 2.3.1 > java : 8 >Reporter: narendra > Labels: indexing > Original Estimate: 12h > Remaining Estimate: 12h > > Could you please help with this error > SolrIndexerJob: java.lang.RuntimeException: job > failed:name=apache-nutch-2.3.1.jar > when I run this command > local/bin/nutch solrindex http://localhost:8983/solr/ -all > I tried with Solr 4.10.3 but I get the same error
[jira] [Comment Edited] (NUTCH-2268) SolrIndexerJob: java.lang.RuntimeException
[ https://issues.apache.org/jira/browse/NUTCH-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309915#comment-15309915 ] Furkan KAMACI edited comment on NUTCH-2268 at 6/1/16 8:19 AM: -- [~kadarinaren...@gmail.com] when you tried with Solr 4.10.3, did you change the versions of the other components in your environment? was (Author: kamaci): [~kadarinaren...@gmail.com] When you tried with Solr 4.10.3, did you change the versions of the other components in your environment? > SolrIndexerJob: java.lang.RuntimeException > -- > > Key: NUTCH-2268 > URL: https://issues.apache.org/jira/browse/NUTCH-2268 > Project: Nutch > Issue Type: Bug > Components: indexer >Affects Versions: 2.3.1 > Environment: I am using > Hbase V:hbase-0.98.19-hadoop2 > Solr V : 6.0.0 > Nutch : 2.3.1 > java : 8 >Reporter: narendra > Labels: indexing > Original Estimate: 12h > Remaining Estimate: 12h > > Could you please help with this error > SolrIndexerJob: java.lang.RuntimeException: job > failed:name=apache-nutch-2.3.1.jar > when I run this command > local/bin/nutch solrindex http://localhost:8983/solr/ -all > I tried with Solr 4.10.3 but I get the same error
[jira] [Commented] (NUTCH-2270) Solr indexer Failed i
[ https://issues.apache.org/jira/browse/NUTCH-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309907#comment-15309907 ] Furkan KAMACI commented on NUTCH-2270: -- This issue is a duplicate of NUTCH-2271 and should be closed. > Solr indexer Failed i > - > > Key: NUTCH-2270 > URL: https://issues.apache.org/jira/browse/NUTCH-2270 > Project: Nutch > Issue Type: Bug > Components: indexer >Affects Versions: 1.12 > Environment: Hadoop 2.7.2 , Solr 6.0.0 , Nutch 1.12 on Single node >Reporter: narendra > > When i run this command > bin/nutch solrindex http://localhost:8983/solr/#/gettingstarted > crawl_Test1/crawldb -linkdb crawl_Test1/linkdb crawl_Test1/segments/* > 16/05/31 22:21:47 WARN segment.SegmentChecker: The input path at * is not a > segment... skipping > 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: starting at 2016-05-31 > 22:21:47 > 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: deleting gone documents: > false > 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL filtering: false > 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL normalizing: false > 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugins: looking in: > /tmp/hadoop-unjar8621976524622577403/classes/plugins > 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugin Auto-activation mode: > [true] > 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Plugins: > 16/05/31 22:21:47 INFO plugin.PluginRepository: Regex URL Filter > (urlfilter-regex) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Html Parse Plug-in > (parse-html) > 16/05/31 22:21:47 INFO plugin.PluginRepository: HTTP Framework > (lib-http) > 16/05/31 22:21:47 INFO plugin.PluginRepository: the nutch core > extension points (nutch-extensionpoints) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Basic Indexing Filter > (index-basic) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Anchor Indexing Filter > (index-anchor) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Tika
Parser Plug-in > (parse-tika) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Basic URL Normalizer > (urlnormalizer-basic) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Regex URL Filter > Framework (lib-regex-filter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Regex URL Normalizer > (urlnormalizer-regex) > 16/05/31 22:21:47 INFO plugin.PluginRepository: CyberNeko HTML Parser > (lib-nekohtml) > 16/05/31 22:21:47 INFO plugin.PluginRepository: OPIC Scoring Plug-in > (scoring-opic) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Pass-through URL > Normalizer (urlnormalizer-pass) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Http Protocol Plug-in > (protocol-http) > 16/05/31 22:21:47 INFO plugin.PluginRepository: SolrIndexWriter > (indexer-solr) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Extension-Points: > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Content Parser > (org.apache.nutch.parse.Parser) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch URL Filter > (org.apache.nutch.net.URLFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: HTML Parse Filter > (org.apache.nutch.parse.HtmlParseFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Scoring > (org.apache.nutch.scoring.ScoringFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch URL Normalizer > (org.apache.nutch.net.URLNormalizer) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Protocol > (org.apache.nutch.protocol.Protocol) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch URL Ignore > Exemption Filter (org.apache.nutch.net.URLExemptionFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Index Writer > (org.apache.nutch.indexer.IndexWriter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Segment Merge > Filter (org.apache.nutch.segment.SegmentMergeFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Indexing Filter > (org.apache.nutch.indexer.IndexingFilter) > 16/05/31 22:21:47 INFO 
indexer.IndexWriters: Adding > org.apache.nutch.indexwriter.solr.SolrIndexWriter > 16/05/31 22:21:47 INFO indexer.IndexingJob: Active IndexWriters : > SOLRIndexWriter > solr.server.url : URL of the SOLR instance > solr.zookeeper.hosts : URL of the Zookeeper quorum > solr.commit.size : buffer size when sending to SOLR (default 1000) > solr.mapping.file : name of the mapping file for fields (default > solrindex-mapping.xml) > solr.auth : use authentication (default false) > solr.auth.username : username for authentication > solr.auth.password : password for authentication > 16/05/31 22:21:47 INFO indexer.IndexerMapReduce:
[jira] [Commented] (NUTCH-2271) Solr indexer Failed
[ https://issues.apache.org/jira/browse/NUTCH-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309899#comment-15309899 ] Furkan KAMACI commented on NUTCH-2271: -- This seems to be related to SOLR-7948. [~lewismc] do we support Solr 6 at this version of Nutch? > Solr indexer Failed > > > Key: NUTCH-2271 > URL: https://issues.apache.org/jira/browse/NUTCH-2271 > Project: Nutch > Issue Type: Bug > Components: indexer >Affects Versions: 1.12 > Environment: Hadoop 2.7.2 , Solr 6.0.0 , Nutch 1.12 on Single node >Reporter: narendra >Assignee: Furkan KAMACI > > When i run this command > bin/nutch solrindex http://localhost:8983/solr/#/devel1 crawl_Test1/crawldb > -linkdb crawl_Test1/linkdb crawl_Test1/segments/* > 16/05/31 22:21:47 WARN segment.SegmentChecker: The input path at * is not a > segment... skipping > 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: starting at 2016-05-31 > 22:21:47 > 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: deleting gone documents: > false > 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL filtering: false > 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL normalizing: false > 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugins: looking in: > /tmp/hadoop-unjar8621976524622577403/classes/plugins > 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugin Auto-activation mode: > [true] > 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Plugins: > 16/05/31 22:21:47 INFO plugin.PluginRepository: Regex URL Filter > (urlfilter-regex) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Html Parse Plug-in > (parse-html) > 16/05/31 22:21:47 INFO plugin.PluginRepository: HTTP Framework > (lib-http) > 16/05/31 22:21:47 INFO plugin.PluginRepository: the nutch core > extension points (nutch-extensionpoints) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Basic Indexing Filter > (index-basic) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Anchor Indexing Filter > (index-anchor) > 
16/05/31 22:21:47 INFO plugin.PluginRepository: Tika Parser Plug-in > (parse-tika) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Basic URL Normalizer > (urlnormalizer-basic) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Regex URL Filter > Framework (lib-regex-filter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Regex URL Normalizer > (urlnormalizer-regex) > 16/05/31 22:21:47 INFO plugin.PluginRepository: CyberNeko HTML Parser > (lib-nekohtml) > 16/05/31 22:21:47 INFO plugin.PluginRepository: OPIC Scoring Plug-in > (scoring-opic) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Pass-through URL > Normalizer (urlnormalizer-pass) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Http Protocol Plug-in > (protocol-http) > 16/05/31 22:21:47 INFO plugin.PluginRepository: SolrIndexWriter > (indexer-solr) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Extension-Points: > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Content Parser > (org.apache.nutch.parse.Parser) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch URL Filter > (org.apache.nutch.net.URLFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: HTML Parse Filter > (org.apache.nutch.parse.HtmlParseFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Scoring > (org.apache.nutch.scoring.ScoringFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch URL Normalizer > (org.apache.nutch.net.URLNormalizer) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Protocol > (org.apache.nutch.protocol.Protocol) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch URL Ignore > Exemption Filter (org.apache.nutch.net.URLExemptionFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Index Writer > (org.apache.nutch.indexer.IndexWriter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Segment Merge > Filter (org.apache.nutch.segment.SegmentMergeFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Indexing Filter > 
(org.apache.nutch.indexer.IndexingFilter) > 16/05/31 22:21:47 INFO indexer.IndexWriters: Adding > org.apache.nutch.indexwriter.solr.SolrIndexWriter > 16/05/31 22:21:47 INFO indexer.IndexingJob: Active IndexWriters : > SOLRIndexWriter > solr.server.url : URL of the SOLR instance > solr.zookeeper.hosts : URL of the Zookeeper quorum > solr.commit.size : buffer size when sending to SOLR (default 1000) > solr.mapping.file : name of the mapping file for fields (default > solrindex-mapping.xml) > solr.auth : use authentication (default false) > solr.auth.username : username for authentication > solr.auth.password : password for
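A side observation on the command quoted above, not a confirmed diagnosis of this failure: the Solr URL passed to solrindex contains `#/devel1`. In a URL, everything after `#` is a fragment, which an HTTP client normally does not send to the server, so the path actually requested here is just `/solr/`. The sketch below uses only `java.net.URI` to show how the reported URL splits. The class name `SolrUrlCheck` and the core-style URL `http://localhost:8983/solr/devel1` are assumptions made for illustration; neither comes from the report.

```java
import java.net.URI;

/** Hypothetical helper (not part of Nutch) for inspecting a Solr URL. */
public class SolrUrlCheck {

    /** Returns true when the URL carries a fragment, i.e. a part after '#'. */
    public static boolean hasFragment(String url) {
        return URI.create(url).getFragment() != null;
    }

    /** Returns the path component, which excludes any fragment. */
    public static String requestPath(String url) {
        return URI.create(url).getPath();
    }

    public static void main(String[] args) {
        // URL exactly as quoted in the issue description.
        String reported = "http://localhost:8983/solr/#/devel1";
        // Assumed core-style form of the same address (illustration only).
        String coreStyle = "http://localhost:8983/solr/devel1";

        System.out.println(reported + " fragment? " + hasFragment(reported)
            + ", path: " + requestPath(reported));
        System.out.println(coreStyle + " fragment? " + hasFragment(coreStyle)
            + ", path: " + requestPath(coreStyle));
    }
}
```

Whether the fragment plays any role in the reported error is not established in this thread; the sketch only shows what the URL contains.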
[jira] [Assigned] (NUTCH-2271) Solr indexer Failed
[ https://issues.apache.org/jira/browse/NUTCH-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI reassigned NUTCH-2271: Assignee: Furkan KAMACI > Solr indexer Failed > > > Key: NUTCH-2271 > URL: https://issues.apache.org/jira/browse/NUTCH-2271 > Project: Nutch > Issue Type: Bug > Components: indexer >Affects Versions: 1.12 > Environment: Hadoop 2.7.2 , Solr 6.0.0 , Nutch 1.12 on Single node >Reporter: narendra >Assignee: Furkan KAMACI > > When i run this command > bin/nutch solrindex http://localhost:8983/solr/#/devel1 crawl_Test1/crawldb > -linkdb crawl_Test1/linkdb crawl_Test1/segments/* > 16/05/31 22:21:47 WARN segment.SegmentChecker: The input path at * is not a > segment... skipping > 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: starting at 2016-05-31 > 22:21:47 > 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: deleting gone documents: > false > 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL filtering: false > 16/05/31 22:21:47 INFO indexer.IndexingJob: Indexer: URL normalizing: false > 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugins: looking in: > /tmp/hadoop-unjar8621976524622577403/classes/plugins > 16/05/31 22:21:47 INFO plugin.PluginRepository: Plugin Auto-activation mode: > [true] > 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Plugins: > 16/05/31 22:21:47 INFO plugin.PluginRepository: Regex URL Filter > (urlfilter-regex) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Html Parse Plug-in > (parse-html) > 16/05/31 22:21:47 INFO plugin.PluginRepository: HTTP Framework > (lib-http) > 16/05/31 22:21:47 INFO plugin.PluginRepository: the nutch core > extension points (nutch-extensionpoints) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Basic Indexing Filter > (index-basic) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Anchor Indexing Filter > (index-anchor) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Tika Parser Plug-in > (parse-tika) > 16/05/31 22:21:47 INFO 
plugin.PluginRepository: Basic URL Normalizer > (urlnormalizer-basic) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Regex URL Filter > Framework (lib-regex-filter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Regex URL Normalizer > (urlnormalizer-regex) > 16/05/31 22:21:47 INFO plugin.PluginRepository: CyberNeko HTML Parser > (lib-nekohtml) > 16/05/31 22:21:47 INFO plugin.PluginRepository: OPIC Scoring Plug-in > (scoring-opic) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Pass-through URL > Normalizer (urlnormalizer-pass) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Http Protocol Plug-in > (protocol-http) > 16/05/31 22:21:47 INFO plugin.PluginRepository: SolrIndexWriter > (indexer-solr) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Registered Extension-Points: > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Content Parser > (org.apache.nutch.parse.Parser) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch URL Filter > (org.apache.nutch.net.URLFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: HTML Parse Filter > (org.apache.nutch.parse.HtmlParseFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Scoring > (org.apache.nutch.scoring.ScoringFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch URL Normalizer > (org.apache.nutch.net.URLNormalizer) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Protocol > (org.apache.nutch.protocol.Protocol) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch URL Ignore > Exemption Filter (org.apache.nutch.net.URLExemptionFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Index Writer > (org.apache.nutch.indexer.IndexWriter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Segment Merge > Filter (org.apache.nutch.segment.SegmentMergeFilter) > 16/05/31 22:21:47 INFO plugin.PluginRepository: Nutch Indexing Filter > (org.apache.nutch.indexer.IndexingFilter) > 16/05/31 22:21:47 INFO indexer.IndexWriters: Adding > 
org.apache.nutch.indexwriter.solr.SolrIndexWriter > 16/05/31 22:21:47 INFO indexer.IndexingJob: Active IndexWriters : > SOLRIndexWriter > solr.server.url : URL of the SOLR instance > solr.zookeeper.hosts : URL of the Zookeeper quorum > solr.commit.size : buffer size when sending to SOLR (default 1000) > solr.mapping.file : name of the mapping file for fields (default > solrindex-mapping.xml) > solr.auth : use authentication (default false) > solr.auth.username : username for authentication > solr.auth.password : password for authentication > 16/05/31 22:21:47 INFO indexer.IndexerMapReduce: IndexerMapReduce: crawldb: >
[jira] [Updated] (NUTCH-1800) Documentation for Nutch 1.X REST API
[ https://issues.apache.org/jira/browse/NUTCH-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated NUTCH-1800: - Description: This issue should build on NUTCH-1769 with full Java documentation for all classes in the following packages org.apache.nutch.api.* I am assigning this one to [~fjodor.vershinin] as he is doing an excellent job on the REST API. His UML graphic in [0] and commentary shows that he has a good understanding of the REST API and its functionality. Thank you [~fjodor.vershinin] great work. [0] https://wiki.apache.org/nutch/NutchRESTAPI#UML_Graphic was: This issue should build on NUTCH-1769 with full Java documentation for all classes in the following packages org.apache.nutch.api.* I am assigning this one to [~fjodor.vershinin] as he is doing an excellent job on the REST API. His UML graphic in [0] and commentary shows that he has a goo dunderstanding of the REST API and its functionality. Thank you [~fjodor.vershinin] great work. [0] https://wiki.apache.org/nutch/NutchRESTAPI#UML_Graphic > Documentation for Nutch 1.X REST API > > > Key: NUTCH-1800 > URL: https://issues.apache.org/jira/browse/NUTCH-1800 > Project: Nutch > Issue Type: New Feature > Components: documentation, REST_api >Reporter: Lewis John McGibbney >Assignee: Lewis John McGibbney > Fix For: 1.11 > > Attachments: NUTCH-1800.patch > > > This issue should build on NUTCH-1769 with full Java documentation for all > classes in the following packages > org.apache.nutch.api.* > I am assigning this one to [~fjodor.vershinin] as he is doing an excellent > job on the REST API. His UML graphic in [0] and commentary shows that he has > a good understanding of the REST API and its functionality. > Thank you [~fjodor.vershinin] great work. > [0] https://wiki.apache.org/nutch/NutchRESTAPI#UML_Graphic
[jira] [Comment Edited] (NUTCH-2266) Fix dead link in build.xml for javadoc
[ https://issues.apache.org/jira/browse/NUTCH-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297121#comment-15297121 ] Furkan KAMACI edited comment on NUTCH-2266 at 5/23/16 9:41 PM: --- javadoc link for lucene is removed and javadoc links for solr-solrj, lucene.core and lucene.analyzers-common are created. was (Author: kamaci): javadoc links for solr-solrj, lucene.core and lucene.analyzers-common are created. > Fix dead link in build.xml for javadoc > -- > > Key: NUTCH-2266 > URL: https://issues.apache.org/jira/browse/NUTCH-2266 > Project: Nutch > Issue Type: Bug > Components: build >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI >Priority: Minor > Fix For: 2.5 > > > build.xml has a dead link for javadoc.link.lucene and should be fixed.
[jira] [Commented] (NUTCH-2266) Fix dead link in build.xml for javadoc
[ https://issues.apache.org/jira/browse/NUTCH-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297121#comment-15297121 ] Furkan KAMACI commented on NUTCH-2266: -- javadoc links for solr-solrj, lucene.core and lucene.analyzers-common are created. > Fix dead link in build.xml for javadoc > -- > > Key: NUTCH-2266 > URL: https://issues.apache.org/jira/browse/NUTCH-2266 > Project: Nutch > Issue Type: Bug > Components: build >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI >Priority: Minor > Fix For: 2.5 > > > build.xml has a dead link for javadoc.link.lucene and should be fixed.
[jira] [Created] (NUTCH-2266) Fix dead link in build.xml for javadoc
Furkan KAMACI created NUTCH-2266: Summary: Fix dead link in build.xml for javadoc Key: NUTCH-2266 URL: https://issues.apache.org/jira/browse/NUTCH-2266 Project: Nutch Issue Type: Bug Components: build Reporter: Furkan KAMACI Assignee: Furkan KAMACI Priority: Minor Fix For: 2.5 build.xml has a dead link for javadoc.link.lucene and should be fixed.
[jira] [Commented] (NUTCH-2089) Move Nutch to compile on JDK 8
[ https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296997#comment-15296997 ] Furkan KAMACI commented on NUTCH-2089: -- I changed javadoc.link.java too and attached the output. It seems to be the same. > Move Nutch to compile on JDK 8 > -- > > Key: NUTCH-2089 > URL: https://issues.apache.org/jira/browse/NUTCH-2089 > Project: Nutch > Issue Type: Bug > Components: build >Reporter: Lewis John McGibbney >Assignee: Furkan KAMACI > Fix For: 2.5 > > Attachments: java8output.txt, java8output.txt > > > Public support updates for JDK 1.7 stopped in April of this year. > https://www.java.com/en/download/faq/java_7.xml > In our next release we should shift support to JDK 1.8.
[jira] [Updated] (NUTCH-2089) Move Nutch to compile on JDK 8
[ https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated NUTCH-2089: - Attachment: java8output.txt > Move Nutch to compile on JDK 8 > -- > > Key: NUTCH-2089 > URL: https://issues.apache.org/jira/browse/NUTCH-2089 > Project: Nutch > Issue Type: Bug > Components: build >Reporter: Lewis John McGibbney >Assignee: Furkan KAMACI > Fix For: 2.5 > > Attachments: java8output.txt, java8output.txt > > > Public support updates for JDK 1.7 stopped in April of this year. > https://www.java.com/en/download/faq/java_7.xml > In our next release we should shift support to JDK 1.8.
[jira] [Commented] (NUTCH-2089) Move Nutch to compile on JDK 8
[ https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296983#comment-15296983 ] Furkan KAMACI commented on NUTCH-2089: -- I've changed javac.version to 1.8 in build.xml (my Java version is 1.8.0_77). Debug is on in build.xml. I've attached the output. [~lewismc] how can I see the warnings? > Move Nutch to compile on JDK 8 > -- > > Key: NUTCH-2089 > URL: https://issues.apache.org/jira/browse/NUTCH-2089 > Project: Nutch > Issue Type: Bug > Components: build >Reporter: Lewis John McGibbney >Assignee: Furkan KAMACI > Fix For: 2.5 > > Attachments: java8output.txt > > > Public support updates for JDK 1.7 stopped in April of this year. > https://www.java.com/en/download/faq/java_7.xml > In our next release we should shift support to JDK 1.8.
[jira] [Updated] (NUTCH-2089) Move Nutch to compile on JDK 8
[ https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI updated NUTCH-2089: - Attachment: java8output.txt > Move Nutch to compile on JDK 8 > -- > > Key: NUTCH-2089 > URL: https://issues.apache.org/jira/browse/NUTCH-2089 > Project: Nutch > Issue Type: Bug > Components: build >Reporter: Lewis John McGibbney >Assignee: Furkan KAMACI > Fix For: 2.5 > > Attachments: java8output.txt > > > Public support updates for JDK 1.7 stopped in April of this year. > https://www.java.com/en/download/faq/java_7.xml > In our next release we should shift support to JDK 1.8.
[jira] [Assigned] (NUTCH-2089) Move Nutch to compile on JDK 8
[ https://issues.apache.org/jira/browse/NUTCH-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI reassigned NUTCH-2089: Assignee: Furkan KAMACI > Move Nutch to compile on JDK 8 > -- > > Key: NUTCH-2089 > URL: https://issues.apache.org/jira/browse/NUTCH-2089 > Project: Nutch > Issue Type: Bug > Components: build >Reporter: Lewis John McGibbney >Assignee: Furkan KAMACI > Fix For: 2.5 > > > Public support updates for JDK 1.7 stopped in April of this year. > https://www.java.com/en/download/faq/java_7.xml > In our next release we should shift support to JDK 1.8.
[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295670#comment-15295670 ] Furkan KAMACI commented on NUTCH-2222: -- [~abenjell] [~lewismc] I couldn't reproduce the issue. Could you guide me? > re-fetch deletes all metadata except _csh_ and _rs_ > > > Key: NUTCH-2222 > URL: https://issues.apache.org/jira/browse/NUTCH-2222 > Project: Nutch > Issue Type: Bug > Components: crawldb >Affects Versions: 2.3.1 > Environment: Centos 6, mongodb 2.6 and mongodb 3.0 and > hbase-0.98.8-hadoop2 >Reporter: Adnane B. >Assignee: Furkan KAMACI > Fix For: 2.4 > > Attachments: TestReFetch.java, index.html > > > This problem happens the second time I crawl a page > {code} > bin/nutch inject urls/ > bin/nutch generate -topN 1000 > bin/nutch fetch -all > bin/nutch parse -force -all > bin/nutch updatedb -all > {code} > second time (re-fetch) : > {code} > bin/nutch generate -topN 1000 --> batchid changes for all existing pages > bin/nutch fetch -all --> *** metadata is deleted for all pages already > crawled ** > bin/nutch parse -force -all > bin/nutch updatedb -all > {code} > I reproduce it with mongodb 2.6, mongodb 3.0, and hbase-0.98.8-hadoop2 > It happens only if the page has not changed > To reproduce easily, please add to nutch-site.xml : > {code} > <property> > <name>db.fetch.interval.default</name> > <value>60</value> > <description>The default number of seconds between re-fetches of a page (1 > minute)</description> > </property> > {code}
[jira] [Assigned] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_
[ https://issues.apache.org/jira/browse/NUTCH-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI reassigned NUTCH-2222: Assignee: Furkan KAMACI (was: Lewis John McGibbney) > re-fetch deletes all metadata except _csh_ and _rs_ > > > Key: NUTCH-2222 > URL: https://issues.apache.org/jira/browse/NUTCH-2222 > Project: Nutch > Issue Type: Bug > Components: crawldb >Affects Versions: 2.3.1 > Environment: Centos 6, mongodb 2.6 and mongodb 3.0 and > hbase-0.98.8-hadoop2 >Reporter: Adnane B. >Assignee: Furkan KAMACI > Fix For: 2.4 > > Attachments: TestReFetch.java, index.html > > > This problem happens the second time I crawl a page > {code} > bin/nutch inject urls/ > bin/nutch generate -topN 1000 > bin/nutch fetch -all > bin/nutch parse -force -all > bin/nutch updatedb -all > {code} > second time (re-fetch) : > {code} > bin/nutch generate -topN 1000 --> batchid changes for all existing pages > bin/nutch fetch -all --> *** metadata is deleted for all pages already > crawled ** > bin/nutch parse -force -all > bin/nutch updatedb -all > {code} > I reproduce it with mongodb 2.6, mongodb 3.0, and hbase-0.98.8-hadoop2 > It happens only if the page has not changed > To reproduce easily, please add to nutch-site.xml : > {code} > <property> > <name>db.fetch.interval.default</name> > <value>60</value> > <description>The default number of seconds between re-fetches of a page (1 > minute)</description> > </property> > {code}
[jira] [Assigned] (NUTCH-2199) Documentation for Nutch 2.X REST API
[ https://issues.apache.org/jira/browse/NUTCH-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI reassigned NUTCH-2199: Assignee: Furkan KAMACI > Documentation for Nutch 2.X REST API > > > Key: NUTCH-2199 > URL: https://issues.apache.org/jira/browse/NUTCH-2199 > Project: Nutch > Issue Type: New Feature > Components: documentation, REST_api >Affects Versions: 2.3.1 >Reporter: Lewis John McGibbney >Assignee: Furkan KAMACI >Priority: Minor > Fix For: 2.5 > > > The work done on NUTCH-1800 needs to be ported to 2.X branch. This is > trivial, I thought I had already done it but obviously not.
[jira] [Assigned] (NUTCH-2264) Check Forbidden API's at Build
[ https://issues.apache.org/jira/browse/NUTCH-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Furkan KAMACI reassigned NUTCH-2264: Assignee: Furkan KAMACI > Check Forbidden API's at Build > -- > > Key: NUTCH-2264 > URL: https://issues.apache.org/jira/browse/NUTCH-2264 > Project: Nutch > Issue Type: Task >Affects Versions: 2.3.1 >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI >Priority: Minor > > We should avoid [forbidden > calls|https://github.com/policeman-tools/forbidden-apis/wiki] and check in > the ant build for it.
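To illustrate the kind of call such a check is meant to catch: the forbidden-apis project ships signature sets that flag JDK methods whose result depends on the default locale or default charset of the machine. The sketch below shows locale-independent and charset-explicit alternatives using only the JDK. The class `LocaleSafeCalls` and its method names are hypothetical and are not taken from the Nutch code base.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical examples (not Nutch code) of calls that avoid JVM defaults. */
public class LocaleSafeCalls {

    /**
     * Lower-cases with an explicit locale. The no-argument
     * String.toLowerCase() uses the default locale; under a Turkish
     * default locale it maps 'I' to a dotless 'ı'.
     */
    public static String normalizeKey(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    /** Encodes with an explicit charset instead of String.getBytes(). */
    public static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes with an explicit charset instead of new String(byte[]). */
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(normalizeKey("TITLE"));
        System.out.println(decode(encode("content-type")));
    }
}
```

Passing the locale and charset explicitly keeps the behaviour identical on every machine, which is the property a build-time check of this kind would enforce.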