[jira] [Commented] (NUTCH-2661) Move TestOutlinks to the proper path
[ https://issues.apache.org/jira/browse/NUTCH-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653876#comment-16653876 ]
ASF GitHub Bot commented on NUTCH-2661:
sebastian-nagel commented on issue #399: NUTCH-2661 Move the TestOutlinks class into the o.a.n.parse path
URL: https://github.com/apache/nutch/pull/399#issuecomment-430710838
+1, that's the right location.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
> Move TestOutlinks to the proper path
>
> Key: NUTCH-2661
> URL: https://issues.apache.org/jira/browse/NUTCH-2661
> Project: Nutch
> Issue Type: Improvement
> Reporter: Jorge Luis Betancourt Gonzalez
> Assignee: Jorge Luis Betancourt Gonzalez
> Priority: Trivial
> Fix For: 1.16
>
> Initially, I placed the {{TestOutlinks}} class in the index-links plugin, since that was when I found the bug with the {{hashCode}}. I now realise that this test belongs in the {{test/org/apache/nutch/nutch/parse}} directory, all the more so because it does not cover any plugin-specific code.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NUTCH-2661) Move TestOutlinks to the proper path
[ https://issues.apache.org/jira/browse/NUTCH-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653781#comment-16653781 ]
ASF GitHub Bot commented on NUTCH-2661:
jorgelbg opened a new pull request #399: NUTCH-2661 Move the TestOutlinks class into the o.a.n.parse path
URL: https://github.com/apache/nutch/pull/399
This test covers the specific case of the comparison between 2 identical `Outlink` instances. Because this is not `index-links` specific, I'm moving the test class into the core parse tests.
[jira] [Created] (NUTCH-2661) Move TestOutlinks to the proper path
Jorge Luis Betancourt Gonzalez created NUTCH-2661:
Summary: Move TestOutlinks to the proper path
Key: NUTCH-2661
URL: https://issues.apache.org/jira/browse/NUTCH-2661
Project: Nutch
Issue Type: Improvement
Reporter: Jorge Luis Betancourt Gonzalez
Assignee: Jorge Luis Betancourt Gonzalez
Fix For: 1.16
Initially, I placed the {{TestOutlinks}} class in the index-links plugin, since that was when I found the bug with the {{hashCode}}. I now realise that this test belongs in the {{test/org/apache/nutch/nutch/parse}} directory, all the more so because it does not cover any plugin-specific code.
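For readers unfamiliar with why comparing two identical instances is worth a test: in Java, objects that are `equals` must also return the same `hashCode`, otherwise hash-based collections treat them as distinct. The thread does not describe the actual `Outlink` bug in detail, so the following is only a minimal sketch of the general contract. `LinkSketch` and `LinkSketchDemo` are hypothetical names, not Nutch classes, and the two fields are an assumption made for illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a link value class; this is NOT Nutch's Outlink.
final class LinkSketch {
    private final String toUrl;
    private final String anchor;

    LinkSketch(String toUrl, String anchor) {
        this.toUrl = Objects.requireNonNull(toUrl);
        this.anchor = Objects.requireNonNull(anchor);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LinkSketch)) return false;
        LinkSketch other = (LinkSketch) o;
        return toUrl.equals(other.toUrl) && anchor.equals(other.anchor);
    }

    // Must be derived from the same fields that equals() compares.
    @Override
    public int hashCode() {
        return Objects.hash(toUrl, anchor);
    }
}

final class LinkSketchDemo {
    // Number of distinct links according to equals()/hashCode().
    static int distinctCount(LinkSketch... links) {
        Set<LinkSketch> unique = new HashSet<>();
        for (LinkSketch link : links) {
            unique.add(link);
        }
        return unique.size();
    }
}
```

With a consistent `hashCode`, two links built from the same URL and anchor collapse to a single entry in a `HashSet`. None of this logic depends on a plugin, which matches the rationale for moving the test into the core parse tests.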
[jira] [Commented] (NUTCH-2658) Add README file to all plugins in src/plugin
[ https://issues.apache.org/jira/browse/NUTCH-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653497#comment-16653497 ]
ASF GitHub Bot commented on NUTCH-2658:
jorgelbg opened a new pull request #398: NUTCH-2658 Add README for the index-links plugin
URL: https://github.com/apache/nutch/pull/398
Add a README file for the index-links plugin. At the very least, this solves part of the issue with users knowing what they need to add to their backend (usually Solr).
> Add README file to all plugins in src/plugin
>
> Key: NUTCH-2658
> URL: https://issues.apache.org/jira/browse/NUTCH-2658
> Project: Nutch
> Issue Type: Improvement
> Components: documentation, plugin
> Reporter: Jorge Luis Betancourt Gonzalez
> Priority: Trivial
>
> Since we've migrated a good portion of our workflow to GitHub, we could consider adding a {{README.md}} file to the root of each plugin in {{src/plugin}}.
> This is a good place to have plugin-specific documentation: which fields the plugin adds to the indexer, which configuration options, etc. Also, since the README.md is rendered by GitHub automatically, it is a good link to point users to.
> I think that a good example is the {{indexer-cloudsearch}} plugin. On top of that, it's a good source of information to point users to when they ask questions regarding a specific plugin.
[jira] [Commented] (NUTCH-2658) Add README file to all plugins in src/plugin
[ https://issues.apache.org/jira/browse/NUTCH-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653495#comment-16653495 ]
Jorge Luis Betancourt Gonzalez commented on NUTCH-2658:
[~wastl-nagel] exactly what I was thinking. Right now, in order to configure a given plugin you need to look at nutch-default.xml to see what options are available and read the documentation from there. If it's an indexing plugin you need to check the schema or, in the worst case, the actual code to figure out what fields are going to be added. I think that at least these 2 components should be made more visible to users. The advantage of the README is that it lives right next to the code, so it's easier to "remember" to update it.
[~yossi] I agree that having the documentation on the Wiki as well is very helpful, and the README is not intended to replace that. +1 on generating the wiki from the README (or something else); this will at least guarantee that it is updated with each release. We can also add a step to the release procedure to check whether any new plugins have been added and whether the README is there. Of course, there is always the risk that the README contains dummy or otherwise useless data, but through PRs we can keep an eye on that.
As a side note, I kind of like how Elasticsearch has its documentation versioned and updated per release. Not sure how to integrate this with our wiki.
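The release-time check suggested above (is there a README for every plugin?) could look roughly like the following. This is a minimal sketch under stated assumptions, not an existing Nutch build step: `PluginReadmeCheck` is a hypothetical class, and it assumes that every direct subdirectory of the plugin root (for example `src/plugin`) is a plugin that should carry a `README.md`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; not part of the Nutch build.
final class PluginReadmeCheck {

    // Returns the names of plugin directories under pluginRoot without a README.md.
    // Assumption: each direct subdirectory of pluginRoot is one plugin.
    static List<String> pluginsWithoutReadme(Path pluginRoot) {
        try (Stream<Path> entries = Files.list(pluginRoot)) {
            return entries
                .filter(Files::isDirectory)
                .filter(dir -> !Files.isRegularFile(dir.resolve("README.md")))
                .map(dir -> dir.getFileName().toString())
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A release script could call `PluginReadmeCheck.pluginsWithoutReadme(Paths.get("src/plugin"))` and fail if the returned list is not empty. The check only confirms that the file exists, so the risk of a README with dummy content still has to be handled in PR review, as noted in the comment.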
[jira] [Commented] (NUTCH-2660) Plugin tests not executed
[ https://issues.apache.org/jira/browse/NUTCH-2660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653467#comment-16653467 ]
ASF GitHub Bot commented on NUTCH-2660:
sebastian-nagel opened a new pull request #397: NUTCH-2660 Plugin tests not executed
URL: https://github.com/apache/nutch/pull/397
- add missing unit test packages to plugin build.xml
- tests of "headings" plugin depend on "lib-nekohtml"
- add "protocol-okhttp" to Javadoc API overview
- add missing test packages to ant "eclipse" target
> Plugin tests not executed
>
> Key: NUTCH-2660
> URL: https://issues.apache.org/jira/browse/NUTCH-2660
> Project: Nutch
> Issue Type: Improvement
> Components: build, test
> Affects Versions: 1.15
> Reporter: Sebastian Nagel
> Priority: Minor
> Fix For: 1.16
>
> The unit tests of the plugins "parse-js", "headings" and "index-jexl-filter" are not executed during build.
[jira] [Created] (NUTCH-2660) Plugin tests not executed
Sebastian Nagel created NUTCH-2660:
Summary: Plugin tests not executed
Key: NUTCH-2660
URL: https://issues.apache.org/jira/browse/NUTCH-2660
Project: Nutch
Issue Type: Improvement
Components: build, test
Affects Versions: 1.15
Reporter: Sebastian Nagel
Fix For: 1.16
The unit tests of the plugins "parse-js", "headings" and "index-jexl-filter" are not executed during build.
[jira] [Commented] (NUTCH-2659) Add missing Apache license headers
[ https://issues.apache.org/jira/browse/NUTCH-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653452#comment-16653452 ]
ASF GitHub Bot commented on NUTCH-2659:
sebastian-nagel opened a new pull request #396: NUTCH-2659 Add missing Apache license headers
URL: https://github.com/apache/nutch/pull/396
> Add missing Apache license headers
>
> Key: NUTCH-2659
> URL: https://issues.apache.org/jira/browse/NUTCH-2659
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.15
> Reporter: Sebastian Nagel
> Priority: Trivial
> Fix For: 1.16
>
> Should add Apache license headers to source files (at least, *.java) - some files lack the license header.
[jira] [Created] (NUTCH-2659) Add missing Apache license headers
Sebastian Nagel created NUTCH-2659:
Summary: Add missing Apache license headers
Key: NUTCH-2659
URL: https://issues.apache.org/jira/browse/NUTCH-2659
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.15
Reporter: Sebastian Nagel
Fix For: 1.16
Should add Apache license headers to source files (at least, *.java) - some files lack the license header.
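One way to find the affected files is to scan the source tree for `*.java` files that lack the opening phrase of the standard ASF source header. The sketch below is only an illustration and not the approach taken in the pull request, which this thread does not describe. `LicenseHeaderCheck` is a hypothetical class, and matching on a single marker phrase is a simplifying assumption; a dedicated tool such as Apache RAT performs a more thorough audit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; not part of the Nutch build.
final class LicenseHeaderCheck {

    // Opening phrase of the standard ASF source header (assumed marker).
    static final String MARKER = "Licensed to the Apache Software Foundation (ASF)";

    // Returns all *.java files below root whose content does not contain MARKER.
    static List<Path> javaFilesMissingHeader(Path root) {
        try (Stream<Path> files = Files.walk(root)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".java"))
                .filter(p -> !hasHeader(p))
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean hasHeader(Path file) {
        try {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            return text.contains(MARKER);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `LicenseHeaderCheck.javaFilesMissingHeader(Paths.get("src"))` would list the candidates to fix. The method reads each file fully, which is acceptable for source files but would need adjusting for large binary artifacts.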
[jira] [Commented] (NUTCH-2658) Add README file to all plugins in src/plugin
[ https://issues.apache.org/jira/browse/NUTCH-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653368#comment-16653368 ]
Yossi Tamari commented on NUTCH-2658:
I disagree regarding putting the documentation in the code. This is not helpful for new users and users who are not Java coders. They can't be expected to navigate to src/plugin/indexer-cloudsearch to find the documentation for that plugin. The README.md files are also less likely to appear high in Google results, compared to the Wiki.
The real problem is that the Wiki, and specifically PluginCentral, is not properly maintained. Do you think the README files will be maintained better?
Maybe we can add a build step that will copy the information from the README to the Wiki on release?
[jira] [Commented] (NUTCH-2658) Add README file to all plugins in src/plugin
[ https://issues.apache.org/jira/browse/NUTCH-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653353#comment-16653353 ]
Sebastian Nagel commented on NUTCH-2658:
In general, it's a good idea to bundle the plugin documentation and make it available under a uniform path. At present, the documentation is spread over 4 different places:
- the Wiki, e.g., https://wiki.apache.org/nutch/IndexReplace
- the [API doc|http://nutch.apache.org/apidocs/apidocs-1.15/overview-summary.html] linking to the package.html / package-info.java of the plugin packages. Some plugins provide a usage description there or in the implementing class.
- a few plugins already have a README.md, e.g., [indexer-cloudsearch|https://github.com/apache/nutch/tree/master/src/plugin/indexer-cloudsearch]
- nutch-default.xml for properties
If in doubt, I would opt for moving documentation to the code because the code is versioned while our Wiki is not, or rather, it's difficult to link a Nutch version (e.g. 1.14) to the appropriate description. This would also be a good idea for the tutorial.
The drawbacks: we really need to maintain the READMEs, and once released we cannot change the documentation.
[jira] [Commented] (NUTCH-2658) Add README file to all plugins in src/plugin
[ https://issues.apache.org/jira/browse/NUTCH-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653252#comment-16653252 ]
Jorge Luis Betancourt Gonzalez commented on NUTCH-2658:
I'm thinking of having at least 2 general sections:
* Configuration: Covers all parameters that are included in the nutch-default.xml (although could be a bit of a repetition)
* Fields: Includes information about which fields should be added to your storage backend configuration (if applicable). Including documentation on how to configure Solr fields would be a nice default configuration, although we support different backends.
Plugin specific documentation
Hi all,
I've created an issue [1] with a proposal to improve the documentation for each plugin that is included with Nutch. I would love to get some feedback about the general idea.
Best Regards,
Jorge
[1] https://issues.apache.org/jira/browse/NUTCH-2658
[jira] [Created] (NUTCH-2658) Add README file to all plugins in src/plugin
Jorge Luis Betancourt Gonzalez created NUTCH-2658:
Summary: Add README file to all plugins in src/plugin
Key: NUTCH-2658
URL: https://issues.apache.org/jira/browse/NUTCH-2658
Project: Nutch
Issue Type: Improvement
Components: documentation, plugin
Reporter: Jorge Luis Betancourt Gonzalez
Since we've migrated a good portion of our workflow to GitHub, we could consider adding a {{README.md}} file to the root of each plugin in {{src/plugin}}.
This is a good place to have plugin-specific documentation: which fields the plugin adds to the indexer, which configuration options, etc. Also, since the README.md is rendered by GitHub automatically, it is a good link to point users to.
I think that a good example is the {{indexer-cloudsearch}} plugin. On top of that, it's a good source of information to point users to when they ask questions regarding a specific plugin.