[jira] [Commented] (NUTCH-1129) Any23 Nutch plugin

2017-09-04 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153074#comment-16153074 ]

ASF GitHub Bot commented on NUTCH-1129:
---

simoncpu commented on issue #205: WIP: NUTCH-1129 microdata for Nutch 1.x
URL: https://github.com/apache/nutch/pull/205#issuecomment-327062894
 
 
   @thilohaas I tested this on a website with Microdata, but it can't index 
anything...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Any23 Nutch plugin
> --
>
> Key: NUTCH-1129
> URL: https://issues.apache.org/jira/browse/NUTCH-1129
> Project: Nutch
> Issue Type: New Feature
> Components: parser
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Minor
> Fix For: 2.5
>
> Attachments: NUTCH-1129.patch
>
>
> This plugin should build on the Any23 library to provide us with a plugin 
> which extracts RDF data from HTTP and file resources. Although, as of 
> writing, Any23 is not part of the ASF, the project is working towards 
> integration into the Apache Incubator. Once the project proves its value, 
> this would be an excellent addition to the Nutch 1.X codebase. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

2017-09-04 Thread Jorge Luis Betancourt Gonzalez (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152957#comment-16152957 ]

Jorge Luis Betancourt Gonzalez commented on NUTCH-1480:
---

[~markus17] do you mind taking a look at the linked PR? I think that the PR 
covers more than the original intent of this issue. Since you've already 
worked on something similar, I think that your input would be really valuable 
in this case.

> SolrIndexer to write to multiple servers.
> -
>
> Key: NUTCH-1480
> URL: https://issues.apache.org/jira/browse/NUTCH-1480
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Priority: Minor
> Attachments: adding-support-for-sharding-indexer-for-solr.patch, 
> NUTCH-1480-1.6.1.patch
>
>
> SolrUtils should return an array of SolrServers and read the SolrUrl as a 
> comma-delimited list of URLs using Configuration.getString(). SolrWriter 
> should be able to handle this list of SolrServers.
> This is useful if you want to send documents to multiple servers when no 
> replication is available, or if you want to send documents to multiple NOCs.
> edit:
> This does not replace NUTCH-1377 but complements it. With NUTCH-1377 this 
> issue allows you to index to multiple SolrCloud clusters at the same time.
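
The behaviour requested above (read one comma-delimited URL property, then send each document to every configured server) could look roughly like the following. This is only a minimal sketch under stated assumptions, not the code from the attached patches or the linked PR: `MultiServerSketch`, `DocSink`, `parseUrls` and `writeToAll` are hypothetical names, the host URLs are placeholders, and a real implementation would use the SolrJ client and Nutch's configuration rather than plain strings.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiServerSketch {

  /** Hypothetical stand-in for a per-server client (this is NOT the SolrJ API). */
  interface DocSink {
    void add(String doc);
  }

  /** Splits a comma-delimited property value into trimmed, non-empty URLs. */
  static List<String> parseUrls(String value) {
    List<String> urls = new ArrayList<>();
    if (value == null) {
      return urls;
    }
    for (String part : value.split(",")) {
      String url = part.trim();
      if (!url.isEmpty()) {
        urls.add(url);
      }
    }
    return urls;
  }

  /** Sends one document to every configured sink, in list order. */
  static void writeToAll(List<DocSink> sinks, String doc) {
    for (DocSink sink : sinks) {
      sink.add(doc);
    }
  }

  public static void main(String[] args) {
    // Placeholder URLs; in Nutch the value would come from the configuration.
    List<String> urls = parseUrls("http://host1:8983/solr, http://host2:8983/solr");
    List<DocSink> sinks = new ArrayList<>();
    for (String url : urls) {
      // Each sink only logs here; a real sink would wrap a server client.
      sinks.add(doc -> System.out.println(url + " <- " + doc));
    }
    writeToAll(sinks, "doc-1");
  }
}
```

Keeping the fan-out in a single loop means the writer does not need to know how many servers are configured; error handling for a server that is down is deliberately left out of this sketch.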





[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

2017-09-04 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152885#comment-16152885 ]

ASF GitHub Bot commented on NUTCH-1480:
---

odisleysi commented on issue #218: fix for NUTCH-1480 contributed by r0ann3l
URL: https://github.com/apache/nutch/pull/218#issuecomment-327018446
 
 
   I like this idea. I work on a project that needs to save documents in Solr 
for searching and in Elasticsearch for statistics. This solves the problem.
 





