[jira] [Created] (NUTCH-2185) protocol-soda-consumer plugin

2015-12-13 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2185:
---

         Summary: protocol-soda-consumer plugin
             Key: NUTCH-2185
             URL: https://issues.apache.org/jira/browse/NUTCH-2185
         Project: Nutch
      Issue Type: Bug
      Components: plugin, protocol
        Reporter: Lewis John McGibbney
        Assignee: Lewis John McGibbney
         Fix For: 1.12


I'm finishing off a Nutch protocol implementation for interacting with the
popular [Socrata|https://www.socrata.com/] Open Data platform via their
[soda-java API|https://github.com/socrata/soda-java]. I feel that this would be
useful for government and other public sector organizations that make their data
available through the Socrata platform, so I intend to propose it as
a protocol-soda-consumer plugin for Nutch.
I'll post a patch here by the end of the day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2015-12-13 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055417#comment-15055417
 ] 

Chris A. Mattmann commented on NUTCH-2184:
--

Nice, bruh

> Enable IndexingJob to function with no crawldb
> --
>
>                 Key: NUTCH-2184
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2184
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.12
>
>
> Sometimes when working with distributed team(s), we have found that we can 
> lose data structures which are currently considered critical, e.g. 
> crawldb, linkdb and/or segments.
> In my current scenario I have a requirement to index segment data with no 
> accompanying crawldb or linkdb. 
> Absence of the latter is OK as linkdb is optional; however, in 
> [IndexerMapReduce|https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/indexer/IndexerMapReduce.java]
>  crawldb is currently mandatory. 
> This ticket should enhance the IndexerMapReduce code to support the use case 
> where you ONLY have segments and want to force an index for every record 
> present.


