[ https://issues.apache.org/jira/browse/NUTCH-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-1570:
----------------------------------------

    Description:
For some time this issue has been discussed on various lists.
When doing the upgrade of the Gora dependencies in NUTCH-1569, I stumbled across a comment within o.a.n.api.DbReader#Iterator
{code}
  public Iterator<Map<String,Object>> iterator(String[] fields, String startKey, String endKey,
      String batchId) throws Exception {
    Query<String,WebPage> q = store.newQuery();
    String[] qFields = fields;
    if (fields != null) {
      HashSet<String> flds = new HashSet<String>(Arrays.asList(fields));
      // remove "url"
      flds.remove("url");
      if (flds.size() > 0) {
        qFields = flds.toArray(new String[flds.size()]);
      } else {
        qFields = null;
      }
    }
    q.setFields(qFields);
    if (startKey != null) {
      q.setStartKey(startKey);
      if (endKey != null) {
        q.setEndKey(endKey);
      }
    }
    Result<String,WebPage> res = store.execute(q);
    *XXX we should add the filtering capability to Query*
    return new DbIterator(res, fields, batchId);
  }
{code}
I will link this issue to something over on Gora once we get around to the implementation.

  was:
For some time this issue has been discussed on various lists.
When doing the upgrade of the Gora dependencies in NUTCH-1569, I stumbled across a comment within o.a.n.api.DbReader#Iterator
{code}
  public Iterator<Map<String,Object>> iterator(String[] fields, String startKey, String endKey,
      String batchId) throws Exception {
    Query<String,WebPage> q = store.newQuery();
    String[] qFields = fields;
    if (fields != null) {
      HashSet<String> flds = new HashSet<String>(Arrays.asList(fields));
      // remove "url"
      flds.remove("url");
      if (flds.size() > 0) {
        qFields = flds.toArray(new String[flds.size()]);
      } else {
        qFields = null;
      }
    }
    q.setFields(qFields);
    if (startKey != null) {
      q.setStartKey(startKey);
      if (endKey != null) {
        q.setEndKey(endKey);
      }
    }
    Result<String,WebPage> res = store.execute(q);
    * // XXX we should add the filtering capability to Query *
    return new DbIterator(res, fields, batchId);
  }
{code}
I will link this issue to something over on Gora once we get around to the implementation.


> Add filtering capability to Datastore Queries
> ---------------------------------------------
>
>                 Key: NUTCH-1570
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1570
>             Project: Nutch
>          Issue Type: Bug
>          Components: storage
>    Affects Versions: 2.2
>            Reporter: Lewis John McGibbney
>             Fix For: 2.3
>
>
> For some time this issue has been discussed on various lists.
> When doing the upgrade of the Gora dependencies in NUTCH-1569, I stumbled across a comment within o.a.n.api.DbReader#Iterator
> {code}
>   public Iterator<Map<String,Object>> iterator(String[] fields, String startKey, String endKey,
>       String batchId) throws Exception {
>     Query<String,WebPage> q = store.newQuery();
>     String[] qFields = fields;
>     if (fields != null) {
>       HashSet<String> flds = new HashSet<String>(Arrays.asList(fields));
>       // remove "url"
>       flds.remove("url");
>       if (flds.size() > 0) {
>         qFields = flds.toArray(new String[flds.size()]);
>       } else {
>         qFields = null;
>       }
>     }
>     q.setFields(qFields);
>     if (startKey != null) {
>       q.setStartKey(startKey);
>       if (endKey != null) {
>         q.setEndKey(endKey);
>       }
>     }
>     Result<String,WebPage> res = store.execute(q);
>     *XXX we should add the filtering capability to Query*
>     return new DbIterator(res, fields, batchId);
>   }
> {code}
> I will link this issue to something over on Gora once we get around to the implementation.
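
For illustration only: in the method quoted above, batchId is handed to DbIterator rather than to the Query, so any batch filtering presumably happens on the client after store.execute(q) has returned every row in the key range. The sketch below shows that client-side pattern in plain Java, without Gora. The names Row, FilteringIterator, sampleRows and keysForBatch are hypothetical and are not part of Nutch or Gora. A filtering capability on Query would let the datastore apply the same predicate before rows are returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

public class QueryFilterSketch {

  /** Hypothetical stand-in for a stored row; this is NOT Nutch's WebPage. */
  public static final class Row {
    public final String key;
    public final String batchId;

    public Row(String key, String batchId) {
      this.key = key;
      this.batchId = batchId;
    }
  }

  /** Wraps a result iterator and skips every element the predicate rejects. */
  public static final class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> accept;
    private T nextItem;
    private boolean hasNextItem;

    public FilteringIterator(Iterator<T> source, Predicate<T> accept) {
      this.source = source;
      this.accept = accept;
      advance();
    }

    /** Moves forward to the next accepted element, if there is one. */
    private void advance() {
      hasNextItem = false;
      nextItem = null;
      while (source.hasNext()) {
        T candidate = source.next();
        if (accept.test(candidate)) {
          nextItem = candidate;
          hasNextItem = true;
          return;
        }
      }
    }

    @Override
    public boolean hasNext() {
      return hasNextItem;
    }

    @Override
    public T next() {
      if (!hasNextItem) {
        throw new NoSuchElementException();
      }
      T result = nextItem;
      advance();
      return result;
    }
  }

  /** Hypothetical sample data: three rows spread over two batches. */
  public static List<Row> sampleRows() {
    return Arrays.asList(
        new Row("com.example:http/", "b1"),
        new Row("net.example:http/", "b2"),
        new Row("org.example:http/", "b1"));
  }

  /** Returns the keys of rows in the given batch; a null batchId accepts all rows. */
  public static List<String> keysForBatch(List<Row> rows, String batchId) {
    Predicate<Row> byBatch = r -> batchId == null || batchId.equals(r.batchId);
    Iterator<Row> it = new FilteringIterator<>(rows.iterator(), byBatch);
    List<String> keys = new ArrayList<>();
    while (it.hasNext()) {
      keys.add(it.next().key);
    }
    return keys;
  }

  public static void main(String[] args) {
    System.out.println(keysForBatch(sampleRows(), "b1"));
  }
}
```

Every row still crosses the wire before the predicate runs, which is the cost a query-level filter would avoid.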