I am trying to read specific URLs from the Nutch DB. I wrote an external Java application (with all the needed imports and JARs, of course). I rewrote the get function of SegmentReader and use the results map to handle the data.
The problem is this: I open a file with 3 URLs (which I know appear in the DB), and for each URL in the file I try to get its data with this get function. I successfully get the first URL's data, but I can't get the other 2; the results map doesn't contain any data for them.

The getMapRecords function is the same as the original one. In it there is a line that checks whether the reader returned anything for the key:

    if (readers[i].get(key, value) != null)

For the second and third URLs (and so on) this is always null. I suspect that after reading the first URL something goes wrong, and I don't know what. My new get function is below.

Any ideas why?

Thx.
Nadav.

    public Map get(final Path segment, final Text key, Writer writer,
                   final Map results, final Configuration conf) throws Exception {
        ArrayList<Thread> threads = new ArrayList<Thread>();
        //System.out.println(key)
        threads.add(new Thread() {
            public void run() {
                try {
                    List res = _getMapRecords(new Path(segment, Content.DIR_NAME), key, conf);
                    results.put("co", res);
                } catch (Exception e) {
                    e.getMessage();
                }
            }
        });
        threads.add(new Thread() {
            public void run() {
                try {
                    List res = _getMapRecords(new Path(segment, CrawlDatum.FETCH_DIR_NAME), key, conf);
                    results.put("fe", res);
                } catch (Exception e) {
                    e.getMessage();
                }
            }
        });
        threads.add(new Thread() {
            public void run() {
                try {
                    List res = _getSeqRecords(new Path(segment, CrawlDatum.PARSE_DIR_NAME), key, conf);
                    results.put("pa", res);
                } catch (Exception e) {
                    e.getMessage();
                }
            }
        });
        threads.add(new Thread() {
            public void run() {
                try {
                    List res = _getSeqRecords(new Path(segment, CrawlDatum.PARSE_DIR_NAME), key, conf);
                    results.put("pa", res);
                } catch (Exception e) {
                    e.getMessage();
                }
            }
        });
        threads.add(new Thread() {
            public void run() {
                try {
                    List res = _getMapRecords(new Path(segment, ParseData.DIR_NAME), key, conf);
                    results.put("pd", res);
                } catch (Exception e) {
                    e.getMessage();
                }
            }
        });
        threads.add(new Thread() {
            public void run() {
                try {
                    List res = _getMapRecords(new Path(segment, ParseText.DIR_NAME), key, conf);
                    results.put("pt", res);
                } catch (Exception e) {
                    e.getMessage();
                }
            }
        });
        // do the threads work
        Iterator<Thread> it = threads.iterator();
        while (it.hasNext()) {
            ((Thread) it.next()).start();
        }
        int cnt = 0;
        do {
            try {
                Thread.sleep(5000);
            } catch (Exception e) {};
            it = threads.iterator();
            while (it.hasNext()) {
                if (((Thread) it.next()).isAlive()) {
                    cnt++;
                }
            }
        } while (cnt > 0);
        /*
        //TEST
        res = (List)results.get("co");
        writer.write(res.get(0).toString());
        res = (List)results.get("pd");
        writer.write(res.get(0).toString());
        res = (List)results.get("pt");
        writer.write(res.get(0).toString());
        */
        return results;
        //writer.flush();
    }
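
Since every catch block above only calls e.getMessage() and discards the result, any exception thrown for the second and third URLs would be invisible to me. To check whether that is what happens, I could restructure the start-and-wait part roughly as in the stand-alone sketch below. It is only a sketch: the class name ParallelLookup and the runAll method are hypothetical names of my own and not part of Nutch, and each Callable stands in for one _getMapRecords or _getSeqRecords call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Nutch: runs each lookup in its own thread,
// waits for all of them with join(), and keeps any exception instead of dropping it.
final class ParallelLookup {

    // tasks:  label (e.g. "co", "fe") -> lookup to run
    // errors: must be a thread-safe map; it receives label -> exception for failed lookups
    static Map<String, Object> runAll(Map<String, Callable<?>> tasks,
                                      Map<String, Exception> errors)
            throws InterruptedException {
        final Map<String, Object> results = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (final Map.Entry<String, Callable<?>> task : tasks.entrySet()) {
            Thread t = new Thread(() -> {
                try {
                    Object value = task.getValue().call();
                    // ConcurrentHashMap rejects null values, so skip empty lookups
                    if (value != null) {
                        results.put(task.getKey(), value);
                    }
                } catch (Exception e) {
                    // Record the failure so the caller can inspect or print it
                    errors.put(task.getKey(), e);
                }
            });
            threads.add(t);
            t.start();
        }
        // join() returns as soon as each thread has finished; no polling counter is needed
        for (Thread t : threads) {
            t.join();
        }
        return results;
    }
}
```

After runAll returns, I would print the contents of the errors map for each URL; if it is empty for the second and third URLs, the lookups really do return nothing rather than failing.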