bin/nutch fetch/parse handle crawl/segments directory
-----------------------------------------------------

                 Key: NUTCH-1001
                 URL: https://issues.apache.org/jira/browse/NUTCH-1001
             Project: Nutch
          Issue Type: Improvement
            Reporter: Gabriele Kahlout
            Priority: Minor

I'm having trouble porting scripts across different systems because of the step that extracts the latest (or only) segment produced by the generate phase. Variants include:

$ export SEGMENT=crawl/segments/`ls -tr crawl/segments|tail -1` #[1]
$ s1=`ls -d crawl/segments/2* | tail -1` #[2]
$ segment=`$HADOOP_HOME/bin/hadoop dfs -ls crawl/segments | tail -1 | grep -o '[a-zA-Z0-9/-]*' | tail -1`
$ segment=`$HADOOP_HOME/bin/hdfs dfs -ls crawl/segments | tail -1 | grep -o '[a-zA-Z0-9/-]*' | tail -1`

And I'm not sure what Windows users would have to do.

Some users may also make do by running bin/nutch fetch with crawl/segments/2* as the argument.

But I don't see why the user should have to extract, or worry about, the latest (or only) segment, nor why this should be a described step in every Nutch tutorial. Moreover, only fetch and parse expect a single segment, while other commands are fine with the directory of segments.

Therefore, I think it would be beneficial if fetch and parse also handled directories of segments.

[1] http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
[2] http://wiki.apache.org/nutch/NutchTutorial#Command_Line_Searching

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira