sebastian-nagel commented on code in PR #793:
URL: https://github.com/apache/nutch/pull/793#discussion_r1377375552
##########
src/plugin/urlfilter-fast/src/java/org/apache/nutch/urlfilter/fast/FastURLFilter.java:
##########

```
@@ -181,9 +186,23 @@ public String filter(String url) {
   public void reloadRules() throws IOException {
     String fileRules = conf.get(URLFILTER_FAST_FILE);
-    try (Reader reader = conf.getConfResourceAsReader(fileRules)) {
-      reloadRules(reader);
+
+    InputStream is;
+
+    Path fileRulesPath = new Path(fileRules);
+    if (fileRulesPath.toUri().getScheme() != null) {
+      FileSystem fs = fileRulesPath.getFileSystem(conf);
+      is = fs.open(fileRulesPath);
+    }
```

Review Comment:

Since we have Hadoop on the classpath, we could try all compression codecs it supports (gzip, bzip2, zstd, etc.). Something such as (not tested):

```java
CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(fileRulesPath);
if (codec != null) {
  is = codec.createInputStream(is);
}
```

See [cf.getCodec(...)](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/compress/CompressionCodecFactory.html#getCodec-org.apache.hadoop.fs.Path-) and [codec.createInputStream(...)](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/compress/CompressionCodec.html#createInputStream-org.apache.hadoop.io.compress.CompressionCodec-java.io.InputStream-).

If the rules file is contained in the job jar, it shouldn't be compressed anyway.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@nutch.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
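The idea behind the suggestion is that Hadoop's `CompressionCodecFactory.getCodec(Path)` picks a codec by matching the file-name suffix (`.gz`, `.bz2`, `.zst`, ...) and, if one matches, the raw `InputStream` is wrapped in a decompressing stream before the rules are parsed. A minimal JDK-only sketch of the same pattern, using only gzip via `java.util.zip` (the class name `CodecByExtension` and the sample rule lines are hypothetical, not part of Nutch or Hadoop):

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical sketch: choose a decompressing stream based on the file
 * extension, analogous to Hadoop's CompressionCodecFactory.getCodec(Path)
 * followed by codec.createInputStream(is). Only gzip is handled here,
 * because the JDK ships a codec for it; Hadoop would cover bzip2, zstd, etc.
 */
class CodecByExtension {

  static InputStream open(File file) throws IOException {
    InputStream is = new FileInputStream(file);
    // CompressionCodecFactory matches on the path suffix; do the same
    // for ".gz" and wrap the stream, otherwise return it unchanged.
    if (file.getName().endsWith(".gz")) {
      return new GZIPInputStream(is);
    }
    return is;
  }

  public static void main(String[] args) throws IOException {
    // Write a gzip-compressed rules file, then read it back transparently.
    File f = File.createTempFile("rules", ".gz");
    try (Writer w = new OutputStreamWriter(
        new GZIPOutputStream(new FileOutputStream(f)), "UTF-8")) {
      w.write("DenyPath /private\n"); // hypothetical rule line
    }
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(open(f), "UTF-8"))) {
      System.out.println(r.readLine());
    }
    f.delete();
  }
}
```

The caller never needs to know whether the file was compressed: both branches hand back a plain `InputStream`, which is exactly why the codec wrapping composes cleanly with the `FileSystem.open(...)` call in the diff above.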