sebastian-nagel commented on code in PR #793:
URL: https://github.com/apache/nutch/pull/793#discussion_r1377375552


##########
src/plugin/urlfilter-fast/src/java/org/apache/nutch/urlfilter/fast/FastURLFilter.java:
##########
@@ -181,9 +186,23 @@ public String filter(String url) {
 
   public void reloadRules() throws IOException {
     String fileRules = conf.get(URLFILTER_FAST_FILE);
-    try (Reader reader = conf.getConfResourceAsReader(fileRules)) {
-      reloadRules(reader);
+
+    InputStream is;
+
+    Path fileRulesPath = new Path(fileRules);
+    if (fileRulesPath.toUri().getScheme() != null) {
+      FileSystem fs = fileRulesPath.getFileSystem(conf);
+      is = fs.open(fileRulesPath);
+    }

Review Comment:
   Since we have Hadoop, we could try all supported compression codecs (gzip, bzip2, zstd, etc.). Something like the following (not tested):
   ```java
   CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(fileRulesPath);
   if (codec != null) {
      is = codec.createInputStream(is);
   }
   ```
   See [cf.getCodec(...)](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/compress/CompressionCodecFactory.html#getCodec-org.apache.hadoop.fs.Path-) and [codec.createInputStream(...)](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/compress/CompressionCodec.html#createInputStream-java.io.InputStream-).
   
   If the rules file is contained in the job jar, it shouldn't be compressed anyway.
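   
   For illustration, a minimal, untested sketch of how the codec detection could be combined with the filesystem-based loading shown in the hunk above. The fallback to `getConfResourceAsReader()` for scheme-less paths is an assumption about the rest of the patch, which isn't visible in this hunk:
   ```java
   // needs: org.apache.hadoop.fs.{FileSystem, Path},
   //        org.apache.hadoop.io.compress.{CompressionCodec, CompressionCodecFactory},
   //        java.io.{InputStream, InputStreamReader, Reader}, java.nio.charset.StandardCharsets
   public void reloadRules() throws IOException {
     String fileRules = conf.get(URLFILTER_FAST_FILE);
     Path fileRulesPath = new Path(fileRules);
   
     Reader reader;
     if (fileRulesPath.toUri().getScheme() != null) {
       FileSystem fs = fileRulesPath.getFileSystem(conf);
       InputStream is = fs.open(fileRulesPath);
   
       // Wrap the stream in a decompressor if the file name matches a
       // registered Hadoop compression codec (gzip, bzip2, zstd, ...).
       CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(fileRulesPath);
       if (codec != null) {
         is = codec.createInputStream(is);
       }
       reader = new InputStreamReader(is, StandardCharsets.UTF_8);
     } else {
       // Plain resource name: load from the classpath / job jar as before
       // (assumed to match the part of the patch not shown here).
       reader = conf.getConfResourceAsReader(fileRules);
     }
   
     try (Reader r = reader) {
       reloadRules(r);
     }
   }
   ```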


