NissimShiman commented on code in PR #7781:
URL: https://github.com/apache/nifi/pull/7781#discussion_r1342996242


##########
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java:
##########
@@ -161,31 +180,42 @@ protected void init(final ProcessorInitializationContext 
context) {
     public void setup(final ProcessContext context) {
         String configBody = context.getProperty(MIME_CONFIG_BODY).getValue();
         String configFile = 
context.getProperty(MIME_CONFIG_FILE).evaluateAttributeExpressions().getValue();
+        String configStrategy = 
context.getProperty(CONFIG_STRATEGY).getValue();
 
-        if (configBody == null && configFile == null){
-            this.detector = config.getDetector();
-            this.mimeTypes = config.getMimeRepository();
-        } else if (configBody != null) {
-            try {
-                this.detector = MimeTypesFactory.create(new 
ByteArrayInputStream(configBody.getBytes()));
-                this.mimeTypes = (MimeTypes)this.detector;
-            } catch (Exception e) {
-                context.yield();
-                throw new ProcessException("Failed to load config body", e);
-            }
-
+        if (configBody == null && configFile == null) {
+            setDefaultMimeTypes();
+        } else if (configStrategy.equals(ADD.getValue())) {
+            setDefaultMimeTypes();
+            setCustomMimeTypes(configBody, configFile, context);
         } else {
-            try (final FileInputStream fis = new FileInputStream(configFile);
-                 final InputStream bis = new BufferedInputStream(fis)) {
-                this.detector = MimeTypesFactory.create(bis);
-                this.mimeTypes = (MimeTypes)this.detector;
-            } catch (Exception e) {
-                context.yield();
-                throw new ProcessException("Failed to load config file", e);
-            }
+            setCustomMimeTypes(configBody, configFile, context);
         }
     }
 
+    private void setDefaultMimeTypes() {
+        this.defaultDetector = config.getDetector();
+        this.defaultMimeTypes = config.getMimeRepository();
+    }
+
+    private void setCustomMimeTypes(String configBody, String configFile, 
ProcessContext context) {
+        try (final InputStream customInputStream = configBody != null ? new 
ByteArrayInputStream(configBody.getBytes()) : new FileInputStream(configFile)) {
+            this.customDetector = MimeTypesFactory.create(customInputStream);
+            this.customMimeTypes = (MimeTypes) this.customDetector;
+        } catch (Exception e) {
+            context.yield();
+            String configSource = configBody != null ? "body" : "file";
+            throw new ProcessException("Failed to load config " + 
configSource, e);
+        }
+    }
+
+    @OnStopped
+    public void onStopped() {
+        defaultDetector = null;
+        defaultMimeTypes = null;
+
+        customDetector = null;
+        customMimeTypes = null;
+    }

Review Comment:
   I agree if we go with the MimeTypes implementation change suggestion (i.e. 
using only on MimeTypes - which will be pending on your feedback).
   
   If we go with two (current initial PR implementation) we need it for when 
processor is shutdown and config strategy changes from Add to Replace.  The 
processor will remember the default MimeTypes and still be using them, when it 
shouldn't be.  



##########
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java:
##########
@@ -127,6 +132,16 @@ public class IdentifyMimeType extends AbstractProcessor {
             .expressionLanguageSupported(ExpressionLanguageScope.NONE)
             .build();
 
+    public static final PropertyDescriptor CONFIG_STRATEGY = new 
PropertyDescriptor.Builder()
+            .displayName("Config Strategy")
+            .name("config-strategy")
+            .description("Replace default NiFi MIME Types or add to them.  "
+                    + "Only applicable if Config File or Config Body is used.")
+            .required(false)
+            .allowableValues(REPLACE, ADD)
+            .defaultValue(REPLACE.getDisplayName())

Review Comment:
   ouch.  Thank you for catching this.



##########
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java:
##########
@@ -127,6 +132,16 @@ public class IdentifyMimeType extends AbstractProcessor {
             .expressionLanguageSupported(ExpressionLanguageScope.NONE)
             .build();
 
+    public static final PropertyDescriptor CONFIG_STRATEGY = new 
PropertyDescriptor.Builder()
+            .displayName("Config Strategy")
+            .name("config-strategy")
+            .description("Replace default NiFi MIME Types or add to them.  "

Review Comment:
   I agree



##########
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java:
##########
@@ -127,6 +132,16 @@ public class IdentifyMimeType extends AbstractProcessor {
             .expressionLanguageSupported(ExpressionLanguageScope.NONE)
             .build();
 
+    public static final PropertyDescriptor CONFIG_STRATEGY = new 
PropertyDescriptor.Builder()
+            .displayName("Config Strategy")
+            .name("config-strategy")
+            .description("Replace default NiFi MIME Types or add to them.  "
+                    + "Only applicable if Config File or Config Body is used.")
+            .required(false)

Review Comment:
   thank you



##########
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java:
##########
@@ -136,8 +151,11 @@ public class IdentifyMimeType extends AbstractProcessor {
     private List<PropertyDescriptor> properties;
 
     private final TikaConfig config;
-    private Detector detector;
-    private MimeTypes mimeTypes;
+    private volatile Detector defaultDetector;
+    private volatile MimeTypes defaultMimeTypes;
+
+    private volatile Detector customDetector;
+    private volatile MimeTypes customMimeTypes;

Review Comment:
   I think this can make sense if current MimeTypes implementation stays? 
(pending your feedback on that).  These can change if processor is 
shutdown/restarted and user changes config strategy from Add to Replace.   
   
   But this all depends on discussion of using one or multiple MimeTypes, so 
waiting on that before this can be resolved



##########
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java:
##########
@@ -206,34 +236,50 @@ public void onTrigger(final ProcessContext context, final 
ProcessSession session
 
         final ComponentLog logger = getLogger();
         final AtomicReference<String> mimeTypeRef = new 
AtomicReference<>(null);
+        final AtomicReference<String> extensionRef = new 
AtomicReference<>(null);
         final String filename = 
flowFile.getAttribute(CoreAttributes.FILENAME.key());
 
         session.read(flowFile, new InputStreamCallback() {
             @Override
             public void process(final InputStream stream) throws IOException {
                 try (final InputStream in = new BufferedInputStream(stream);
                      final TikaInputStream tikaStream = 
TikaInputStream.get(in)) {
-                    Metadata metadata = new Metadata();
 
-                    if (filename != null && 
context.getProperty(USE_FILENAME_IN_DETECTION).asBoolean()) {
-                        metadata.add(TikaCoreProperties.RESOURCE_NAME_KEY, 
filename);
+                    if (customDetector != null) {
+                        detectMimeType(customDetector, customMimeTypes, 
tikaStream);
+                    }
+
+                    String mimeType = mimeTypeRef.get();
+
+                    if (defaultDetector != null && (mimeType == null || 
mimeType.equals(MediaType.OCTET_STREAM.toString()) || 
mimeType.equals(MediaType.TEXT_PLAIN.toString()))) {
+                        detectMimeType(defaultDetector, defaultMimeTypes, 
tikaStream);
                     }
-                    // Get mime type
-                    MediaType mediatype = detector.detect(tikaStream, 
metadata);
-                    mimeTypeRef.set(mediatype.toString());
                 }
             }
+
+            private void detectMimeType(Detector detector, MimeTypes 
mimeTypes, TikaInputStream tikaStream) throws IOException {
+                Metadata metadata = new Metadata();
+                if (filename != null && 
context.getProperty(USE_FILENAME_IN_DETECTION).asBoolean()) {
+                    metadata.add(TikaCoreProperties.RESOURCE_NAME_KEY, 
filename);
+                }
+
+                MediaType mediatype = detector.detect(tikaStream, metadata);
+                String extension = "";
+
+                try {
+                    MimeType mimetype = 
mimeTypes.forName(mediatype.toString());
+                    extension = mimetype.getExtension();
+                } catch (MimeTypeException ex) {
+                    logger.warn("MIME type extension lookup failed: {}", new 
Object[]{ex});

Review Comment:
   done



##########
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java:
##########
@@ -161,31 +180,42 @@ protected void init(final ProcessorInitializationContext 
context) {
     public void setup(final ProcessContext context) {
         String configBody = context.getProperty(MIME_CONFIG_BODY).getValue();
         String configFile = 
context.getProperty(MIME_CONFIG_FILE).evaluateAttributeExpressions().getValue();
+        String configStrategy = 
context.getProperty(CONFIG_STRATEGY).getValue();
 
-        if (configBody == null && configFile == null){
-            this.detector = config.getDetector();
-            this.mimeTypes = config.getMimeRepository();
-        } else if (configBody != null) {
-            try {
-                this.detector = MimeTypesFactory.create(new 
ByteArrayInputStream(configBody.getBytes()));
-                this.mimeTypes = (MimeTypes)this.detector;
-            } catch (Exception e) {
-                context.yield();
-                throw new ProcessException("Failed to load config body", e);
-            }
-
+        if (configBody == null && configFile == null) {
+            setDefaultMimeTypes();
+        } else if (configStrategy.equals(ADD.getValue())) {
+            setDefaultMimeTypes();
+            setCustomMimeTypes(configBody, configFile, context);
         } else {
-            try (final FileInputStream fis = new FileInputStream(configFile);
-                 final InputStream bis = new BufferedInputStream(fis)) {
-                this.detector = MimeTypesFactory.create(bis);
-                this.mimeTypes = (MimeTypes)this.detector;
-            } catch (Exception e) {
-                context.yield();
-                throw new ProcessException("Failed to load config file", e);
-            }
+            setCustomMimeTypes(configBody, configFile, context);
         }
     }
 
+    private void setDefaultMimeTypes() {
+        this.defaultDetector = config.getDetector();
+        this.defaultMimeTypes = config.getMimeRepository();
+    }
+
+    private void setCustomMimeTypes(String configBody, String configFile, 
ProcessContext context) {
+        try (final InputStream customInputStream = configBody != null ? new 
ByteArrayInputStream(configBody.getBytes()) : new FileInputStream(configFile)) {
+            this.customDetector = MimeTypesFactory.create(customInputStream);
+            this.customMimeTypes = (MimeTypes) this.customDetector;
+        } catch (Exception e) {
+            context.yield();
+            String configSource = configBody != null ? "body" : "file";
+            throw new ProcessException("Failed to load config " + 
configSource, e);
+        }
+    }

Review Comment:
   Thank you very much @exceptionfactory  for looking at this PR.
   
   You are correct that instead of multiple MimeTypes, it is possible to 
combine multiple entries with one of several create() 
   
   This works nicely for the common case of defining a new mime type, but there 
are Add strategy edge cases it does not handle so well.
   
   So for instance, it is harder for a user to workaround a tika definition 
type they don't like.  So, for example, there is no graceful way to resolve 
[NIFI-11084](https://issues.apache.org/jira/browse/NIFI-11084) (i.e. to have 
the csv definition take precedence)
   
   Also, if a user defines a mime type as slightly off than a default mime 
type, but the rest of the definition remains the same, tika gets confused and 
throws an exception at runtime.   So, for example, if 
[.customConfig](https://github.com/apache/nifi/blob/rel/nifi-1.23.2/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/resources/TestIdentifyMimeType/.customConfig.xml#L20)
 was .png instead of .mypng and a .png file was sent through, it would fail at 
runtime with "org.apache.tika.mime.MimeTypeException: Conflicting extension 
pattern: .png"
   
   This last scenerio isn't so bad, as the user can figure out from the error 
what is going on, but still somewhat less than ideal.
   
   (I found these edge cases out when originally trying to make it work with 
[here](https://github.com/NissimShiman/nifi/blob/NIFI-11463.origSolution/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java#L197)
 using this a single MimeTypes)
   
   Do we still want to go with a single MimeTypes?  The code is cleaner and 
more straightforward, but I think we lose some usage possibilites.



##########
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java:
##########
@@ -99,6 +101,9 @@
 }
 )
 public class IdentifyMimeType extends AbstractProcessor {
+    static final AllowableValue REPLACE = new AllowableValue("replace", 
"replace", "Use config MIME Types only.");
+    static final AllowableValue ADD = new AllowableValue("add", "add",

Review Comment:
   good idea



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to