NissimShiman commented on code in PR #7781: URL: https://github.com/apache/nifi/pull/7781#discussion_r1342996242
########## nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java: ########## @@ -161,31 +180,42 @@ protected void init(final ProcessorInitializationContext context) { public void setup(final ProcessContext context) { String configBody = context.getProperty(MIME_CONFIG_BODY).getValue(); String configFile = context.getProperty(MIME_CONFIG_FILE).evaluateAttributeExpressions().getValue(); + String configStrategy = context.getProperty(CONFIG_STRATEGY).getValue(); - if (configBody == null && configFile == null){ - this.detector = config.getDetector(); - this.mimeTypes = config.getMimeRepository(); - } else if (configBody != null) { - try { - this.detector = MimeTypesFactory.create(new ByteArrayInputStream(configBody.getBytes())); - this.mimeTypes = (MimeTypes)this.detector; - } catch (Exception e) { - context.yield(); - throw new ProcessException("Failed to load config body", e); - } - + if (configBody == null && configFile == null) { + setDefaultMimeTypes(); + } else if (configStrategy.equals(ADD.getValue())) { + setDefaultMimeTypes(); + setCustomMimeTypes(configBody, configFile, context); } else { - try (final FileInputStream fis = new FileInputStream(configFile); - final InputStream bis = new BufferedInputStream(fis)) { - this.detector = MimeTypesFactory.create(bis); - this.mimeTypes = (MimeTypes)this.detector; - } catch (Exception e) { - context.yield(); - throw new ProcessException("Failed to load config file", e); - } + setCustomMimeTypes(configBody, configFile, context); } } + private void setDefaultMimeTypes() { + this.defaultDetector = config.getDetector(); + this.defaultMimeTypes = config.getMimeRepository(); + } + + private void setCustomMimeTypes(String configBody, String configFile, ProcessContext context) { + try (final InputStream customInputStream = configBody != null ? new ByteArrayInputStream(configBody.getBytes()) : new FileInputStream(configFile)) { + this.customDetector = MimeTypesFactory.create(customInputStream); + this.customMimeTypes = (MimeTypes) this.customDetector; + } catch (Exception e) { + context.yield(); + String configSource = configBody != null ? "body" : "file"; + throw new ProcessException("Failed to load config " + configSource, e); + } + } + + @OnStopped + public void onStopped() { + defaultDetector = null; + defaultMimeTypes = null; + + customDetector = null; + customMimeTypes = null; + } Review Comment: I agree if we go with the MimeTypes implementation change suggestion (i.e. using only on MimeTypes - which will be pending on your feedback). If we go with two (current initial PR implementation) we need it for when processor is shutdown and config strategy changes from Add to Replace. The processor will remember the default MimeTypes and still be using them, when it shouldn't be. ########## nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java: ########## @@ -127,6 +132,16 @@ public class IdentifyMimeType extends AbstractProcessor { .expressionLanguageSupported(ExpressionLanguageScope.NONE) .build(); + public static final PropertyDescriptor CONFIG_STRATEGY = new PropertyDescriptor.Builder() + .displayName("Config Strategy") + .name("config-strategy") + .description("Replace default NiFi MIME Types or add to them. " + + "Only applicable if Config File or Config Body is used.") + .required(false) + .allowableValues(REPLACE, ADD) + .defaultValue(REPLACE.getDisplayName()) Review Comment: ouch. Thank you for catching this. ########## nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java: ########## @@ -127,6 +132,16 @@ public class IdentifyMimeType extends AbstractProcessor { .expressionLanguageSupported(ExpressionLanguageScope.NONE) .build(); + public static final PropertyDescriptor CONFIG_STRATEGY = new PropertyDescriptor.Builder() + .displayName("Config Strategy") + .name("config-strategy") + .description("Replace default NiFi MIME Types or add to them. " Review Comment: I agree ########## nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java: ########## @@ -127,6 +132,16 @@ public class IdentifyMimeType extends AbstractProcessor { .expressionLanguageSupported(ExpressionLanguageScope.NONE) .build(); + public static final PropertyDescriptor CONFIG_STRATEGY = new PropertyDescriptor.Builder() + .displayName("Config Strategy") + .name("config-strategy") + .description("Replace default NiFi MIME Types or add to them. " + + "Only applicable if Config File or Config Body is used.") + .required(false) Review Comment: thank you ########## nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java: ########## @@ -136,8 +151,11 @@ public class IdentifyMimeType extends AbstractProcessor { private List<PropertyDescriptor> properties; private final TikaConfig config; - private Detector detector; - private MimeTypes mimeTypes; + private volatile Detector defaultDetector; + private volatile MimeTypes defaultMimeTypes; + + private volatile Detector customDetector; + private volatile MimeTypes customMimeTypes; Review Comment: I think this can make sense if current MimeTypes implementation stays? (pending your feedback on that). These can change if processor is shutdown/restarted and user changes config strategy from Add to Replace. But this all depends on discussion of using one or multiple MimeTypes, so waiting on that before this can be resolved ########## nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java: ########## @@ -206,34 +236,50 @@ public void onTrigger(final ProcessContext context, final ProcessSession session final ComponentLog logger = getLogger(); final AtomicReference<String> mimeTypeRef = new AtomicReference<>(null); + final AtomicReference<String> extensionRef = new AtomicReference<>(null); final String filename = flowFile.getAttribute(CoreAttributes.FILENAME.key()); session.read(flowFile, new InputStreamCallback() { @Override public void process(final InputStream stream) throws IOException { try (final InputStream in = new BufferedInputStream(stream); final TikaInputStream tikaStream = TikaInputStream.get(in)) { - Metadata metadata = new Metadata(); - if (filename != null && context.getProperty(USE_FILENAME_IN_DETECTION).asBoolean()) { - metadata.add(TikaCoreProperties.RESOURCE_NAME_KEY, filename); + if (customDetector != null) { + detectMimeType(customDetector, customMimeTypes, tikaStream); + } + + String mimeType = mimeTypeRef.get(); + + if (defaultDetector != null && (mimeType == null || mimeType.equals(MediaType.OCTET_STREAM.toString()) || mimeType.equals(MediaType.TEXT_PLAIN.toString()))) { + detectMimeType(defaultDetector, defaultMimeTypes, tikaStream); } - // Get mime type - MediaType mediatype = detector.detect(tikaStream, metadata); - mimeTypeRef.set(mediatype.toString()); } } + + private void detectMimeType(Detector detector, MimeTypes mimeTypes, TikaInputStream tikaStream) throws IOException { + Metadata metadata = new Metadata(); + if (filename != null && context.getProperty(USE_FILENAME_IN_DETECTION).asBoolean()) { + metadata.add(TikaCoreProperties.RESOURCE_NAME_KEY, filename); + } + + MediaType mediatype = detector.detect(tikaStream, metadata); + String extension = ""; + + try { + MimeType mimetype = mimeTypes.forName(mediatype.toString()); + extension = mimetype.getExtension(); + } catch (MimeTypeException ex) { + logger.warn("MIME type extension lookup failed: {}", new Object[]{ex}); Review Comment: done ########## nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java: ########## @@ -161,31 +180,42 @@ protected void init(final ProcessorInitializationContext context) { public void setup(final ProcessContext context) { String configBody = context.getProperty(MIME_CONFIG_BODY).getValue(); String configFile = context.getProperty(MIME_CONFIG_FILE).evaluateAttributeExpressions().getValue(); + String configStrategy = context.getProperty(CONFIG_STRATEGY).getValue(); - if (configBody == null && configFile == null){ - this.detector = config.getDetector(); - this.mimeTypes = config.getMimeRepository(); - } else if (configBody != null) { - try { - this.detector = MimeTypesFactory.create(new ByteArrayInputStream(configBody.getBytes())); - this.mimeTypes = (MimeTypes)this.detector; - } catch (Exception e) { - context.yield(); - throw new ProcessException("Failed to load config body", e); - } - + if (configBody == null && configFile == null) { + setDefaultMimeTypes(); + } else if (configStrategy.equals(ADD.getValue())) { + setDefaultMimeTypes(); + setCustomMimeTypes(configBody, configFile, context); } else { - try (final FileInputStream fis = new FileInputStream(configFile); - final InputStream bis = new BufferedInputStream(fis)) { - this.detector = MimeTypesFactory.create(bis); - this.mimeTypes = (MimeTypes)this.detector; - } catch (Exception e) { - context.yield(); - throw new ProcessException("Failed to load config file", e); - } + setCustomMimeTypes(configBody, configFile, context); } } + private void setDefaultMimeTypes() { + this.defaultDetector = config.getDetector(); + this.defaultMimeTypes = config.getMimeRepository(); + } + + private void setCustomMimeTypes(String configBody, String configFile, ProcessContext context) { + try (final InputStream customInputStream = configBody != null ? new ByteArrayInputStream(configBody.getBytes()) : new FileInputStream(configFile)) { + this.customDetector = MimeTypesFactory.create(customInputStream); + this.customMimeTypes = (MimeTypes) this.customDetector; + } catch (Exception e) { + context.yield(); + String configSource = configBody != null ? "body" : "file"; + throw new ProcessException("Failed to load config " + configSource, e); + } + } Review Comment: Thank you very much @exceptionfactory for looking at this PR. You are correct that instead of multiple MimeTypes, it is possible to combine multiple entries with one of several create() This works nicely for the common case of defining a new mime type, but there are Add strategy edge cases it does not handle so well. So for instance, it is harder for a user to workaround a tika definition type they don't like. So, for example, there is no graceful way to resolve [NIFI-11084](https://issues.apache.org/jira/browse/NIFI-11084) (i.e. to have the csv definition take precedence) Also, if a user defines a mime type as slightly off than a default mime type, but the rest of the definition remains the same, tika gets confused and throws an exception at runtime. So, for example, if [.customConfig](https://github.com/apache/nifi/blob/rel/nifi-1.23.2/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/resources/TestIdentifyMimeType/.customConfig.xml#L20) was .png instead of .mypng and a .png file was sent through, it would fail at runtime with "org.apache.tika.mime.MimeTypeException: Conflicting extension pattern: .png" This last scenerio isn't so bad, as the user can figure out from the error what is going on, but still somewhat less than ideal. (I found these edge cases out when originally trying to make it work with [here](https://github.com/NissimShiman/nifi/blob/NIFI-11463.origSolution/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java#L197) using this a single MimeTypes) Do we still want to go with a single MimeTypes? The code is cleaner and more straightforward, but I think we lose some usage possibilites. ########## nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java: ########## @@ -99,6 +101,9 @@ } ) public class IdentifyMimeType extends AbstractProcessor { + static final AllowableValue REPLACE = new AllowableValue("replace", "replace", "Use config MIME Types only."); + static final AllowableValue ADD = new AllowableValue("add", "add", Review Comment: good idea -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org