[ https://issues.apache.org/jira/browse/MINIFI-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983386#comment-15983386 ]
Kevin Doran edited comment on MINIFI-275 at 4/26/17 3:12 PM:
-------------------------------------------------------------

Found the culprit: YamlConfiguration.cpp (I am working from commit 573c511f).

In multiple locations in YamlConfiguration.cpp, there is logic that assumes the component ID is set when reading the YAML node (i.e., it is treated as a required field). For example, line 84.

Most occurrences of this logic are simple to fix: replace the code that reads the id field so that it generates a new UUID when the id field is not present in the config node.

One section of this file will be more complicated to correct: loading Connections from the configuration YAML. Currently, the logic assumes the connection source id and destination id will be set.

I wasn't sure of the requirements for interpreting configuration for Connections, so I referred to the Java MiNiFi implementation. I found the relevant logic there in org.apache.nifi.minifi.commons.schema (https://github.com/apache/nifi-minifi/tree/master/minifi-commons/minifi-commons-schema/src/main/java/org/apache/nifi/minifi/commons/schema)

In particular, the class ConfigSchemaV1 has some logic that is relevant to this ticket. Here is a code snippet:

{code:java}
...
List<ConnectionSchema> connectionSchemas = new ArrayList<>(connections.size());
for (ConnectionSchemaV1 connection : connections) {
    ConnectionSchema convert = connection.convert();
    convert.setId(getUniqueId(ids, convert.getName()));

    String sourceName = connection.getSourceName();
    if (remoteInputPortIds.contains(sourceName)) {
        convert.setSourceId(sourceName);
    } else {
        if (duplicateProcessorNames.contains(sourceName)) {
            problematicDuplicateNames.add(sourceName);
        }
        String sourceId = processorNameToIdMap.get(sourceName);
        if (!StringUtil.isNullOrEmpty(sourceId)) {
            convert.setSourceId(sourceId);
        }
    }

    String destinationName = connection.getDestinationName();
    if (remoteInputPortIds.contains(destinationName)) {
        convert.setDestinationId(destinationName);
    } else {
        if (duplicateProcessorNames.contains(destinationName)) {
            problematicDuplicateNames.add(destinationName);
        }
        String destinationId = processorNameToIdMap.get(destinationName);
        if (!StringUtil.isNullOrEmpty(destinationId)) {
            convert.setDestinationId(destinationId);
        }
    }
    connectionSchemas.add(convert);
}
...
{code}

Essentially, it seems the proper way to handle connections in the YAML config is as follows:
* All processors should already be loaded, with ids generated if they were not present in the YAML.
* From the loaded processors, keep a map of name -> id. Also keep track of duplicate names, if any.
* When loading connections, if the source/destination id(s) are not present in the connection specification, attempt to look up the source id and destination id by name from the previously built map. If a source/destination name is in the set of duplicate names, bail with an error, as the connection configuration is ambiguous.


was (Author: kdoran):
Found the culprit: YamlConfiguration.cpp (I am working from commit 573c511f).

In multiple locations in YamlConfiguration.cpp, there is logic that assumes the component ID is set when reading the YAML node (i.e., it is treated as a required field). For example, line 84.

Most occurrences of this logic are simple to fix: replace the code that reads the id field so that it generates a new UUID when the id field is not present in the config node.

One section of this file will be more complicated to correct: loading Connections from the configuration YAML. Currently, the logic assumes the connection source id and destination id will be set.

I wasn't sure of the requirements for interpreting configuration for Connections, so I referred to the Java MiNiFi implementation. I found the relevant logic there in org.apache.nifi.minifi.commons.schema (https://github.com/apache/nifi-minifi/tree/master/minifi-commons/minifi-commons-schema/src/main/java/org/apache/nifi/minifi/commons/schema)

In particular, the class ConfigSchemaV1 has some logic that is relevant to this ticket. Here is a code snippet:

{code:java}
...
List<ConnectionSchema> connectionSchemas = new ArrayList<>(connections.size());
for (ConnectionSchemaV1 connection : connections) {
    ConnectionSchema convert = connection.convert();
    convert.setId(getUniqueId(ids, convert.getName()));

    String sourceName = connection.getSourceName();
    if (remoteInputPortIds.contains(sourceName)) {
        convert.setSourceId(sourceName);
    } else {
        if (duplicateProcessorNames.contains(sourceName)) {
            problematicDuplicateNames.add(sourceName);
        }
        String sourceId = processorNameToIdMap.get(sourceName);
        if (!StringUtil.isNullOrEmpty(sourceId)) {
            convert.setSourceId(sourceId);
        }
    }

    String destinationName = connection.getDestinationName();
    if (remoteInputPortIds.contains(destinationName)) {
        convert.setDestinationId(destinationName);
    } else {
        if (duplicateProcessorNames.contains(destinationName)) {
            problematicDuplicateNames.add(destinationName);
        }
        String destinationId = processorNameToIdMap.get(destinationName);
        if (!StringUtil.isNullOrEmpty(destinationId)) {
            convert.setDestinationId(destinationId);
        }
    }
    connectionSchemas.add(convert);
}
...
{code}

Essentially, it seems the proper way to handle connections in the YAML config is as follows:
* All processors should already be loaded, with ids generated if they were not present in the YAML.
* From the loaded processors, keep a map of name -> id. Also keep track of duplicate names, if any.
* When loading connections, attempt to look up the source id and destination id by name. If a name is in the set of duplicate names, bail with an error, as the connection configuration is ambiguous.


> Configuration without IDs for components causes exceptions
> ----------------------------------------------------------
>
>                 Key: MINIFI-275
>                 URL: https://issues.apache.org/jira/browse/MINIFI-275
>             Project: Apache NiFi MiNiFi
>          Issue Type: Bug
>          Components: C++, Processing Configuration
>            Reporter: Aldrin Piri
>            Assignee: Kevin Doran
>            Priority: Blocker
>             Fix For: cpp-0.2.0
>
>         Attachments: config.yml
>
>
> One of the changes to how components are handled in C++ introduced a defect
> into the original construct over the version 1 schema of the YAML.
> The absence of this ID causes a YAML exception.
> We should provide handling to support configurations how they were created
> originally, possibly providing a default/generated ID where one isn't
> specified, and start laying the foundation for versioned schemas as provided
> in our Java implementation.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
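
The three steps listed in the comment (generate missing ids, build a name -> id map while tracking duplicates, then resolve connection endpoints by name) could be sketched roughly as follows. This is only an illustration in Java, chosen to match the ConfigSchemaV1 snippet; it is not MiNiFi code. The class ConnectionIdResolver and all of its method names are hypothetical, and treating an unknown processor name as an error is an assumption that the comment does not state.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical helper for illustration; none of these names come from MiNiFi. */
final class ConnectionIdResolver {
    private final Map<String, String> nameToId = new HashMap<>();
    private final Set<String> duplicateNames = new HashSet<>();

    /** Keep the configured id, or generate a UUID when the id is absent. */
    static String idOrGenerated(String configuredId) {
        if (configuredId == null || configuredId.isEmpty()) {
            return UUID.randomUUID().toString();
        }
        return configuredId;
    }

    /**
     * Record a loaded processor and return the id it ends up with.
     * A name seen more than once is remembered as a duplicate.
     */
    String registerProcessor(String name, String configuredId) {
        String id = idOrGenerated(configuredId);
        if (nameToId.containsKey(name)) {
            duplicateNames.add(name);
        } else {
            nameToId.put(name, id);
        }
        return id;
    }

    /**
     * Resolve one connection endpoint (source or destination).
     * An explicit id wins; otherwise the id is looked up by name.
     */
    String resolveEndpoint(String configuredId, String name) {
        if (configuredId != null && !configuredId.isEmpty()) {
            return configuredId;
        }
        if (duplicateNames.contains(name)) {
            // Ambiguous: more than one processor shares this name.
            throw new IllegalArgumentException(
                "Ambiguous connection endpoint, duplicate processor name: " + name);
        }
        String id = nameToId.get(name);
        if (id == null) {
            // Assumption: an unknown name is also reported as an error.
            throw new IllegalArgumentException("No processor named: " + name);
        }
        return id;
    }
}
```

In this sketch every processor would be passed to registerProcessor before any connection is read, so that resolveEndpoint sees the complete map. Unlike the ConfigSchemaV1 snippet, which collects problematic duplicate names, the sketch fails immediately on an ambiguous name, which is the "bail with an error" behaviour proposed in the comment.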