[https://issues.apache.org/jira/browse/MINIFI-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983386#comment-15983386]

Kevin Doran edited comment on MINIFI-275 at 4/26/17 3:12 PM:
-------------------------------------------------------------

Found the culprit: YamlConfiguration.cpp (I am working from commit 573c511f).

In multiple places in YamlConfiguration.cpp, the logic assumes the component ID 
is set when reading the YAML node (i.e., it treats id as a required field). See, 
for example, line 84.

Most occurrences are simple to fix: change the code that reads the id field so 
that it generates a new UUID when the id field is not present in the config 
node.
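A minimal sketch of that fix, written in Java to match the ConfigSchemaV1 excerpt in this comment (the actual change belongs in the C++ YamlConfiguration.cpp). It assumes the YAML node has already been parsed into a Map<String, Object>; the class ComponentIds and the method getOrGenerateId are hypothetical names used only for illustration:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper illustrating "generate an id if the field is absent".
// Assumption: the YAML node is modeled as a Map<String, Object>; the real
// C++ code reads a YAML::Node instead.
public final class ComponentIds {

    private ComponentIds() {
    }

    /**
     * Returns the "id" field of the config node, or a newly generated random
     * UUID string if the field is missing or blank.
     */
    public static String getOrGenerateId(Map<String, Object> node) {
        Object id = node.get("id");
        if (id != null && !id.toString().trim().isEmpty()) {
            return id.toString();
        }
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        Map<String, Object> withId = new HashMap<>();
        withId.put("name", "GetFile");
        withId.put("id", "471deef6-2a6e-4a7d-912a-81cc17e3a204");

        Map<String, Object> withoutId = new HashMap<>();
        withoutId.put("name", "LogAttribute");

        System.out.println(getOrGenerateId(withId));    // existing id is kept
        System.out.println(getOrGenerateId(withoutId)); // random UUID each run
    }
}
{code}

The generated id is not written back into the node here; if later steps need to see the same id, the caller would have to store it.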

One section of this file will be more complicated to correct: loading 
Connections from the configuration YAML. The current logic assumes the 
connection source id and destination id are set. I wasn't sure of the 
requirements for interpreting Connection configuration, so I referred to the 
Java MiNiFi implementation and found the relevant logic in the package 
org.apache.nifi.minifi.commons.schema 
(https://github.com/apache/nifi-minifi/tree/master/minifi-commons/minifi-commons-schema/src/main/java/org/apache/nifi/minifi/commons/schema)

In particular, the class ConfigSchemaV1 contains logic relevant to this ticket. 
Here is a snippet:

{code:java}
...
List<ConnectionSchema> connectionSchemas = new ArrayList<>(connections.size());
for (ConnectionSchemaV1 connection : connections) {
    ConnectionSchema convert = connection.convert();
    convert.setId(getUniqueId(ids, convert.getName()));

    String sourceName = connection.getSourceName();
    if (remoteInputPortIds.contains(sourceName)) {
        convert.setSourceId(sourceName);
    } else {
        if (duplicateProcessorNames.contains(sourceName)) {
            problematicDuplicateNames.add(sourceName);
        }
        String sourceId = processorNameToIdMap.get(sourceName);
        if (!StringUtil.isNullOrEmpty(sourceId)) {
            convert.setSourceId(sourceId);
        }
    }

    String destinationName = connection.getDestinationName();
    if (remoteInputPortIds.contains(destinationName)) {
        convert.setDestinationId(destinationName);
    } else {
        if (duplicateProcessorNames.contains(destinationName)) {
            problematicDuplicateNames.add(destinationName);
        }
        String destinationId = processorNameToIdMap.get(destinationName);
        if (!StringUtil.isNullOrEmpty(destinationId)) {
            convert.setDestinationId(destinationId);
        }
    }
    connectionSchemas.add(convert);
}
...
{code}

Essentially, the proper way to handle connections in the YAML config appears 
to be:

* All processors should already be loaded, with ids generated for any that were 
not present in the YAML.
* From the loaded processors, keep a map of name -> id. Also keep track of 
duplicate names, if any.
* When loading connections, if the source/destination id(s) are not present in 
the connection specification, look up the source id and destination id by name 
in the previously built map. If a source or destination name is in the set of 
duplicate names, bail with an error, as the connection configuration is 
ambiguous.
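The steps above could be sketched as follows, again in Java and under stated assumptions: processors are already loaded with their ids set, ConnectionResolver and Processor are hypothetical types (not MiNiFi classes), and remote input ports, which the ConfigSchemaV1 excerpt handles separately, are ignored. Treating an unknown name as an error is also an assumption; the Java excerpt simply leaves the id unset in that case.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: resolve connection endpoints by processor name when
// the connection does not specify source/destination ids explicitly.
public final class ConnectionResolver {

    /** A processor after loading; its id is always set (possibly generated). */
    public static final class Processor {
        final String id;
        final String name;

        public Processor(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private final Map<String, String> nameToId = new HashMap<>();
    private final Set<String> duplicateNames = new HashSet<>();

    /** Builds the name -> id map and records any duplicate names. */
    public ConnectionResolver(List<Processor> processors) {
        for (Processor p : processors) {
            // put() returns the previous id if this name was already seen
            if (nameToId.put(p.name, p.id) != null) {
                duplicateNames.add(p.name);
            }
        }
    }

    /**
     * Returns explicitId if present; otherwise looks the endpoint up by name.
     * Throws if the name is ambiguous (duplicate) or unknown (assumption).
     */
    public String resolve(String explicitId, String name) {
        if (explicitId != null && !explicitId.isEmpty()) {
            return explicitId;
        }
        if (duplicateNames.contains(name)) {
            throw new IllegalArgumentException(
                "Ambiguous connection endpoint: more than one processor is named '" + name + "'");
        }
        String id = nameToId.get(name);
        if (id == null) {
            throw new IllegalArgumentException("No processor named '" + name + "'");
        }
        return id;
    }

    public static void main(String[] args) {
        List<Processor> processors = new ArrayList<>();
        processors.add(new Processor("id-1", "GetFile"));
        processors.add(new Processor("id-2", "LogAttribute"));
        processors.add(new Processor("id-3", "LogAttribute"));
        ConnectionResolver resolver = new ConnectionResolver(processors);

        System.out.println(resolver.resolve(null, "GetFile"));       // prints id-1
        System.out.println(resolver.resolve("explicit", "Ignored")); // prints explicit
        try {
            resolver.resolve(null, "LogAttribute");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // ambiguous name
        }
    }
}
{code}

Using the return value of Map.put to detect a repeated name builds the map and the duplicate set in a single pass over the processors.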



> Configuration without IDs for components causes exceptions
> ----------------------------------------------------------
>
>                 Key: MINIFI-275
>                 URL: https://issues.apache.org/jira/browse/MINIFI-275
>             Project: Apache NiFi MiNiFi
>          Issue Type: Bug
>          Components: C++, Processing Configuration
>            Reporter: Aldrin Piri
>            Assignee: Kevin Doran
>            Priority: Blocker
>             Fix For: cpp-0.2.0
>
>         Attachments: config.yml
>
>
> One of the changes to how components are handled in C++ introduced a defect 
> in handling configurations written against the original (version 1) schema 
> of the YAML. The absence of a component ID causes a YAML exception. 
> We should provide handling to support configurations as they were originally 
> created, possibly providing a default/generated ID where one isn't 
> specified, and start laying the foundation for versioned schemas as provided 
> in our Java implementation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
