[ https://issues.apache.org/jira/browse/NIFI-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355457#comment-16355457 ]
ASF subversion and git services commented on NIFI-4818:
-------------------------------------------------------

Commit f16cbd462b8d5bfea2cf4e1d02910f22e77d0354 in nifi's branch refs/heads/master from [~ijokarumawak]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=f16cbd4 ]

NIFI-4818: Fix transit URL parsing at Hive2JDBC and KafkaTopic for ReportLineageToAtlas

- Hive2JDBC: Handle connection parameters and multiple host entries correctly
- KafkaTopic: Handle multiple host entries correctly
- Avoid potential "IllegalStateException: Duplicate key" exception when NiFiAtlasHook analyzes existing NiFiFlowPath input/output entries
- This closes #2435

> Fix transit URL parsing at Hive2JDBC and KafkaTopic for ReportLineageToAtlas
> ----------------------------------------------------------------------------
>
>                 Key: NIFI-4818
>                 URL: https://issues.apache.org/jira/browse/NIFI-4818
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.5.0
>            Reporter: Koji Kawamura
>            Assignee: Koji Kawamura
>            Priority: Major
>
> ReportLineageToAtlas parses Hive JDBC connection URLs to get database names.
> It works if a connection URL does not have parameters (e.g.
> jdbc:hive2://host:port/dbName), but it reports the wrong database name if
> there are parameters. E.g. with
> jdbc:hive2://host:port/dbName;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2,
> the reported database name will be
> dbName;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2,
> including the connection parameters.
> Also, if more than one host:port is defined, it will not be able to
> analyze the cluster name from hostnames correctly.
>
> Similarly for Kafka topics, the reporting task uses transit URIs to analyze
> hostnames and topic names. It does handle multiple host:port definitions
> within a URI; however, the current logic only uses the first hostname entry
> even if there are multiple ones. For example, with a transit URI,
> "PLAINTEXT://0.example.com:6667,1.example.com:6667/topicA", it uses
> "0.example.com" to match configured regular expressions to derive a cluster
> name. If none of the regexes match, then it uses the default cluster name
> without looping through all hostnames. It never uses the 2nd or later
> hostnames to derive a cluster name.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
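
For readers following the issue, here is a minimal sketch of the parsing behaviour the description asks for. It is not the code from commit f16cbd4: the class name TransitUrlSketch, its method names, and the shape of the cluster-name rules (a map from cluster name to regex) are all hypothetical and chosen only for illustration. It assumes the usual Hive JDBC layout, where ';', '?' and '#' start the parameter lists after the database name, and it ignores IPv6 host literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the NiFi implementation.
final class TransitUrlSketch {

    /** Returns everything after "scheme://" (or the whole string if there is no scheme). */
    private static String stripScheme(String uri) {
        int schemeEnd = uri.indexOf("://");
        return schemeEnd >= 0 ? uri.substring(schemeEnd + 3) : uri;
    }

    /** Hostnames from a comma-separated "host:port,host:port" authority, ports removed. */
    static List<String> hostnames(String uri) {
        String rest = stripScheme(uri);
        int slash = rest.indexOf('/');
        String authority = slash >= 0 ? rest.substring(0, slash) : rest;
        List<String> hosts = new ArrayList<>();
        for (String entry : authority.split(",")) {
            String e = entry.trim();
            if (e.isEmpty()) {
                continue;
            }
            int colon = e.lastIndexOf(':');
            hosts.add(colon >= 0 ? e.substring(0, colon) : e);
        }
        return hosts;
    }

    /** Path after the first '/' following the authority, or "" if there is none. */
    static String pathOf(String uri) {
        String rest = stripScheme(uri);
        int slash = rest.indexOf('/');
        return slash >= 0 ? rest.substring(slash + 1) : "";
    }

    /** Database name from a Hive JDBC URL, with any trailing parameters cut off. */
    static String hiveDatabaseName(String jdbcUrl) {
        String path = pathOf(jdbcUrl);
        int end = path.length();
        // Assumption: ';', '?' and '#' introduce session, conf and var lists.
        for (char delimiter : new char[] {';', '?', '#'}) {
            int i = path.indexOf(delimiter);
            if (i >= 0 && i < end) {
                end = i;
            }
        }
        return path.substring(0, end);
    }

    /**
     * Tries every hostname against every rule and returns the first matching
     * cluster name; falls back to defaultName only after all hosts were tried.
     */
    static String clusterName(List<String> hosts, Map<String, Pattern> rules, String defaultName) {
        for (String host : hosts) {
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                if (rule.getValue().matcher(host).matches()) {
                    return rule.getKey();
                }
            }
        }
        return defaultName;
    }
}
```

With the Kafka URI from the description, hostnames(...) yields both "0.example.com" and "1.example.com", so a rule that only matches the second host can still supply the cluster name instead of falling through to the default.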