[ 
https://issues.apache.org/jira/browse/NIFI-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355457#comment-16355457
 ] 

ASF subversion and git services commented on NIFI-4818:
-------------------------------------------------------

Commit f16cbd462b8d5bfea2cf4e1d02910f22e77d0354 in nifi's branch 
refs/heads/master from [~ijokarumawak]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=f16cbd4 ]

NIFI-4818: Fix transit URL parsing at Hive2JDBC and KafkaTopic for 
ReportLineageToAtlas

- Hive2JDBC: Handle connection parameters and multiple host entries
correctly
- KafkaTopic: Handle multiple host entries correctly
- Avoid potential "IllegalStateException: Duplicate key" exception
when NiFiAtlasHook analyzes existing NiFiFlowPath input/output entries
- This closes #2435
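
For readers unfamiliar with the "Duplicate key" message in the third item above: one common source of it in Java is Collectors.toMap called without a merge function on input that contains repeated keys. The sketch below only illustrates that general pattern. The class name, method name, and sample entries are hypothetical, and this notification does not show whether the commit changed the hook in this way.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; this is not the NiFiAtlasHook code.
final class DuplicateKeySketch {

    // Index {qualifiedName, guid} pairs by qualified name.
    // The merge function keeps the first entry seen, so repeated
    // qualified names no longer cause an IllegalStateException.
    static Map<String, String> indexByQualifiedName(List<String[]> entries) {
        return entries.stream().collect(Collectors.toMap(
                e -> e[0],
                e -> e[1],
                (first, second) -> first));
    }

    public static void main(String[] args) {
        // Assumed sample data: the same qualified name appears twice.
        List<String[]> entries = Arrays.asList(
                new String[] {"topicA@cluster1", "guid-1"},
                new String[] {"topicA@cluster1", "guid-2"});

        try {
            // Without a merge function, toMap rejects the repeated key.
            entries.stream().collect(Collectors.toMap(e -> e[0], e -> e[1]));
        } catch (IllegalStateException ex) {
            System.out.println("Without merge function: " + ex.getMessage());
        }

        System.out.println("With merge function: " + indexByQualifiedName(entries));
    }
}
```

Keeping the first entry is an arbitrary choice made for this sketch; code that needs every entry could group values into a list instead.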


> Fix transit URL parsing at Hive2JDBC and KafkaTopic for ReportLineageToAtlas
> ----------------------------------------------------------------------------
>
>                 Key: NIFI-4818
>                 URL: https://issues.apache.org/jira/browse/NIFI-4818
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.5.0
>            Reporter: Koji Kawamura
>            Assignee: Koji Kawamura
>            Priority: Major
>
> ReportLineageToAtlas parses Hive JDBC connection URLs to get database names. 
> This works if the connection URL has no parameters (e.g. 
> jdbc:hive2://host:port/dbName), but it reports the wrong database name if 
> parameters are present. For example, with 
> jdbc:hive2://host:port/dbName;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2,
> the reported database name is 
> dbName;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2, 
> including the connection parameters.
> Also, if more than one host:port is defined, the cluster name cannot be 
> derived from the hostnames correctly.
> Similarly, for Kafka topics, the reporting task uses transit URIs to analyze 
> hostnames and topic names. It does handle multiple host:port definitions 
> within a URI, but the current logic uses only the first hostname entry even 
> if there are several. For example, with the transit URI 
> "PLAINTEXT://0.example.com:6667,1.example.com:6667/topicA", it uses 
> "0.example.com" to match the configured regular expressions and derive a 
> cluster name. If none of the regular expressions match, it falls back to the 
> default cluster name without trying the remaining hostnames. It never uses 
> the second or later hostnames to derive a cluster name.
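
The behaviour the description asks for can be sketched roughly as follows. This is a minimal illustration, not the code from the commit. The class, the method names, the "clusterB" pattern map, and the numeric ports in the sample URLs are all assumptions made for the example. It strips everything after the first ';', '?' or '#' from the Hive URL path to get the database name, and it tries every hostname in the transit URI before falling back to the default cluster name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch; names and structure are not taken from the NiFi source.
final class TransitUrlSketch {

    // Split "scheme://h1:p1,h2:p2/path" into { "h1:p1,h2:p2", "path" }.
    static String[] splitAuthorityAndPath(String uri) {
        int sep = uri.indexOf("://");
        if (sep < 0) {
            throw new IllegalArgumentException("No '://' in " + uri);
        }
        int start = sep + 3;
        int slash = uri.indexOf('/', start);
        String authority = slash < 0 ? uri.substring(start) : uri.substring(start, slash);
        String path = slash < 0 ? "" : uri.substring(slash + 1);
        return new String[] {authority, path};
    }

    // Return every hostname in the comma-separated host:port list.
    static List<String> hostNames(String uri) {
        List<String> hosts = new ArrayList<>();
        for (String hostPort : splitAuthorityAndPath(uri)[0].split(",")) {
            String hp = hostPort.trim();
            int colon = hp.lastIndexOf(':');
            hosts.add(colon < 0 ? hp : hp.substring(0, colon));
        }
        return hosts;
    }

    // Database name is the path up to the first ';', '?' or '#'.
    static String hiveDatabaseName(String jdbcUrl) {
        return splitAuthorityAndPath(jdbcUrl)[1].split("[;?#]", 2)[0];
    }

    // Try every hostname against every pattern before using the default.
    static String clusterName(List<String> hosts, Map<String, Pattern> patterns, String defaultName) {
        for (String host : hosts) {
            for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
                if (e.getValue().matcher(host).matches()) {
                    return e.getKey();
                }
            }
        }
        return defaultName;
    }

    public static void main(String[] args) {
        String hive = "jdbc:hive2://host1:10000,host2:10000/dbName"
                + ";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2";
        String kafka = "PLAINTEXT://0.example.com:6667,1.example.com:6667/topicA";

        // Assumed configuration: only the second broker matches a pattern.
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("clusterB", Pattern.compile("1\\.example\\.com"));

        System.out.println(hiveDatabaseName(hive));   // dbName
        System.out.println(hostNames(hive));          // [host1, host2]
        System.out.println(hostNames(kafka));         // [0.example.com, 1.example.com]
        System.out.println(clusterName(hostNames(kafka), patterns, "default")); // clusterB
    }
}
```

With first-hostname-only matching, the Kafka example would resolve to "default", because "0.example.com" matches no pattern. Iterating over all hostnames lets the second broker decide the cluster name.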



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
