[
https://issues.apache.org/jira/browse/FLINK-36594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
slankka updated FLINK-36594:
----------------------------
Description:
Recently, I have been using HiveCatalog together with Hudi's sync to HMS.
HiveCatalog can cause subsequent Hive configuration lookups to fail. In
my case, Hudi cannot read the hive-site configuration provided on the classpath.
Specifically, HiveCatalog turns the lookup off by setting *HiveConf.hiveSiteLocation* to null,
so any later HiveConf instance never loads the hive-site.xml that the user placed
on the classpath or that YARN provides.
HiveCatalog can load hive-site.xml itself without this variable; however,
code that runs afterwards still assumes that HiveConf 'searches' for hive-site.xml
on the classpath.
Related change: https://issues.apache.org/jira/browse/FLINK-22092
The file is loaded only if you call addResource explicitly, set the location back, or let Hive
find it in the user uber jar, which takes extra effort.
In addition, the following classes both use a private *findConfigFile* method
to look up *hiveSiteURL* on the classpath:
* org.apache.hadoop.hive.conf.HiveConf
* org.apache.hadoop.hive.metastore.conf.MetastoreConf
Conclusion:
# HiveConf calls findConfigFile and caches hiveSiteLocation only once,
during class initialization.
# MetastoreConf searches for hiveSiteLocation on the classpath, then under some HOME or some
CONF_PATH.
# Both HiveConf and MetastoreConf recognize hive-site.xml only at the first level
of the classpath; e.g. "lib/hive-site.xml" is not found.
{code:java}
// Excerpt from org.apache.hadoop.hive.metastore.conf.MetastoreConf
private MetastoreConf() {
throw new RuntimeException("You should never be creating one of these!");
}
public static Configuration newMetastoreConf() {
...
if(hiveSiteURL == null) {
hiveSiteURL = findConfigFile(classLoader, "hive-site.xml");
}
...
}{code}
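The third conclusion follows from how class loaders resolve resource names. As a minimal sketch, assuming findConfigFile ends up calling ClassLoader.getResource with the bare name "hive-site.xml" (an assumption, not checked against every Hive version), the stdlib-only demo below shows that a file under a lib/ subdirectory of a classpath root is not found. ClasspathRootLookupDemo and resolves are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; it is not part of Hive or Flink.
final class ClasspathRootLookupDemo {

    /**
     * Creates a temporary classpath root that contains only {@code relativePath},
     * then reports whether the bare name "hive-site.xml" resolves against it.
     */
    static boolean resolves(String relativePath) {
        try {
            Path root = Files.createTempDirectory("cp-root");
            Path file = root.resolve(relativePath);
            Files.createDirectories(file.getParent());
            Files.write(file, "<configuration/>".getBytes(StandardCharsets.UTF_8));
            // Parent is null so only the bootstrap loader and this root are consulted.
            try (URLClassLoader loader =
                    new URLClassLoader(new URL[] {root.toUri().toURL()}, null)) {
                return loader.getResource("hive-site.xml") != null;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolves("hive-site.xml"));     // true
        System.out.println(resolves("lib/hive-site.xml")); // false
    }
}
```

The demo leaves its temporary directories behind for brevity; real code would clean them up.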
Example
{code:java}
// At first, HiveConf's static initializer searches for hive-site.xml, and only once.
static {
hiveSiteURL = findConfigFile(classLoader, "hive-site.xml", true);
}{code}
{code:java}
String name = "myhive";
String defaultDatabase = "mydatabase";
String hiveConfDir = "/opt/hive-conf";
HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
tableEnv.registerCatalog("myhive", hive);
// set the HiveCatalog as the current catalog of the session
tableEnv.useCatalog("myhive"); {code}
After running the code above:
{code:java}
// Another framework that uses Hive in the usual way:
HiveConf hiveConf = new HiveConf(hadoopConf, HiveConf.class);
// or directly
HiveConf hiveConf = new HiveConf(); {code}
The resulting hiveConf *DOES NOT* load hive-site.xml from the classpath, which causes
configuration loading to fail.
Example code from HiveSyncConfig of Apache Hudi:
{code:java}
public HiveSyncConfig(Properties props, Configuration hadoopConf) {
super(props, hadoopConf);
HiveConf hiveConf = new HiveConf();
// HiveConf needs to load Hadoop conf to allow instantiation via AWSGlueClientFactory
hiveConf.addResource(hadoopConf);
setHadoopConf(hiveConf);
validateParameters();
} {code}
A temporary workaround is to search the classpath again and set the location back:
{code:java}
HiveConf.setHiveSiteLocation(classLoader.getResource(HiveCatalog.HIVE_SITE_FILE));
HiveConf hiveConf = new HiveConf();{code}
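The issue title suggests that HiveCatalog should set the location back after it is done. A minimal sketch of that save-and-restore pattern is shown below. ToyConf is a hypothetical stand-in for HiveConf (one static, process-wide location that every new instance consults), and SiteLocationGuard and withoutSiteLocation are illustrative names as well; none of this is Flink's or Hive's actual code.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.function.Supplier;

// Hypothetical stand-in for HiveConf: a single static site location shared by all instances.
final class ToyConf {
    private static URL siteLocation = initialLocation();

    // True if this instance saw a non-null site location when it was constructed.
    final boolean siteLoaded;

    ToyConf() {
        this.siteLoaded = siteLocation != null;
    }

    static URL getSiteLocation() {
        return siteLocation;
    }

    static void setSiteLocation(URL location) {
        siteLocation = location;
    }

    private static URL initialLocation() {
        try {
            // Illustrative path only; nothing is read from disk.
            return URI.create("file:/opt/hive-conf/hive-site.xml").toURL();
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical helper showing the restore step.
final class SiteLocationGuard {

    /** Runs {@code body} with the static location cleared, then restores the saved value. */
    static <T> T withoutSiteLocation(Supplier<T> body) {
        URL saved = ToyConf.getSiteLocation();
        ToyConf.setSiteLocation(null);
        try {
            return body.get();
        } finally {
            // Restore even if body throws, so later instances are unaffected.
            ToyConf.setSiteLocation(saved);
        }
    }

    public static void main(String[] args) {
        boolean inside = withoutSiteLocation(() -> new ToyConf().siteLoaded);
        boolean after = new ToyConf().siteLoaded;
        System.out.println(inside + " " + after); // false true
    }
}
```

With the real classes, the setter would be HiveConf.setHiveSiteLocation as used in the workaround above; a matching static getter on HiveConf is assumed here. Because the field is static, this pattern is not safe if other threads create HiveConf instances while the location is cleared.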
was:
Recently, I have been using HiveCatalog together with Hudi's sync to HMS.
HiveCatalog can prevent Hudi from reading the hive-site configuration provided on the classpath.
HiveCatalog can load hive-site.xml itself without this variable, but the
code that runs afterwards still assumes that HiveConf 'searches' for hive-site.xml on the
classpath.
Specifically, HiveCatalog turns the lookup off, so any later HiveConf instance never loads
the hive-site.xml that the user placed on the classpath or that YARN provides.
The file is loaded only if you call addResource explicitly, set the location back, or let Hive
find it in the user uber jar, which takes extra effort.
Example
{code:java}
// At first, HiveConf's static initializer searches for hive-site.xml, and only once.
static {
hiveSiteURL = findConfigFile(classLoader, "hive-site.xml", true);
}{code}
{code:java}
String name = "myhive";
String defaultDatabase = "mydatabase";
String hiveConfDir = "/opt/hive-conf";
HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
tableEnv.registerCatalog("myhive", hive);
// set the HiveCatalog as the current catalog of the session
tableEnv.useCatalog("myhive"); {code}
After running the code above:
{code:java}
// Another framework that uses Hive in the usual way:
HiveConf hiveConf = new HiveConf(hadoopConf, HiveConf.class);
// or directly
HiveConf hiveConf = new HiveConf(); {code}
The resulting hiveConf *DOES NOT* load hive-site.xml from the classpath, which causes
configuration loading to fail.
This is because HiveCatalog sets *HiveConf.hiveSiteLocation* to null, as a result of
https://issues.apache.org/jira/browse/FLINK-22092
Example code from HiveSyncConfig of Apache Hudi:
{code:java}
public HiveSyncConfig(Properties props, Configuration hadoopConf) {
super(props, hadoopConf);
HiveConf hiveConf = new HiveConf();
// HiveConf needs to load Hadoop conf to allow instantiation via AWSGlueClientFactory
hiveConf.addResource(hadoopConf);
setHadoopConf(hiveConf);
validateParameters();
} {code}
A temporary workaround is to search the classpath again and set the location back:
{code:java}
HiveConf.setHiveSiteLocation(classLoader.getResource(HiveCatalog.HIVE_SITE_FILE));
HiveConf hiveConf = new HiveConf();{code}
> HiveCatalog should set HiveConf.hiveSiteLocation back
> -----------------------------------------------------
>
> Key: FLINK-36594
> URL: https://issues.apache.org/jira/browse/FLINK-36594
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Hive
> Affects Versions: 1.20.1
> Reporter: slankka
> Priority: Minor
> Labels: pull-request-available
--
This message was sent by Atlassian Jira
(v8.20.10#820010)