[ 
https://issues.apache.org/jira/browse/FLINK-36594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slankka updated FLINK-36594:
----------------------------
    Description: 
Recently, I have been using HiveCatalog together with Hudi's Hive sync to HMS.

HiveCatalog can cause later Hive configuration lookups to fail. In my case, 
Hudi cannot pick up the hive-site.xml provided on the classpath. 

In short, HiveCatalog turns this lookup off by setting 
*HiveConf.hiveSiteLocation* to null. After that, no new HiveConf instance loads 
the hive-site.xml that the user put on the classpath (for example, one provided 
through YARN). 

HiveCatalog can load hive-site.xml by itself without this variable; however, 
code that runs afterwards still assumes that HiveConf searches the classpath 
for hive-site.xml. 

Related change:  https://issues.apache.org/jira/browse/FLINK-22092

From then on, hive-site.xml is only loaded if you call addResource explicitly, 
set the location back, or get Hive to find it in the user's uber jar, which 
takes extra effort.

My point is that {+}big data developers are already confused about where to 
provide core-site.xml, hive-site.xml, hbase-site.xml and so on{+}. On the other 
side, developers of big data frameworks search for these files in many 
different places and cannot be sure they pick up the right one.

As a consequence, users put their xxx-site.xml files everywhere:
 # /etc/hive/conf, /etc/hadoop/conf
 # FLINK_HOME/lib, SPARK_HOME/conf
 # yarn.provided.lib.dirs (resource prefix ./lib, ./plugin/)
 # packed in their uber jar
 # --files of Apache Spark, --yarnship hive-site.xml (works)

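Because of the difference between deployment modes (yarn-per-job vs. 
yarn-application), the main() method of an application can run in different 
places: on the client in per-job mode, or on the JobManager inside the cluster 
in application mode.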

The simplest way to provide xxx-site.xml is to put it on both the client-side 
classpath and the container classpath (at the root). If I were a cloud 
infrastructure provider, I would put it in locations 1, 2 and 3; as a Flink 
user, I would not trust those locations, so I would pack it into my own jar and 
ask the cloud provider to give me the xxx-site.xml.
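Because the same file can end up in several of these places, it helps to list every copy a given JVM can see and which one wins. The following is a hypothetical diagnostic helper (SiteFileLocator is not part of Flink, Hive or Hudi) that uses only the JDK's ClassLoader.getResources. It assumes the thread context class loader is the one the framework will use, which may not hold under Flink's child-first user-code class loader.

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SiteFileLocator {

    /**
     * Returns every copy of fileName visible at the root of the loader's classpath.
     * With standard parent-first delegation, the first element is the copy that
     * loader.getResource(fileName) would return; the others are shadowed.
     */
    public static List<URL> findAll(ClassLoader loader, String fileName) {
        try {
            return new ArrayList<>(Collections.list(loader.getResources(fileName)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the context class loader is the one the framework uses.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String name : new String[] {"core-site.xml", "hive-site.xml", "hbase-site.xml"}) {
            List<URL> hits = findAll(loader, name);
            if (hits.isEmpty()) {
                System.out.println(name + ": not on the classpath root");
                continue;
            }
            System.out.println(name + ": used " + hits.get(0));
            for (URL shadowed : hits.subList(1, hits.size())) {
                System.out.println(name + ": shadowed " + shadowed);
            }
        }
    }
}
{code}

Running it once on the client and once inside a container (for example from the job's main()) shows whether the two sides resolve the same file.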

 

In addition, both of the following classes use their own private 
*findConfigFile* method to search the classpath for *hiveSiteLocation*:
 * org.apache.hadoop.hive.conf.HiveConf
 * org.apache.hadoop.hive.metastore.conf.MetastoreConf

 
{*}Conclusion{*}:
 # HiveConf calls findConfigFile and caches hiveSiteLocation only once, during class initialization.
 # MetastoreConf searches for hiveSiteLocation again even if somebody has set it to null. (This is better.)
 # Both HiveConf and MetastoreConf only recognize hive-site.xml at the first level (root) of the classpath; e.g. "lib/hive-site.xml" is not found.
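Point 3 can be checked with nothing but the JDK, assuming the classpath part of findConfigFile boils down to ClassLoader.getResource("hive-site.xml") (any fallback directories it may also consult are not modeled here). The hypothetical ClasspathRootDemo below builds a one-entry classpath and shows that a file under lib/ is invisible to a root-level lookup, while the same file at the root is found.

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClasspathRootDemo {

    /**
     * Creates a throwaway classpath entry, writes a dummy hive-site.xml at relativeLocation
     * inside it, and reports whether a root-level lookup of "hive-site.xml" finds it.
     * The temporary directory is not deleted; this is only a demo.
     */
    public static boolean visibleAtRoot(String relativeLocation) {
        try {
            Path entry = Files.createTempDirectory("classpath-entry");
            Path file = entry.resolve(relativeLocation);
            Files.createDirectories(file.getParent());
            Files.write(file, "<configuration/>".getBytes(StandardCharsets.UTF_8));
            // Parent is null (bootstrap only), so nothing else on the JVM classpath interferes.
            try (URLClassLoader loader = new URLClassLoader(new URL[] {entry.toUri().toURL()}, null)) {
                return loader.getResource("hive-site.xml") != null;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("hive-site.xml     found: " + visibleAtRoot("hive-site.xml"));     // true
        System.out.println("lib/hive-site.xml found: " + visibleAtRoot("lib/hive-site.xml")); // false
    }
}
{code}

This is why shipping the file under ./lib or ./plugin/ only helps if something else adds that directory itself to the classpath.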

 
{code:java}
class org.apache.hadoop.hive.metastore.conf.MetastoreConf

private MetastoreConf() {
  throw new RuntimeException("You should never be creating one of these!");
}

 
public static Configuration newMetastoreConf() {
...
  if(hiveSiteURL == null) {
    hiveSiteURL = findConfigFile(classLoader, "hive-site.xml");
  }
...
}{code}
 
{code:java}
class org.apache.hadoop.hive.conf.HiveConf 
// HiveConf's static initializer searches for hive-site.xml only once.

static {
  hiveSiteURL = findConfigFile(classLoader, "hive-site.xml", true);
}
...

private void initialize(Class<?> cls) {
  ...
  if (hiveSiteURL != null) {
    addResource(hiveSiteURL);
  }
  ...
}{code}
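The difference between conclusion points 1 and 2 comes down to when the lookup runs. Below is a JDK-only simulation of the two patterns excerpted above: a value computed once in a static initializer versus one recomputed whenever it is null. The class names and the fake location are hypothetical, and neither class is Hive code. Resetting both to null, as HiveCatalog does, leaves only the second one able to recover.

{code:java}
public class SiteLookupPatterns {

    /** Hypothetical stand-in for findConfigFile: pretends hive-site.xml is on the classpath. */
    static String searchClasspath() {
        return "file:/etc/hive/conf/hive-site.xml";
    }

    /** HiveConf-like: searched once when the class is initialized; afterwards only a setter changes it. */
    public static final class SearchOnce {
        private static String siteLocation = searchClasspath();

        public static void setSiteLocation(String location) {
            siteLocation = location;
        }

        /** Location a new instance would add as a resource; null means nothing is loaded. */
        public static String locationForNewInstance() {
            return siteLocation;
        }
    }

    /** MetastoreConf-like: searches again whenever the cached value is null. */
    public static final class SearchWhenNull {
        private static String siteLocation = searchClasspath();

        public static void setSiteLocation(String location) {
            siteLocation = location;
        }

        public static String locationForNewInstance() {
            if (siteLocation == null) {
                siteLocation = searchClasspath();
            }
            return siteLocation;
        }
    }

    /** Simulates HiveCatalog clearing the location, then returns what later new instances would load. */
    public static String[] afterCatalogReset() {
        SearchOnce.setSiteLocation(null);
        SearchWhenNull.setSiteLocation(null);
        return new String[] {SearchOnce.locationForNewInstance(), SearchWhenNull.locationForNewInstance()};
    }

    public static void main(String[] args) {
        String[] seen = afterCatalogReset();
        System.out.println("search-once:      " + seen[0]); // null
        System.out.println("search-when-null: " + seen[1]); // file:/etc/hive/conf/hive-site.xml
    }
}
{code}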
 
{code:java}
String name            = "myhive";
String defaultDatabase = "mydatabase";
String hiveConfDir     = "/opt/hive-conf";

HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
tableEnv.registerCatalog("myhive", hive);

// set the HiveCatalog as the current catalog of the session
tableEnv.useCatalog("myhive"); {code}
After running the code above:
{code:java}
// Another framework that uses Hive in the usual way:

HiveConf hiveConf = new HiveConf(hadoopConf, HiveConf.class); 

// or directly

HiveConf hiveConf = new HiveConf(); {code}
In both cases, hiveConf *DOES NOT* load hive-site.xml from the classpath, which 
causes configuration loading to fail.

 

Example code from HiveSyncConfig of Apache Hudi:
{code:java}
public HiveSyncConfig(Properties props, Configuration hadoopConf) {
    super(props, hadoopConf);
    HiveConf hiveConf = new HiveConf();
    // HiveConf needs to load Hadoop conf to allow instantiation via AWSGlueClientFactory
    hiveConf.addResource(hadoopConf);
    setHadoopConf(hiveConf);
    validateParameters();
} {code}
 

A temporary workaround for this issue is to search for the file again :)
{code:java}
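// Assumption: use a class loader that can see hive-site.xml, e.g. the context class loader
ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
HiveConf.setHiveSiteLocation(classLoader.getResource(HiveCatalog.HIVE_SITE_FILE));

HiveConf hiveConf = new HiveConf();{code}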
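The issue title asks HiveCatalog to set the location back. One way to express that is to save the previous value, clear it only while HiveCatalog builds its own configuration, and restore it in a finally block so later new HiveConf() calls still see the classpath copy. The sketch below uses a plain static field as a hypothetical stand-in for HiveConf's; it is not the code of the linked pull request. Since the field is JVM-wide, other threads can still observe the cleared value while the block runs, so this narrows the problem rather than removing it.

{code:java}
import java.util.function.Supplier;

public class RestoreSiteLocation {

    /** Hypothetical stand-in for the static hive-site location inside HiveConf. */
    static String siteLocation = "file:/etc/hive/conf/hive-site.xml";

    /**
     * Clears the static location while build runs, then restores the previous value,
     * even if build throws. Not atomic: other threads can observe the cleared value.
     */
    public static <T> T withSiteLocationCleared(Supplier<T> build) {
        String previous = siteLocation;
        siteLocation = null;
        try {
            return build.get();
        } finally {
            siteLocation = previous;
        }
    }

    public static void main(String[] args) {
        // Inside the block, "build" sees the cleared value, like HiveCatalog's own HiveConf.
        String seenInside = withSiteLocationCleared(() -> String.valueOf(siteLocation));
        System.out.println("inside: " + seenInside);   // null
        System.out.println("after:  " + siteLocation); // file:/etc/hive/conf/hive-site.xml
    }
}
{code}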
 

 

  was:
Recently, I have been using HiveCatalog together with Hudi's Hive sync to HMS.

HiveCatalog can cause later Hive configuration lookups to fail. In my case, 
Hudi cannot pick up the hive-site.xml provided on the classpath. 

In short, HiveCatalog turns this lookup off by setting 
*HiveConf.hiveSiteLocation* to null. After that, no new HiveConf instance loads 
the hive-site.xml that the user put on the classpath (for example, one provided 
through YARN). 

HiveCatalog can load hive-site.xml by itself without this variable; however, 
code that runs afterwards still assumes that HiveConf searches the classpath 
for hive-site.xml. 

Related change:  https://issues.apache.org/jira/browse/FLINK-22092

From then on, hive-site.xml is only loaded if you call addResource explicitly, 
set the location back, or get Hive to find it in the user's uber jar, which 
takes extra effort.

 

In addition, both of the following classes use their own private 
*findConfigFile* method to search the classpath for *hiveSiteURL*:
 * org.apache.hadoop.hive.conf.HiveConf
 * org.apache.hadoop.hive.metastore.conf.MetastoreConf

 
Conclusion:
 # HiveConf calls findConfigFile and caches hiveSiteLocation only once, during class initialization.
 # MetastoreConf searches for hiveSiteLocation again even if it has been set to null. (This is better.)
 # Both HiveConf and MetastoreConf only recognize hive-site.xml at the first level (root) of the classpath; e.g. "lib/hive-site.xml" is not found.

 
{code:java}
class org.apache.hadoop.hive.metastore.conf.MetastoreConf

private MetastoreConf() {
  throw new RuntimeException("You should never be creating one of these!");
}

 
public static Configuration newMetastoreConf() {
...
  if(hiveSiteURL == null) {
    hiveSiteURL = findConfigFile(classLoader, "hive-site.xml");
  }
...
}{code}
 
{code:java}
class org.apache.hadoop.hive.conf.HiveConf 
// HiveConf's static initializer searches for hive-site.xml only once.

static {
  hiveSiteURL = findConfigFile(classLoader, "hive-site.xml", true);
}
...

private void initialize(Class<?> cls) {
  ...
  if (hiveSiteURL != null) {
    addResource(hiveSiteURL);
  }
  ...
}{code}
 
{code:java}
String name            = "myhive";
String defaultDatabase = "mydatabase";
String hiveConfDir     = "/opt/hive-conf";

HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
tableEnv.registerCatalog("myhive", hive);

// set the HiveCatalog as the current catalog of the session
tableEnv.useCatalog("myhive"); {code}
After running the code above:
{code:java}
// Another framework that uses Hive in the usual way:

HiveConf hiveConf = new HiveConf(hadoopConf, HiveConf.class); 

// or directly

HiveConf hiveConf = new HiveConf(); {code}
In both cases, hiveConf *DOES NOT* load hive-site.xml from the classpath, which 
causes configuration loading to fail.

 

Example code from HiveSyncConfig of Apache Hudi:
{code:java}
public HiveSyncConfig(Properties props, Configuration hadoopConf) {
    super(props, hadoopConf);
    HiveConf hiveConf = new HiveConf();
    // HiveConf needs to load Hadoop conf to allow instantiation via AWSGlueClientFactory
    hiveConf.addResource(hadoopConf);
    setHadoopConf(hiveConf);
    validateParameters();
} {code}
 

A temporary workaround for this issue is to search for the file again :)
{code:java}
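// Assumption: use a class loader that can see hive-site.xml, e.g. the context class loader
ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
HiveConf.setHiveSiteLocation(classLoader.getResource(HiveCatalog.HIVE_SITE_FILE));

HiveConf hiveConf = new HiveConf();{code}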
 

 


> HiveCatalog should set HiveConf.hiveSiteLocation back
> -----------------------------------------------------
>
>                 Key: FLINK-36594
>                 URL: https://issues.apache.org/jira/browse/FLINK-36594
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive
>    Affects Versions: 1.20.1
>            Reporter: slankka
>            Priority: Minor
>              Labels: pull-request-available



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
