lirui-apache commented on a change in pull request #13434:
URL: https://github.com/apache/flink/pull/13434#discussion_r495519793
##########
File path:
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/util/HiveTableUtil.java
##########
@@ -421,6 +424,31 @@ public static void checkAcidTable(CatalogTable catalogTable, ObjectPath tablePat
 		}
 	}
+	/**
+	 * Returns a new Hadoop Configuration object using the path to the hadoop conf configured.
+	 *
+	 * @param hadoopConfDir Hadoop conf directory path.
+	 * @return A Hadoop configuration instance.
+	 */
+	public static Configuration getHadoopConfiguration(String hadoopConfDir) {
+		Configuration hadoopConfiguration = new Configuration();
+		if (new File(hadoopConfDir).exists()) {
+			if (new File(hadoopConfDir + "/core-site.xml").exists()) {
+				hadoopConfiguration.addResource(new Path(hadoopConfDir + "/core-site.xml"));
+			}
+			if (new File(hadoopConfDir + "/hdfs-default.xml").exists()) {
Review comment:
Why do we need hdfs-default.xml?
##########
File path:
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/util/HiveTableUtil.java
##########
@@ -421,6 +424,31 @@ public static void checkAcidTable(CatalogTable catalogTable, ObjectPath tablePat
 		}
 	}
+	/**
+	 * Returns a new Hadoop Configuration object using the path to the hadoop conf configured.
+	 *
+	 * @param hadoopConfDir Hadoop conf directory path.
+	 * @return A Hadoop configuration instance.
+	 */
+	public static Configuration getHadoopConfiguration(String hadoopConfDir) {
+		Configuration hadoopConfiguration = new Configuration();
+		if (new File(hadoopConfDir).exists()) {
+			if (new File(hadoopConfDir + "/core-site.xml").exists()) {
+				hadoopConfiguration.addResource(new Path(hadoopConfDir + "/core-site.xml"));
+			}
+			if (new File(hadoopConfDir + "/hdfs-default.xml").exists()) {
+				hadoopConfiguration.addResource(new Path(hadoopConfDir + "/hdfs-default.xml"));
+			}
+			if (new File(hadoopConfDir + "/hdfs-site.xml").exists()) {
+				hadoopConfiguration.addResource(new Path(hadoopConfDir + "/hdfs-site.xml"));
+			}
+			if (new File(hadoopConfDir + "/mapred-site.xml").exists()) {
Review comment:
We also need yarn-site.xml, which is required to generate Parquet splits in a kerberized environment.
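For illustration, the repeated per-file checks in the hunk above could be folded into a loop over the expected file names, with `yarn-site.xml` added to the list. This is only a sketch using plain `java.io.File`: the class `HadoopConfFiles` and helper `collectConfFiles` are hypothetical names, and the real code would call `hadoopConfiguration.addResource(new Path(...))` for each file found.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class HadoopConfFiles {

    // Config files to load, in order; yarn-site.xml included per this review comment.
    static final String[] CONF_FILES = {
            "core-site.xml", "hdfs-site.xml", "mapred-site.xml", "yarn-site.xml"};

    // Returns the subset of CONF_FILES that exist under hadoopConfDir.
    // In the real code, each hit would become an addResource() call.
    static List<File> collectConfFiles(String hadoopConfDir) {
        List<File> found = new ArrayList<>();
        File dir = new File(hadoopConfDir);
        if (dir.isDirectory()) {
            for (String name : CONF_FILES) {
                File f = new File(dir, name);
                if (f.exists()) {
                    found.add(f);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        // Demo with a throwaway directory containing two of the expected files.
        File dir = java.nio.file.Files.createTempDirectory("hadoop-conf").toFile();
        new File(dir, "core-site.xml").createNewFile();
        new File(dir, "yarn-site.xml").createNewFile();
        for (File f : collectConfFiles(dir.getAbsolutePath())) {
            System.out.println(f.getName());
        }
    }
}
```

Keeping the file list in one constant also makes it obvious which files are (and are not) loaded, which is exactly the question raised about `hdfs-default.xml` above.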
##########
File path:
flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/catalog/hive/HiveCatalog.java
##########
@@ -196,15 +204,21 @@ private static HiveConf createHiveConf(@Nullable String hiveConfDir) {
 		}
 		// create HiveConf from hadoop configuration
-		Configuration hadoopConf = HadoopUtils.getHadoopConfiguration(new org.apache.flink.configuration.Configuration());
+		Configuration hadoopConf;
-		// Add mapred-site.xml. We need to read configurations like compression codec.
-		for (String possibleHadoopConfPath : HadoopUtils.possibleHadoopConfPaths(new org.apache.flink.configuration.Configuration())) {
-			File mapredSite = new File(new File(possibleHadoopConfPath), "mapred-site.xml");
-			if (mapredSite.exists()) {
-				hadoopConf.addResource(new Path(mapredSite.getAbsolutePath()));
-				break;
+		if (isNullOrWhitespaceOnly(hadoopConfDir)) {
+			hadoopConf = HadoopUtils.getHadoopConfiguration(new org.apache.flink.configuration.Configuration());
Review comment:
Let's not rely on `HadoopUtils.getHadoopConfiguration` to get the
configuration. We can just call `HadoopUtils.possibleHadoopConfPaths` to get
the candidate paths and load the files ourselves, and the loading logic should
be consistent with `HiveTableUtil.getHadoopConfiguration`.
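A rough, JDK-only sketch of that suggestion follows. The class `ConfDirScanner` and method `firstDirWithConf` are hypothetical names: the real code would iterate the directories returned by `HadoopUtils.possibleHadoopConfPaths` and call `addResource()` on each hit, with the same file list as `HiveTableUtil.getHadoopConfiguration` so both call sites stay consistent.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConfDirScanner {

    // Same file list as the HiveTableUtil side, to keep the two code paths consistent.
    static final String[] CONF_FILES = {
            "core-site.xml", "hdfs-site.xml", "mapred-site.xml", "yarn-site.xml"};

    // Walks the candidate directories (as possibleHadoopConfPaths would supply them)
    // and returns the conf files found in the first directory that contains any.
    static List<File> firstDirWithConf(List<String> candidateDirs) {
        for (String dir : candidateDirs) {
            List<File> found = new ArrayList<>();
            for (String name : CONF_FILES) {
                File f = new File(dir, name);
                if (f.exists()) {
                    found.add(f);
                }
            }
            if (!found.isEmpty()) {
                return found; // stop at the first directory that has any config
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) throws IOException {
        // Demo: an empty candidate directory is skipped in favor of a populated one.
        File empty = Files.createTempDirectory("empty-conf").toFile();
        File real = Files.createTempDirectory("real-conf").toFile();
        new File(real, "core-site.xml").createNewFile();
        List<File> found = firstDirWithConf(
                Arrays.asList(empty.getAbsolutePath(), real.getAbsolutePath()));
        for (File f : found) {
            System.out.println(f.getName());
        }
    }
}
```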
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]