xiarixiaoyao commented on a change in pull request #2722: URL: https://github.com/apache/hudi/pull/2722#discussion_r615508116
##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
##########
@@ -85,12 +85,14 @@ void addProjectionToJobConf(final RealtimeSplit realtimeSplit, final JobConf job
     // risk of experiencing race conditions. Hence, we synchronize on the JobConf object here. There is negligible
     // latency incurred here due to the synchronization since get record reader is called once per spilt before the
     // actual heavy lifting of reading the parquet files happen.
-    if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null) {
+    if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null
+        || (!realtimeSplit.getDeltaLogPaths().isEmpty() && !HoodieRealtimeInputFormatUtils.requiredProjectionFieldsExistInConf(jobConf))) {
       synchronized (jobConf) {
         LOG.info(
             "Before adding Hoodie columns, Projections :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR)
                 + ", Ids :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR));
-        if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null) {
+        if (jobConf.get(HoodieInputFormatUtils.HOODIE_READ_COLUMNS_PROP) == null

Review comment:
   Cloning the Configuration object can be very expensive. To avoid unexpected performance regressions for existing workloads, we should not isolate the JobConf for each record reader, so I also agree with reverting https://github.com/apache/hudi/pull/2190/files. However, if the current query does not involve any log files, adding the Hoodie additional projection columns will cause unnecessary IO, since those extra columns would then be scanned for nothing.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
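To make the discussion above concrete, here is a minimal, self-contained sketch of the guard the diff introduces: the projection is only (re)built when it has never been set, or when the split carries delta log files but the Hoodie meta columns are missing from the configured projection. The class, property keys, and column names here are simplified stand-ins, not the real Hudi APIs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the projection guard discussed in the review
// comment; not the actual HoodieParquetRealtimeInputFormat code.
public class ProjectionGuardSketch {

    // Assumed marker property indicating the Hoodie columns were already added.
    static final String HOODIE_READ_COLUMNS_PROP = "hoodie.read.columns.set";

    // Returns true when the projection must be (re)built for this split.
    static boolean needsProjectionUpdate(Map<String, String> jobConf,
                                         List<String> deltaLogPaths) {
        boolean neverSet = jobConf.get(HOODIE_READ_COLUMNS_PROP) == null;
        // Only splits with log files need the meta columns for merging;
        // skipping them for log-free splits avoids the unnecessary IO
        // the reviewer points out.
        boolean logSplitMissingMetaCols = !deltaLogPaths.isEmpty()
                && !jobConf.getOrDefault("columns", "").contains("_hoodie_record_key");
        return neverSet || logSplitMissingMetaCols;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Fresh conf: projection always needs to be set once.
        System.out.println(needsProjectionUpdate(conf, Collections.emptyList()));

        conf.put(HOODIE_READ_COLUMNS_PROP, "true");
        conf.put("columns", "name,price");
        // Log-free split with the conf already set: nothing extra is added.
        System.out.println(needsProjectionUpdate(conf, Collections.emptyList()));

        // Split with log files but without the meta columns: rebuild needed.
        System.out.println(needsProjectionUpdate(conf, List.of("file.log.1")));
    }
}
```

The second branch of the condition is what keeps log-free splits cheap: a plain parquet split that already has a projection set never re-enters the synchronized block.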