[jira] [Commented] (HUDI-2107) Support Read Log Only MOR Table For Spark

ASF GitHub Bot (Jira) Tue, 06 Jul 2021 00:26:10 -0700


    [ https://issues.apache.org/jira/browse/HUDI-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375316#comment-17375316 ]


ASF GitHub Bot commented on HUDI-2107:
--------------------------------------

minihippo commented on a change in pull request #3193:
URL: https://github.com/apache/hudi/pull/3193#discussion_r664301524



##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -161,29 +162,21 @@
             .map(instant -> fsView.getLatestMergedFileSlicesBeforeOrOn(relPartitionPath, instant.getTimestamp()))
             .orElse(Stream.empty());
 
-        // subgroup splits again by file id & match with log files.
-        Map<String, List<HoodieBaseFile>> groupedInputSplits = partitionsToParquetSplits.get(partitionPath).stream()
-            .collect(Collectors.groupingBy(file -> FSUtils.getFileId(file.getFileStatus().getPath().getName())));
         latestFileSlices.forEach(fileSlice -> {
-          List<HoodieBaseFile> dataFileSplits = groupedInputSplits.get(fileSlice.getFileId());
-          dataFileSplits.forEach(split -> {
-            try {
-              List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
+          List<String> logFilePaths = fileSlice.getLogFiles().sorted(HoodieLogFile.getLogFileComparator())
                   .map(logFile -> logFile.getPath().toString()).collect(Collectors.toList());
-              resultMap.put(split, logFilePaths);
-            } catch (Exception e) {
-              throw new HoodieException("Error creating hoodie real time split ", e);
-            }
-          });
+          baseAndLogsList.add(Pair.of(fileSlice.getBaseFile(), logFilePaths));
         });
       } catch (Exception e) {
         throw new HoodieException("Error obtaining data file/log file grouping: " + partitionPath, e);
       }
     });
-    return resultMap;
+    return baseAndLogsList;
   }
 
 
+
+

Review comment:
       Code style: the extra blank lines added here look unnecessary.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Read Log Only MOR Table For Spark
> -----------------------------------------
>
>                 Key: HUDI-2107
>                 URL: https://issues.apache.org/jira/browse/HUDI-2107
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Spark Integration
>            Reporter: pengzhiwei
>            Assignee: pengzhiwei
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> Currently Spark cannot read a log-only MOR table (one generated by an index 
> such as InMemoryIndex, HBaseIndex, or FlinkIndex, which supports indexing 
> log files).
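
To illustrate the idea behind the change in the diff above, here is a minimal, self-contained sketch. It is not Hudi code: the `Slice` class, the `groupBaseAndLogs` method, and the file names are hypothetical stand-ins, and plain string sorting is assumed in place of `HoodieLogFile.getLogFileComparator()`. It only shows why pairing an optional base file with its log paths lets a slice that has no base file survive the grouping step.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class LogOnlySliceSketch {

  // Hypothetical stand-in for a file slice: an optional base file plus its log files.
  static final class Slice {
    final Optional<String> baseFile;
    final List<String> logFiles;

    Slice(String baseFile, String... logFiles) {
      this.baseFile = Optional.ofNullable(baseFile);
      this.logFiles = Arrays.asList(logFiles);
    }
  }

  // Pair each slice's (possibly absent) base file with its sorted log paths.
  // Nothing is looked up by base file, so log-only slices are kept.
  static List<Map.Entry<Optional<String>, List<String>>> groupBaseAndLogs(List<Slice> slices) {
    List<Map.Entry<Optional<String>, List<String>>> result = new ArrayList<>();
    for (Slice slice : slices) {
      List<String> sortedLogs = new ArrayList<>(slice.logFiles);
      Collections.sort(sortedLogs); // assumption: simplified ordering for the sketch
      result.add(new SimpleEntry<>(slice.baseFile, sortedLogs));
    }
    return result;
  }

  public static void main(String[] args) {
    List<Slice> slices = Arrays.asList(
        new Slice("f1_base.parquet", ".f1.log.2", ".f1.log.1"),
        new Slice(null, ".f2.log.1")); // log-only slice: no base file
    for (Map.Entry<Optional<String>, List<String>> e : groupBaseAndLogs(slices)) {
      System.out.println(e.getKey().orElse("<no base file>") + " -> " + e.getValue());
    }
  }
}
```

A reader that receives such pairs can then choose a log-only read path whenever the base file is absent, instead of failing on a missing base-file entry.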



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
