okumin commented on code in PR #5089:
URL: https://github.com/apache/hive/pull/5089#discussion_r1495644638


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -12143,6 +12148,27 @@ samplePredicate, true, new SampleDesc(ts.getNumerator(),
 
     Operator output = putOpInsertMap(op, rwsch);
 
+    if (tab.isMaterializedTable()) {
+      final FileSinkOperator source = ctx.getMaterializedTableSource(tab.getFullTableName());
+      final Statistics stats = source.getStatistics().clone();
+      final List<ColStatistics> sourceColStatsList = stats.getColumnStats();
+      final List<String> colNames = tab.getCols().stream().map(FieldSchema::getName).collect(Collectors.toList());
+      if (sourceColStatsList.size() != colNames.size()) {
+        throw new IllegalStateException(String.format(
+            "The size of col stats must be equal to that of schema. Stats = %s, Schema = %s",
+            sourceColStatsList, colNames));
+      }
+      final List<ColStatistics> colStatsList = new ArrayList<>(sourceColStatsList.size());
+      for (int i = 0; i < sourceColStatsList.size(); i++) {
+        final ColStatistics colStats = sourceColStatsList.get(i).clone();
+        // FileSinkOperator stores column stats with internal names such as "_col1"
+        colStats.setColumnName(colNames.get(i));
+        colStatsList.add(colStats);
+      }
+      stats.setColumnStats(colStatsList);

Review Comment:
   I added the clone only because `stats` could become temporarily
   inconsistent while `colStats.setColumnName` is being applied. On reflection,
   I don't think that is a problem, so I will remove the additional clone.
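
   For reference, the positional rename in the hunk above can be sketched in isolation. This is only an illustration under stated assumptions: `ColStats` is a hypothetical stand-in for Hive's `ColStatistics`, and `renameByPosition` is a made-up helper name, not something in the PR.

```java
import java.util.ArrayList;
import java.util.List;

final class ColStatsRenameSketch {
  // Hypothetical stand-in for Hive's ColStatistics; only the column name matters here.
  static final class ColStats {
    private String columnName;
    private final long numNulls;

    ColStats(String columnName, long numNulls) {
      this.columnName = columnName;
      this.numNulls = numNulls;
    }

    ColStats copy() {
      return new ColStats(columnName, numNulls);
    }

    String getColumnName() {
      return columnName;
    }

    void setColumnName(String columnName) {
      this.columnName = columnName;
    }

    long getNumNulls() {
      return numNulls;
    }
  }

  // Maps stats keyed by internal names ("_col0", "_col1", ...) onto schema names by position.
  // Each entry is copied before renaming, so the source list is never mutated.
  static List<ColStats> renameByPosition(List<ColStats> source, List<String> schemaNames) {
    if (source.size() != schemaNames.size()) {
      throw new IllegalStateException(String.format(
          "The size of col stats must be equal to that of schema. Stats = %d, Schema = %d",
          source.size(), schemaNames.size()));
    }
    final List<ColStats> renamed = new ArrayList<>(source.size());
    for (int i = 0; i < source.size(); i++) {
      final ColStats copy = source.get(i).copy();
      copy.setColumnName(schemaNames.get(i));
      renamed.add(copy);
    }
    return renamed;
  }

  public static void main(String[] args) {
    final List<ColStats> source = List.of(new ColStats("_col0", 0L), new ColStats("_col1", 3L));
    for (ColStats c : renameByPosition(source, List.of("id", "name"))) {
      System.out.println(c.getColumnName() + " nulls=" + c.getNumNulls());
    }
  }
}
```

   In this sketch the per-column copy is what keeps the source untouched, which is why a second clone of the enclosing object looks unnecessary.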



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -8174,6 +8176,9 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
 
     FileSinkOperator fso = (FileSinkOperator) output;
     fso.getConf().setTable(destinationTable);
+    if (destTableIsMaterialization) {
+      ctx.addMaterializedTableSource(destinationTable.getFullTableName(), fso);

Review Comment:
   That sounds cleaner!
   
   > Is the whole FileSinkOperator needed?
   
   No. I stored the whole operator because the statistics of the FSO are not
   yet computed at L8180. If we move the registration to the end of
   `materializeCTE`, we can store only the `Statistics`.
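
   A rough sketch of what storing only the statistics could look like, assuming the registration happens once the stats exist. All names here are hypothetical: `Stats` stands in for Hive's `Statistics`, and `addMaterializedTableStats` / `getMaterializedTableStats` are illustrative, not methods of Hive's `Context`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

final class MaterializedStatsRegistrySketch {
  // Hypothetical stand-in for Hive's Statistics.
  static final class Stats {
    final long numRows;
    final long dataSize;

    Stats(long numRows, long dataSize) {
      this.numRows = numRows;
      this.dataSize = dataSize;
    }
  }

  // Keyed by the full table name, mirroring the key used in the diff above.
  private final Map<String, Stats> statsByTable = new HashMap<>();

  // Intended to be called only after the statistics have been computed,
  // so no operator reference needs to be kept around.
  void addMaterializedTableStats(String fullTableName, Stats stats) {
    statsByTable.put(
        Objects.requireNonNull(fullTableName, "fullTableName"),
        Objects.requireNonNull(stats, "stats"));
  }

  // Returns empty when nothing has been registered for the table.
  Optional<Stats> getMaterializedTableStats(String fullTableName) {
    return Optional.ofNullable(statsByTable.get(fullTableName));
  }
}
```

   Rejecting a null `stats` at registration time is a choice made for this sketch: it surfaces a too-early registration immediately instead of at lookup.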



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

