[
https://issues.apache.org/jira/browse/HIVE-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146052#comment-13146052
]
[email protected] commented on HIVE-2472:
-----------------------------------------------------
bq. On 2011-11-07 22:24:59, Ning Zhang wrote:
bq. >
Actually, after CTAS the partitions collection wasn't null; it was just
empty (I also commented on this near the end of the diff description). I will
make all the other requested changes ASAP, thank you.
- Robert
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2583/#review3089
-----------------------------------------------------------
On 2011-11-03 20:05:20, Robert Surówka wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2583/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-11-03 20:05:20)
bq.
bq.
bq. Review request for Ning Zhang and Kevin Wilfong.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Explanation of how stats for CTAS were added (line numbers may be slightly
off due to repository changes):
bq.
bq.
bq. Because CTAS contains an INSERT, the approach was to reuse as much as
possible of what is already there for INSERT.
bq.
bq. There were 2 main issues: making sure that the FileSinkOperators gather
stats, and that there is a StatsTask that then aggregates them and stores them
in the Metastore.
bq.
bq. FileSinkOperator gathers stats if conf.isGatherStats (line 576) is true.
That flag is set to true when a StatsTask is added in GenMRFileSink1 (126),
which happens only if isInsertTable is true; isInsertTable is set in line 105
(I didn't change the comment there, since it is still being set due to the
INSERT OVERWRITE that is just a part of the CTAS). To make it true, the CTAS
must be marked as containing an insert into the table and the TableSpec must be
added, which was done in SemanticAnalyzer (1051) (BaseSemanticAnalyzer
tableSpec() had to be changed to support TOK_CREATETABLE).
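bq.
bq. The wiring described above can be sketched roughly as follows. This is only
an illustration of the idea, not the code from the diff: FileSinkConfSketch,
StatsWiringSketch and wireStats are hypothetical names, and the plan is reduced
to a list of step names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the file sink configuration; not Hive's own class.
final class FileSinkConfSketch {
    private boolean gatherStats;

    boolean isGatherStats() { return gatherStats; }
    void setGatherStats(boolean gatherStats) { this.gatherStats = gatherStats; }
}

final class StatsWiringSketch {
    /**
     * Adds a stats step to the plan and turns on stats gathering in the sink,
     * but only when the query inserts into a table (assumed here to include CTAS).
     * Returns true if the stats step was added.
     */
    static boolean wireStats(boolean isInsertTable, FileSinkConfSketch conf,
                             List<String> planSteps) {
        if (!isInsertTable) {
            return false; // nothing to aggregate later, so the sink gathers nothing
        }
        conf.setGatherStats(true);
        planSteps.add("StatsTask");
        return true;
    }

    public static void main(String[] args) {
        FileSinkConfSketch conf = new FileSinkConfSketch();
        List<String> plan = new ArrayList<>();
        wireStats(true, conf, plan);
        System.out.println(conf.isGatherStats() + " " + plan);
    }
}
```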
bq.
bq. The next issue was to supply StatsWork (part of StatsTask) with information
about the table being created. To do that, the database name was added to
CreateTableDesc, and it is set in SemanticAnalyzer (7878). This
CreateTableDesc is then added to LoadFileDesc (just to get the table info) in
SemanticAnalyzer (4000), which in turn is added to StatsWork in GenMRFileSink1
(170). This StatsWork is later used by StatsTask to get the table info.
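bq.
bq. As a rough sketch of that descriptor chain (the classes below are simplified
stand-ins with a "Sketch" suffix, not the real Hive plan classes, and
qualifiedTableName() is a hypothetical helper):

```java
// Hypothetical, simplified stand-ins for the plan descriptors named above.
final class CreateTableDescSketch {
    private final String databaseName; // the newly added piece of information
    private final String tableName;

    CreateTableDescSketch(String databaseName, String tableName) {
        this.databaseName = databaseName;
        this.tableName = tableName;
    }

    String getDatabaseName() { return databaseName; }
    String getTableName() { return tableName; }
}

final class LoadFileDescSketch {
    // Assumed to be null for a plain file load and non-null for CTAS.
    private final CreateTableDescSketch createTableDesc;

    LoadFileDescSketch(CreateTableDescSketch createTableDesc) {
        this.createTableDesc = createTableDesc;
    }

    CreateTableDescSketch getCreateTableDesc() { return createTableDesc; }
}

final class StatsWorkSketch {
    private final LoadFileDescSketch loadFileDesc;

    StatsWorkSketch(LoadFileDescSketch loadFileDesc) {
        this.loadFileDesc = loadFileDesc;
    }

    /** Returns "db.table" for CTAS, or null when no table info is attached. */
    String qualifiedTableName() {
        if (loadFileDesc == null || loadFileDesc.getCreateTableDesc() == null) {
            return null;
        }
        CreateTableDescSketch desc = loadFileDesc.getCreateTableDesc();
        return desc.getDatabaseName() + "." + desc.getTableName();
    }
}
```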
bq.
bq. Another problem was that StatsTask would be called before the
CreateTableTask. To remedy that, a change was made in SemanticAnalyzer (7048),
so that for CTAS the StatsTask is moved to run after the crtTblTask.
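bq.
bq. The reordering can be pictured with the sketch below. It assumes a
tree-shaped plan in which the stats task and the create-table task start out as
siblings; SketchTask and CtasTaskOrdering are hypothetical names and are not
Hive's task classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a plan task; children run after their parent.
final class SketchTask {
    final String name;
    final List<SketchTask> children = new ArrayList<>();

    SketchTask(String name) { this.name = name; }

    void addChild(SketchTask child) {
        if (!children.contains(child)) {
            children.add(child);
        }
    }

    /** Removes target from this task's subtree; returns true if it was found. */
    boolean detach(SketchTask target) {
        boolean removed = children.remove(target);
        for (SketchTask child : children) {
            removed |= child.detach(target);
        }
        return removed;
    }

    /** Pre-order walk, used here as a simple model of execution order. */
    List<String> order() {
        List<String> out = new ArrayList<>();
        collect(out);
        return out;
    }

    private void collect(List<String> out) {
        out.add(name);
        for (SketchTask child : children) {
            child.collect(out);
        }
    }
}

final class CtasTaskOrdering {
    /** Re-parents the stats task so that it runs after the create-table task. */
    static void moveStatsAfterCreate(SketchTask root, SketchTask stats,
                                     SketchTask createTable) {
        root.detach(stats);
        createTable.addChild(stats);
    }
}
```

Making the stats task a child of the create-table task means the table already
exists in the Metastore by the time the stats are stored for it.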
bq.
bq. Finally, in StatsTask, support for the LoadFileDesc was added (which is
present for CTAS). Importantly, line 306 was changed, since for CTAS there was
an empty partitionList instead of null (this last change took me around 3
hours to find, since it was the last place I looked when figuring out what was
wrong).
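bq.
bq. The null-versus-empty pitfall looks roughly like this. The sketch is only an
illustration: PartitionListCheck and its methods are hypothetical, and the
actual condition in StatsTask may be written differently.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; these names are not taken from StatsTask.
final class PartitionListCheck {

    /** Original style of check: only null counts as "no partitions". */
    static boolean isUnpartitionedNullOnly(List<String> partitions) {
        return partitions == null;
    }

    /** Adjusted check: an empty list also counts as "no partitions". */
    static boolean isUnpartitioned(List<String> partitions) {
        return partitions == null || partitions.isEmpty();
    }

    public static void main(String[] args) {
        // After CTAS the list exists but has no entries.
        List<String> afterCtas = Collections.emptyList();
        System.out.println(isUnpartitionedNullOnly(afterCtas)); // false
        System.out.println(isUnpartitioned(afterCtas));         // true
    }
}
```

With the null-only check, a CTAS table would be handled as a partitioned table
with zero partitions, so no table-level stats would be written.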
bq.
bq.
bq. I noticed that "Cannot get table db1.db1.conflict_name" was added to
database.q.out in line 1224, but it wasn't present there in the previous diff
version, which contained exactly the same Java code, so I assume it is due to
some other work happening concurrently.
bq.
bq.
bq. This addresses bug HIVE-2472.
bq. https://issues.apache.org/jira/browse/HIVE-2472
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java 1196269
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1196269
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 1196269
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1196269
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 1196269
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 1196269
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 1196269
bq. trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/StatsWork.java 1196269
bq. trunk/ql/src/test/results/clientpositive/ctas.q.out 1196269
bq. trunk/ql/src/test/results/clientpositive/database.q.out 1196269
bq. trunk/ql/src/test/results/clientpositive/merge3.q.out 1196269
bq. trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out 1196269
bq. trunk/ql/src/test/results/clientpositive/smb_mapjoin9.q.out 1196269
bq.
bq. Diff: https://reviews.apache.org/r/2583/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Ran the ant tests with the overwrite option; the changes to the .q.out
files are part of the diff.
bq.
bq.
bq. Thanks,
bq.
bq. Robert
bq.
bq.
> Metastore statistics are not being updated for CTAS queries.
> ------------------------------------------------------------
>
> Key: HIVE-2472
> URL: https://issues.apache.org/jira/browse/HIVE-2472
> Project: Hive
> Issue Type: Bug
> Reporter: Kevin Wilfong
> Assignee: Robert Surówka
> Attachments: HIVE-2472.1.patch.txt, HIVE-2472.2.patch,
> HIVE-2472.3.patch
>
>
> We need to add a Statistics task at the end of a CTAS query in order to
> update the metastore statistics for the table being created.