[ https://issues.apache.org/jira/browse/HIVE-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145814#comment-14145814 ]
Xuefu Zhang commented on HIVE-8207:
-----------------------------------
+1
> Add .q tests for multi-table insertion [Spark Branch]
> -----------------------------------------------------
>
> Key: HIVE-8207
> URL: https://issues.apache.org/jira/browse/HIVE-8207
> Project: Hive
> Issue Type: Test
> Components: Spark
> Reporter: Chao
> Assignee: Chao
> Attachments: HIVE-8207.1-spark.patch, HIVE-8207.2-spark.patch,
> HIVE-8207.3-spark.patch
>
>
> Now that multi-table insertion has been committed to the Spark branch, we
> should enable the related qtests.
> Here is a list of qfiles that should be activated (some of them may already
> be activated).
> The list may not be comprehensive.
> {noformat}
> add_part_multiple.q
> auto_smb_mapjoin_14.q
> bucket5.q
> column_access_stats.q
> date_udf.q
> groupby10.q
> groupby11.q
> groupby3_map_multi_distinct.q
> groupby3_map.q
> groupby3_map_skew.q
> groupby3_noskew_multi_distinct.q
> groupby3_noskew.q
> groupby7_map_multi_single_reducer.q
> groupby7_map.q
> groupby7_map_skew.q
> groupby7_noskew_multi_single_reducer.q
> groupby7_noskew.q
> groupby7.q
> groupby8_map.q
> groupby8_map_skew.q
> groupby8_noskew.q
> groupby8.q
> groupby9.q
> groupby_complex_types_multi_single_reducer.q
> groupby_complex_types.q
> groupby_cube1.q
> groupby_map_ppr_multi_distinct.q
> groupby_map_ppr.q
> groupby_multi_insert_common_distinct.q
> groupby_multi_single_reducer2.q
> groupby_multi_single_reducer3.q
> groupby_multi_single_reducer.q
> groupby_position.q
> groupby_ppr.q
> groupby_rollup1.q
> groupby_sort_1_23.q
> groupby_sort_1.q
> groupby_sort_skew_1_23.q
> infer_bucket_sort_multi_insert.q
> innerjoin.q
> input12_hadoop20.q
> input12.q
> input13.q
> input14.q
> input17.q
> input18.q
> input1_limit.q
> input_part2.q
> insert_into3.q
> join_nullsafe.q
> load_dyn_part8.q
> metadata_only_queries_with_filters.q
> multigroupby_singlemr.q
> multi_insert_gby2.q
> multi_insert_gby3.q
> multi_insert_gby.q
> multi_insert_lateral_view.q
> multi_insert_move_tasks_share_dependencies.q
> multi_insert.q
> parallel.q
> partition_date2.q
> pcr.q
> ppd_multi_insert.q
> ppd_transform.q
> smb_mapjoin_11.q
> smb_mapjoin_12.q
> smb_mapjoin_13.q
> smb_mapjoin_15.q
> smb_mapjoin_16.q
> stats4.q
> subquery_multiinsert.q
> table_access_keys_stats.q
> tez_dml.q
> udaf_percentile_approx_20.q
> udaf_percentile_approx_23.q
> union17.q
> union18.q
> union19.q
> {noformat}
>
> Some tests cannot be enabled yet, for various reasons:
> 1. ForwardOperator issue, including:
> {noformat}
> groupby7_noskew_multi_single_reducer.q
> groupby8_map.q
> groupby8_map_skew.q
> groupby8_noskew.q
> groupby8.q
> groupby9.q
> groupby10.q
> groupby_multi_insert_common_distinct.q
> union17.q
> {noformat}
> *Reason*: currently, if the node at which we break the operator tree is a
> ForwardOperator, we simply do nothing. However, we may have the following
> case:
> {noformat}
> ...
> RS_0
> |
> FOR
> |
> / \
> GBY_1 GBY_2
> | |
> ... ...
> | |
> RS_1 RS_2
> | |
> ... ...
> | |
> FS_1 FS_2
> {noformat}
> which may result in:
> {noformat}
> RW
> / \
> RW RW
> {noformat}
> and because of the issues in HIVE-7731 and HIVE-8118, both downstream branches
> will receive duplicated (identical) inputs.
> 2. Stats issue, including:
> {noformat}
> bucket5.q
> infer_bucket_sort_multi_insert.q
> stats4.q
> smb_mapjoin_13.q
> smb_mapjoin_15.q
> {noformat}
> *Reason*: In these tests, the diff fails because {{numRows}} and
> {{rawDataSize}} are -1, whereas they are expected to be positive. I
> don't think this is related to multi-insertion.
> 3. Join/SMB join issue, including:
> {noformat}
> auto_smb_mapjoin_14.q
> auto_sortmerge_join_13.q
> smb_mapjoin_11.q
> smb_mapjoin_12.q
> smb_mapjoin_13.q
> smb_mapjoin_15.q
> smb_mapjoin_16.q
> {noformat}
> *Reason*: These tests fail either with an exception or with a diff. I
> think this is because SMB join (HIVE-8202) isn't supported yet.
> 4. Result mismatch, including:
> {noformat}
> groupby3_map_skew.q
> groupby_map_ppr_multi_distinct.q
> groupby_complex_types_multi_single_reducer.q
> groupby_map_ppr.q
> partition_date2.q
> udaf_percentile_approx_23.q
> {noformat}
> *Reason*: The results of these tests differ from MR's. For instance,
> groupby3_map_skew.q fails with the following diff:
> {noformat}
> < 130091.0 260.182 256.10355987055016 98.0 0.0 142.92680950752379 143.06995106518903 20428.07288 20469.0109
> ---
> > 130091.0 260.182 256.10355987055016 98.0 0.0 142.9268095075238 143.06995106518906 20428.07288 20469.0109
> {noformat}
> I'm not sure why this happens, but I don't think these failures are related
> to multi-insertion.
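One possible (unverified) explanation for differences confined to the last digits, as in the groupby3_map_skew.q diff: floating-point addition is not associative, so if Spark and MR accumulate partial aggregates in a different order, the final doubles can differ slightly. The minimal Java sketch below only illustrates this order dependence; the class `SumOrder` and method `sum` are hypothetical and do not reflect how Hive actually computes these aggregates.

```java
// Hypothetical illustration: summing the same doubles in a different order
// can produce results that differ in the last digits.
public class SumOrder {
    // Left-to-right accumulation, as a single reducer might do it.
    public static double sum(double[] xs) {
        double s = 0.0;
        for (double x : xs) {
            s += x;
        }
        return s;
    }

    public static void main(String[] args) {
        double[] forward = {0.1, 0.2, 0.3};
        double[] reversed = {0.3, 0.2, 0.1};
        System.out.println(sum(forward));  // prints 0.6000000000000001
        System.out.println(sum(reversed)); // prints 0.6
    }
}
```

If this is the cause, comparing such columns with a small tolerance, rather than requiring exact string equality, would make the outputs match.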
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)