[ https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-10746: --------------------------- Attachment: HIVE-10746.1.patch Fixing legacy {{Mapred.TextInputFormat}} is fraught with issues. Allow Tez split generation to set a sane default if the min-size is misconfigured during execution. > Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 produces 1-byte FileSplits from > TextInputFormat > ---------------------------------------------------------------------------------- > > Key: HIVE-10746 > URL: https://issues.apache.org/jira/browse/HIVE-10746 > Project: Hive > Issue Type: Bug > Components: Hive, Tez > Affects Versions: 0.14.0, 0.14.1, 1.2.0, 1.1.0, 1.1.1 > Reporter: Greg Senia > Assignee: Gopal V > Priority: Critical > Attachments: HIVE-10746.1.patch, slow_query_output.zip > > > The following query: "SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount > FROM adw.crc_arsn GROUP BY appl_user_id,arsn_cd ORDER BY appl_user_id;" runs > consistently fast in Spark and Mapreduce on Hive 1.2.0. When attempting to > run this same query against Tez as the execution engine it consistently runs > for over 300-500 seconds this seems extremely long. This is a basic external > table delimited by tabs and is a single file in a folder. In Hive 0.13 this > query with Tez runs fast and I tested with Hive 0.14, 0.14.1/1.0.0 and now > Hive 1.2.0 and there clearly is something going awry with Hive w/Tez as an > execution engine with Single or small file tables. I can attach further logs > if someone needs them for deeper analysis. > HDFS Output: > hadoop fs -ls /example_dw/crc/arsn > Found 2 items > -rwxr-x--- 6 loaduser hadoopusers 0 2015-05-17 20:03 > /example_dw/crc/arsn/_SUCCESS > -rwxr-x--- 6 loaduser hadoopusers 3883880 2015-05-17 20:03 > /example_dw/crc/arsn/part-m-00000 > Hive Table Describe: > hive> describe formatted crc_arsn; > OK > # col_name data_type comment > > arsn_cd string > clmlvl_cd string > arclss_cd string > arclssg_cd string > arsn_prcsr_rmk_ind string > arsn_mbr_rspns_ind string > savtyp_cd string > arsn_eff_dt string > arsn_exp_dt string > arsn_pstd_dts string > arsn_lstupd_dts string > arsn_updrsn_txt string > appl_user_id string > arsntyp_cd string > pre_d_indicator string > arsn_display_txt string > arstat_cd string > arsn_tracking_no string > arsn_cstspcfc_ind string > arsn_mstr_rcrd_ind string > state_specific_ind string > region_specific_in string > arsn_dpndnt_cd string > unit_adjustment_in string > arsn_mbr_only_ind string > arsn_qrmb_ind string > > # Detailed Table Information > Database: adw > Owner: loadu...@exa.example.com > CreateTime: Mon Apr 28 13:28:05 EDT 2014 > LastAccessTime: UNKNOWN > Protect Mode: None > Retention: 0 > Location: hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn > > Table Type: EXTERNAL_TABLE > Table Parameters: > EXTERNAL TRUE > transient_lastDdlTime 1398706085 > > # Storage Information > SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > > InputFormat: org.apache.hadoop.mapred.TextInputFormat > OutputFormat: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > Compressed: No > Num Buckets: -1 > Bucket Columns: [] > Sort Columns: [] > Storage Desc Params: > field.delim \t > line.delim \n > serialization.format \t > Time taken: 1.245 seconds, Fetched: 54 row(s) > Explain Hive 1.2.0 w/Tez: > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Reducer 3 <- Reducer 2 (SIMPLE_EDGE) > Explain Hive 0.13 w/Tez: > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 is a root stage > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Reducer 3 <- Reducer 2 (SIMPLE_EDGE) > Results: > Hive 1.2.0 w/Spark 1.3.1: > Finished successfully in 7.09 seconds > Hive 1.2.0 w/Mapreduce: > Stage 1: 32 Seconds > Stage 2: 35 Seconds > Hive 1.2.0 w/Tez 0.5.3: > Time taken: 565.025 seconds, Fetched: 11516 row(s) > > Hive 0.13 w/Tez 0.4.0: > Time taken: 13.552 seconds, Fetched: 11516 row(s) > And finally looking at the Dag Attempt that is stuck for 500 seconds or so in > Tez it looks to be stuck running the same method over and over again: > 8 duration=2561 from=org.apache.hadoop.hive.ql.exec.tez.RecordProcessor> > 2015-05-18 19:58:41,719 INFO [TezChild] exec.Utilities: PLAN PATH = > hdfs://xhadnnm1p.example.com:8020/tmp/hive/gss2002/dbc4b0b5-7859-4487-a56d-969440bc5e90/hive_2015-05-18_19-58-25_951_5497535752804149087-1/gss2002/_tez_scratch_dir/4e635121-c4cd-4e3f-b96b-9f08a6a7bf5d/map.xml > 2015-05-18 19:58:41,822 INFO [TezChild] exec.MapOperator: MAP[4]: records > read - 1 > 2015-05-18 19:58:41,835 INFO [TezChild] io.HiveContextAwareRecordReader: > Processing file > hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn/part-m-00000 > 2015-05-18 19:58:41,848 INFO [TezChild] io.HiveContextAwareRecordReader: > Processing file > hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn/part-m-00000 > ...... > 2015-05-18 20:07:46,560 INFO [TezChild] io.HiveContextAwareRecordReader: > Processing file > hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn/part-m-00000 > 2015-05-18 20:07:46,574 INFO [TezChild] io.HiveContextAwareRecordReader: > Processing file > hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn/part-m-00000 > 2015-05-18 20:07:46,587 INFO [TezChild] io.HiveContextAwareRecordReader: > Processing file > hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn/part-m-00000 > 2015-05-18 20:07:46,603 INFO [TezChild] io.HiveContextAwareRecordReader: > Processing file > hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn/part-m-00000 > 2015-05-18 20:07:46,603 INFO [TezChild] log.PerfLogger: </PERFLOG > method=TezRunProcessor start=1431993518764 end=1431994066603 duration=547839 > from=org.apache.hadoop.hive.ql.exec.tez.TezProcessor> > 2015-05-18 20:07:46,603 INFO [TezChild] exec.MapOperator: 4 finished. > closing... > 2015-05-18 20:07:46,603 INFO [TezChild] exec.MapOperator: > RECORDS_IN_Map_1:13440 -- This message was sent by Atlassian JIRA (v6.3.4#6332)