[ 
https://issues.apache.org/jira/browse/HIVE-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822056#comment-13822056
 ] 

Teddy Choi commented on HIVE-5393:
----------------------------------

I just clicked somewhere on this page once and closed it, and the issue got updated.
That was my mistake. Sorry.

> Hive multi-insert statement is not logging how many rows were loaded for 
> individual insert statements
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-5393
>                 URL: https://issues.apache.org/jira/browse/HIVE-5393
>             Project: Hive
>          Issue Type: Bug
>          Components: Diagnosability
>    Affects Versions: 0.9.0
>         Environment: Hive 0.9.0
> Cloudera CDH3U4
>            Reporter: Sandip Das
>            Priority: Minor
>
> When I run a multi-insert statement in Hive to copy data from a single source 
> table into two different partitions of a target table, the log does not show 
> how many rows were loaded into each of the two partitions by the separate 
> insert statements. It only shows how many rows were loaded for one of the 
> insert statements, not both.
> Below is the query:
> =============================================================
> from WBUSINESS_STR
> insert OVERWRITE table WBUSINESS_intraday_30mins PARTITION 
> (data_year='2013',data_month='09',data_day='23',data_hourmin='0000',data_run_id='201309230001')
> select
> year(DB_COMMIT_TIMESTAMP ) as year,
> month(DB_COMMIT_TIMESTAMP) as month,
> day(DB_COMMIT_TIMESTAMP) as day,
> concat(hour(DB_COMMIT_TIMESTAMP),minute(DB_COMMIT_TIMESTAMP)) as hourmin,
> '201309230001' as RUN_ID,
> cast(DB_COMMIT_TIMESTAMP as timestamp),
> DB_ACTION_TYPE,
> cast( RBA as bigint),
> cast( ID as bigint ),
> cast( ACCOUNT_NUMBER as bigint ),
> cast( TIME_CREATED as bigint ),
> cast( FLAGS as bigint ),
> NAME ,
> NAME_UPPER ,
> cast( ADDRESS_ID as bigint ),
> cast( EIN as bigint ),
> DUNS_NUMBER ,
> cast( SIC as bigint ),
> cast( TYPE as bigint ),
> STATE_INC ,
> cast( AVERAGE_TICKET as bigint ),
> SERVICE_EMAIL ,
> SERVICE_PHONE ,
> cast(concat(substr(TIME_ROW_UPDATED,1,10),' ',substr(TIME_ROW_UPDATED,12)) as 
> timestamp),
> DISPUTE_EMAIL ,
> DOING_BUSINESS_AS 
> where 
> cast(DB_COMMIT_TIMESTAMP as timestamp) >= cast('2013-09-23 00:00:00'  as 
> timestamp)
> and cast(DB_COMMIT_TIMESTAMP as timestamp) < cast('2013-09-23 00:30:00' as 
> timestamp)
> insert OVERWRITE table WBUSINESS_intraday_30mins PARTITION 
> (data_year='2013',data_month='09',data_day='22',data_hourmin='2330',data_run_id='201309230001')
> select
> year(DB_COMMIT_TIMESTAMP ) as year,
> month(DB_COMMIT_TIMESTAMP) as month,
> day(DB_COMMIT_TIMESTAMP) as day,
> concat(hour(DB_COMMIT_TIMESTAMP),minute(DB_COMMIT_TIMESTAMP)) as hourmin,
> '201309230001' as RUN_ID,
> cast(DB_COMMIT_TIMESTAMP as timestamp),
> DB_ACTION_TYPE,
> cast( RBA as bigint),
> cast( ID as bigint ),
> cast( ACCOUNT_NUMBER as bigint ),
> cast( TIME_CREATED as bigint ),
> cast( FLAGS as bigint ),
> NAME ,
> NAME_UPPER ,
> cast( ADDRESS_ID as bigint ),
> cast( EIN as bigint ),
> DUNS_NUMBER ,
> cast( SIC as bigint ),
> cast( TYPE as bigint ),
> STATE_INC ,
> cast( AVERAGE_TICKET as bigint ),
> SERVICE_EMAIL ,
> SERVICE_PHONE ,
> cast(concat(substr(TIME_ROW_UPDATED,1,10),' ',substr(TIME_ROW_UPDATED,12)) as 
> timestamp),
> DISPUTE_EMAIL ,
> DOING_BUSINESS_AS 
> where 
> cast(DB_COMMIT_TIMESTAMP as timestamp) >= cast('2013-09-22 23:30:00'  as 
> timestamp)
> and cast(DB_COMMIT_TIMESTAMP as timestamp) < cast('2013-09-23 00:00:00' as 
> timestamp)
> ============================================
> The Hive log says:
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201309261542_0042, Tracking URL = 
> http://dmdevetllvs01:50030/jobdetails.jsp?jobid=job_201309261542_0042
> Kill Command = 
> /dmdev/data01/paypal/HADOOP/node1/hadoop-0.20.2-cdh3u4/bin/hadoop job  
> -Dmapred.job.tracker=dmdevetllvs01:9010 -kill job_201309261542_0042
> Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
> 2013-09-27 16:24:26,542 Stage-2 map = 0%,  reduce = 0%
> 2013-09-27 16:24:28,565 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 2.52 
> sec
> 2013-09-27 16:24:29,574 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 
> 2.52 sec
> MapReduce Total cumulative CPU time: 2 seconds 520 msec
> Ended Job = job_201309261542_0042
> Ended Job = 730112486, job is filtered out (removed at runtime).
> Ended Job = -507348248, job is filtered out (removed at runtime).
> Moving data to: 
> hdfs://dmdevetllvs01.qa.paypal.com/tmp/hive-sanddas/hive_2013-09-27_16-24-19_784_1282210997007172719/-ext-10000
> Moving data to: 
> hdfs://dmdevetllvs01.qa.paypal.com/tmp/hive-sanddas/hive_2013-09-27_16-24-19_784_1282210997007172719/-ext-10002
> Loading data to table default.wbusiness_intraday_30mins partition 
> (data_year=2013, data_month=09, data_day=22, data_hourmin=2330, 
> data_run_id=201309230001)
> Deleted 
> hdfs://dmdevetllvs01.qa.paypal.com/sys/pp_dm/sanddas/import/site/CONF/WBUSINESS/RCFILE/incremental/data_year=2013/data_month=09/data_day=22/data_hourmin=2330/data_run_id=201309230001
> Partition default.wbusiness_intraday_30mins{data_year=2013, data_month=09, 
> data_day=22, data_hourmin=2330, data_run_id=201309230001} stats: [num_files: 
> 1, num_rows: 0, total_size: 12283, raw_data_size: 0]
> Table default.wbusiness_intraday_30mins stats: [num_partitions: 7, num_files: 
> 7, num_rows: 0, total_size: 62018, raw_data_size: 0]
> Loading data to table default.wbusiness_intraday_30mins partition 
> (data_year=2013, data_month=09, data_day=23, data_hourmin=0000, 
> data_run_id=201309230001)
> Deleted 
> hdfs://dmdevetllvs01.qa.paypal.com/sys/pp_dm/sanddas/import/site/CONF/WBUSINESS/RCFILE/incremental/data_year=2013/data_month=09/data_day=23/data_hourmin=0000/data_run_id=201309230001
> Partition default.wbusiness_intraday_30mins{data_year=2013, data_month=09, 
> data_day=23, data_hourmin=0000, data_run_id=201309230001} stats: [num_files: 
> 1, num_rows: 0, total_size: 1870, raw_data_size: 0]
> Table default.wbusiness_intraday_30mins stats: [num_partitions: 7, num_files: 
> 7, num_rows: 0, total_size: 62018, raw_data_size: 0]
> 141 Rows loaded to wbusiness_intraday_30mins
> MapReduce Jobs Launched: 
> Job 0: Map: 1   Cumulative CPU: 2.52 sec   HDFS Read: 4191 HDFS Write: 14153 
> SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 520 msec
> OK
> Time taken: 10.771 seconds
> ====================================================================
> It only shows the 141 rows loaded into one partition.
> It does not show the other 21 rows loaded into the other partition.
> The source table has 161 rows.
> Please let us know if this is the expected behavior.
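> For reference, a reduced sketch of the same multi-insert pattern (source_table, 
> target_table, and the column and partition names here are placeholders, not the 
> actual objects from the query above):
> -- one scan of the source feeds two INSERT branches, each writing a different partition
> FROM source_table src
> INSERT OVERWRITE TABLE target_table PARTITION (data_day='23', data_hourmin='0000')
>   SELECT src.col1, src.col2
>   WHERE cast(src.event_ts as timestamp) >= cast('2013-09-23 00:00:00' as timestamp)
>     AND cast(src.event_ts as timestamp) <  cast('2013-09-23 00:30:00' as timestamp)
> INSERT OVERWRITE TABLE target_table PARTITION (data_day='22', data_hourmin='2330')
>   SELECT src.col1, src.col2
>   WHERE cast(src.event_ts as timestamp) >= cast('2013-09-22 23:30:00' as timestamp)
>     AND cast(src.event_ts as timestamp) <  cast('2013-09-23 00:00:00' as timestamp);
> The expectation is one "Rows loaded" count per INSERT branch (one per target 
> partition), but the log above reports only a single count.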



--
This message was sent by Atlassian JIRA
(v6.1#6144)
