[ 
https://issues.apache.org/jira/browse/TEZ-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020199#comment-14020199
 ] 

Hitesh Shah edited comment on TEZ-1177 at 6/6/14 7:05 PM:
----------------------------------------------------------

[~dmoore247] To clarify, the above issues did not cause the stall as such. 

Based on the logs, the final reduce stage was still running. The last lines in 
the log when that task was killed show: 

{code}
2014-06-05 09:33:23,157 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1241000000 rows: used memory = 971863576
2014-06-05 09:33:24,099 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1242000000 rows: used memory = 1010035480
2014-06-05 09:33:25,037 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1243000000 rows: used memory = 1048209432
2014-06-05 09:33:25,976 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1244000000 rows: used memory = 1086372240
2014-06-05 09:33:26,916 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1245000000 rows: used memory = 1124548256
2014-06-05 09:33:27,864 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1246000000 rows: used memory = 928894504
2014-06-05 09:33:28,804 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1247000000 rows: used memory = 967054376
2014-06-05 09:33:29,744 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1248000000 rows: used memory = 1005231488
2014-06-05 09:33:30,681 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1249000000 rows: used memory = 1043396296
2014-06-05 09:33:31,622 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1250000000 rows: used memory = 1081562000
2014-06-05 09:33:32,562 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1251000000 rows: used memory = 1119739088
2014-06-05 09:33:33,511 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1252000000 rows: used memory = 924165312
2014-06-05 09:33:34,450 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1253000000 rows: used memory = 962330296
{code}




was (Author: hitesh):
[~dmoore247] To clarify, the above issues did not cause the stall as such. 

Based on the logs, the final reduce stage was running. The last lines in the 
log when that task was killed show: 

{code}
2014-06-05 09:33:23,157 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1241000000 rows: used memory = 971863576
2014-06-05 09:33:24,099 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1242000000 rows: used memory = 1010035480
2014-06-05 09:33:25,037 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1243000000 rows: used memory = 1048209432
2014-06-05 09:33:25,976 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1244000000 rows: used memory = 1086372240
2014-06-05 09:33:26,916 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1245000000 rows: used memory = 1124548256
2014-06-05 09:33:27,864 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1246000000 rows: used memory = 928894504
2014-06-05 09:33:28,804 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1247000000 rows: used memory = 967054376
2014-06-05 09:33:29,744 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1248000000 rows: used memory = 1005231488
2014-06-05 09:33:30,681 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1249000000 rows: used memory = 1043396296
2014-06-05 09:33:31,622 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1250000000 rows: used memory = 1081562000
2014-06-05 09:33:32,562 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1251000000 rows: used memory = 1119739088
2014-06-05 09:33:33,511 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1252000000 rows: used memory = 924165312
2014-06-05 09:33:34,450 INFO [main] 
org.apache.hadoop.hive.ql.exec.tez.RecordProcessor: ExecMapper: processing 
1253000000 rows: used memory = 962330296
{code}



> HIve Tez job stalls out
> -----------------------
>
>                 Key: TEZ-1177
>                 URL: https://issues.apache.org/jira/browse/TEZ-1177
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>         Environment: HDP 2.1 21 data node cluster 128 GB RAM/node
>            Reporter: Douglas Moore
>             Fix For: 0.4.0
>
>         Attachments: task_failed.core-dump.1.txt, task_failed.spill-error.1, 
> task_failed.spill-error.2, task_failed.spill-error.3, tez-jira-issue.tar.gz
>
>
> I'm on HDP 2.1 running Hive 0.13 job that has created a 3 stage job.
> 338 Maps, 45 Reducers, then 1 Reducer
> The job churns through stange 1, long pause, churns through stage 2, longer 
> pause at which point I kill the job.
> In looking through the log, I notice a lot of ShuffleSchedule copy operations 
> and the performance slows to 9MB/sec from some at over 1000 MB/sec. I'm on a 
> very beefy cluster, with high bandwidth network. What's going on here?
> My ultimate goal is to get `select count(distinct ss_ticket_number) from 
> store_sales` to run at a reasonable pace.
> Will attach log files, query, explain plan etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to