[jira] [Updated] (TEZ-1882) Tez UI build does not work on Windows

2015-01-07 Thread Prakash Ramachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Ramachandran updated TEZ-1882:
--
Attachment: TEZ-1882.2.patch

Thanks [~bikassaha]. I have removed the batch file in patch 2 and changed the 
path to node to a full path. This works on both Windows and *nix.

 Tez UI build does not work on Windows
 -

 Key: TEZ-1882
 URL: https://issues.apache.org/jira/browse/TEZ-1882
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Prakash Ramachandran
Priority: Blocker
 Attachments: TEZ-1882.1.patch, TEZ-1882.2.patch


 It fails during Bower install because it cannot launch node/node. After 
 working around that, the bower script itself fails because it's a bash script 
 and will not run on Windows. Specifically, the following command fails in 
 node_modules\.bin\bower:
 basedir=`dirname $0`



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1923) FetcherOrderedGrouped can get into infinite loop due to memory pressure

2015-01-07 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created TEZ-1923:
-

 Summary: FetcherOrderedGrouped can get into infinite loop due to 
memory pressure
 Key: TEZ-1923
 URL: https://issues.apache.org/jira/browse/TEZ-1923
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan


- Ran a comparatively large job (temp table creation) at 10 TB scale.
- Turned on intermediate mem-to-mem 
(tez.runtime.shuffle.memory-to-memory.enable=true and 
tez.runtime.shuffle.memory-to-memory.segments=4)
- Some reducers get lots of data and quickly get into an infinite loop

{code}

2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
Status.WAIT ...
2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 3ms
2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
 sent hash and receievd reply 0 ms
2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
Status.WAIT ...
2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 1ms
2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
 sent hash and receievd reply 0 ms
2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
Status.WAIT ...
2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 2ms
2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
 sent hash and receievd reply 0 ms
2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
Status.WAIT ...
2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 5ms
2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
 sent hash and receievd reply 0 ms
2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] 
orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
Status.WAIT ...
{code}

Additional debug/patch statements revealed that InMemoryMerge is not invoked 
appropriately and so does not release memory back for the fetchers to proceed. 
Example debug/patch messages are given below.

{code}

syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:48,332 INFO 
[fetcher [Map_1] #2] orderedgrouped.MergeManager: Patch..usedMemory=1551867234, 
memoryLimit=1073741824, commitMemory=883028388, mergeThreshold=708669632 ===> 
InMemoryMerge would be started in this case as commitMemory >= mergeThreshold

syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:52,900 INFO 
[fetcher [Map_1] #2] orderedgrouped.MergeManager: Patch..usedMemory=1273349784, 
memoryLimit=1073741824, commitMemory=347296632, mergeThreshold=708669632 ===> 
InMemoryMerge would *NOT* be started in this case as commitMemory < 
mergeThreshold.  But the usedMemory is higher than memoryLimit.  Fetchers would 
keep waiting indefinitely until memory is released. InMemoryMerge will not kick 
in and not release memory.

syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:53,163 INFO 
[fetcher [Map_1] #1] orderedgrouped.MergeManager: Patch..usedMemory=1191994052, 
memoryLimit=1073741824, commitMemory=523155206, mergeThreshold=708669632 ===> 
InMemoryMerge would *NOT* be started in this case as commitMemory < 
mergeThreshold.  But the usedMemory is higher than memoryLimit.  Fetchers would 
keep waiting indefinitely until memory is released.  InMemoryMerge will not 
kick in and not release memory.
{code}

In MergeManager, in-memory merging is invoked under the following condition:
{code}
if (!inMemoryMerger.isInProgress() && commitMemory >= mergeThreshold)
{code}
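To make the failure mode concrete, here is a minimal sketch of the accounting 
described above (field and method names are illustrative, not the actual 
MergeManager code): usedMemory grows on every reserve, commitMemory only when a 
fetch completes, and the only memory-freeing path is gated on commitMemory.
{code}
// Illustrative sketch only - not the actual Tez MergeManager implementation.
class MergeAccountingSketch {
  long usedMemory;      // grows on every reserve()
  long commitMemory;    // grows only when a fetch COMPLETES
  long memoryLimit;     // e.g. 1073741824 in the logs above
  long mergeThreshold;  // e.g. 708669632 in the logs above
  boolean mergeInProgress;

  // A fetcher asks for memory before fetching a map output.
  boolean reserve(long requestedSize) {
    if (usedMemory > memoryLimit) {
      return false;              // maps to Status.WAIT in the logs above
    }
    usedMemory += requestedSize;
    return true;
  }

  // Called only when a fetch completes; the sole path that starts a merge.
  void closeInMemoryFile(long size) {
    commitMemory += size;
    if (!mergeInProgress && commitMemory >= mergeThreshold) {
      startInMemoryMerge();      // merging is what eventually frees memory
    }
    // If usedMemory > memoryLimit while commitMemory < mergeThreshold
    // (e.g. many large fetches still in flight), no merge ever starts,
    // nothing is freed, and every fetcher spins on Status.WAIT.
  }

  void startInMemoryMerge() { mergeInProgress = true; }
}
{code}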


Attaching the sample hive command just for reference:
{code}
$HIVE_HOME/bin/hive -hiveconf tez.runtime.io.sort.factor=200 --hiveconf 
hive.tez.auto.reducer.parallelism=false --hiveconf 
tez.am.heartbeat.interval-ms.max=20 --hiveconf tez.runtime.io.sort.mb=1200 
--hiveconf tez.runtime.sort.threads=2 --hiveconf hive.tez.container.size=4096 
--hiveconf tez.runtime.shuffle.memory-to-memory.enable=true --hiveconf 
tez.runtime.shuffle.memory-to-memory.segments=4
{code}

[jira] [Updated] (TEZ-1923) FetcherOrderedGrouped can get into infinite loop due to memory pressure

2015-01-07 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1923:
--
Attachment: TEZ-1923.1.patch

Tried with the same hive job and it works fine without any infinite loop 
issues. 

[~sseth] - Can you please review when you have time?

 FetcherOrderedGrouped can get into infinite loop due to memory pressure
 ---

 Key: TEZ-1923
 URL: https://issues.apache.org/jira/browse/TEZ-1923
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
 Attachments: TEZ-1923.1.patch


 - Ran a comparatively large job (temp table creation) at 10 TB scale.
 - Turned on intermediate mem-to-mem 
 (tez.runtime.shuffle.memory-to-memory.enable=true and 
 tez.runtime.shuffle.memory-to-memory.segments=4)
 - Some reducers get lots of data and quickly get into an infinite loop
 {code}
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 3ms
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 1ms
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 2ms
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 5ms
 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 {code}
 Additional debug/patch statements revealed that InMemoryMerge is not invoked 
 appropriately and so does not release memory back for the fetchers to proceed. 
 Example debug/patch messages are given below.
 {code}
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:48,332 INFO 
 [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
 Patch..usedMemory=1551867234, memoryLimit=1073741824, commitMemory=883028388, 
 mergeThreshold=708669632 ===> InMemoryMerge would be started in this case 
 as commitMemory >= mergeThreshold
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:52,900 INFO 
 [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
 Patch..usedMemory=1273349784, memoryLimit=1073741824, commitMemory=347296632, 
 mergeThreshold=708669632 ===> InMemoryMerge would *NOT* be started in this 
 case as commitMemory < mergeThreshold.  But the usedMemory is higher than 
 memoryLimit.  Fetchers would keep waiting indefinitely until memory is 
 released. InMemoryMerge will not kick in and not release memory.
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:53,163 INFO 
 [fetcher [Map_1] #1] orderedgrouped.MergeManager: 
 Patch..usedMemory=1191994052, memoryLimit=1073741824, commitMemory=523155206, 
 mergeThreshold=708669632 ===> InMemoryMerge would *NOT* be started in this 
 case as commitMemory < mergeThreshold.  But the usedMemory is higher than 
 memoryLimit.  Fetchers would keep waiting indefinitely until memory is 
 released.  InMemoryMerge will not kick in and not release memory.
 {code}
 In MergeManager, in-memory merging is invoked under the following condition:
 {code}
 if (!inMemoryMerger.isInProgress() && commitMemory >= mergeThreshold)
 {code}
 Attaching the 

[jira] [Updated] (TEZ-1925) Remove npm WARN messages from the Tez UI build process.

2015-01-07 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-1925:
-
Priority: Critical  (was: Major)

 Remove npm WARN messages from the Tez UI build process.
 ---

 Key: TEZ-1925
 URL: https://issues.apache.org/jira/browse/TEZ-1925
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Critical
 Attachments: TEZ-1925-v1.patch


 The Tez UI currently has these npm WARN messages.
 [INFO] npm WARN package.json tez-ui@0.0.1 No description
 [INFO] npm WARN package.json tez-ui@0.0.1 No repository field.
 [INFO] npm WARN package.json tez-ui@0.0.1 No README data
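For reference, these particular warnings come from missing fields in 
package.json; filling them in (the values below are illustrative, not the 
patch contents) plus adding a README file silences them:
{code}
{
  "name": "tez-ui",
  "version": "0.0.1",
  "description": "Web UI for Apache Tez",
  "repository": {
    "type": "git",
    "url": "https://git-wip-us.apache.org/repos/asf/tez.git"
  }
}
{code}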



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1928) Tez local mode hang in Pig tez local mode

2015-01-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated TEZ-1928:

Attachment: TestMultiQuery.log

 Tez local mode hang in Pig tez local mode
 -

 Key: TEZ-1928
 URL: https://issues.apache.org/jira/browse/TEZ-1928
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
 Attachments: TestMultiQuery.log, TestScalarAliasesLocal.log


 Pig tez local mode tests hang under some scenarios. I attached several stack 
 traces of the hanging tests.
 By setting tez.am.inline.task.execution.max-tasks, the tests do not hang. 
 However, we cannot make this general since the Pig backend code is not 
 designed to be multithread-safe.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TEZ-1912) Merge exceptions are thrown when enabling tez.runtime.shuffle.memory-to-memory.enable & tez.runtime.shuffle.memory-to-memory.segments

2015-01-07 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan resolved TEZ-1912.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

Thanks [~sseth]. Committed to master.

commit f1f87c1c81c29e1a1be69dc3261a28cd7151f2b9


 Merge exceptions are thrown when enabling 
 tez.runtime.shuffle.memory-to-memory.enable & 
 tez.runtime.shuffle.memory-to-memory.segments
 --

 Key: TEZ-1912
 URL: https://issues.apache.org/jira/browse/TEZ-1912
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
 Attachments: TEZ-1912.1.patch


 Merge exceptions are thrown when running a hive query on tez with the 
 following settings.  It works fine without the mem-to-mem merge settings.
 {code}
 2015-01-04 20:04:01,371 ERROR [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.Shuffle: ShuffleRunner failed with error
 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError:
  Error while doing final merge
 at 
 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:364)
 at 
 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:327)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: Rec# 22630125: Negative value-length: -1
 at 
 org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.positionToNextRecord(IFile.java:720)
 at 
 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.InMemoryReader.readRawKey(InMemoryReader.java:104)
 at 
 org.apache.tez.runtime.library.common.sort.impl.TezMerger$Segment.readRawKey(TezMerger.java:329)
 at 
 org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.adjustPriorityQueue(TezMerger.java:500)
 at 
 org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.next(TezMerger.java:545)
 at 
 org.apache.tez.runtime.library.common.sort.impl.TezMerger.writeFile(TezMerger.java:204)
 at 
 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:862)
 at 
 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:473)
 at 
 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:362)
 ... 5 more
 {code}
 {code}
 $HIVE_HOME/bin/hive -hiveconf tez.runtime.io.sort.factor=200 --hiveconf 
 tez.shuffle-vertex-manager.min-src-fraction=1.0 --hiveconf 
 tez.shuffle-vertex-manager.max-src-fraction=1.0 --hiveconf 
 hive.tez.auto.reducer.parallelism=false --hiveconf 
 tez.am.heartbeat.interval-ms.max=20 --hiveconf tez.runtime.io.sort.mb=1200 
 --hiveconf tez.runtime.sort.threads=2 --hiveconf 
 tez.history.logging.service.class=org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService
  --hiveconf hive.tez.container.size=4096 --hiveconf 
 tez.runtime.shuffle.memory-to-memory.enable=true --hiveconf 
 tez.runtime.shuffle.memory-to-memory.segments=4
 --10 TB dataset
 use tpcds4_bin_partitioned_orc_1;
 drop table testData;
 create table testData as select 
 ss_sold_date_sk,ss_sold_time_sk,ss_item_sk,ss_customer_sk,ss_quantity,ss_sold_date
  from store_sales distribute by ss_sold_date;
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1928) Tez local mode hang in Pig tez local mode

2015-01-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated TEZ-1928:

Attachment: TestMultiQueryBasic.log

 Tez local mode hang in Pig tez local mode
 -

 Key: TEZ-1928
 URL: https://issues.apache.org/jira/browse/TEZ-1928
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
 Attachments: TestMultiQuery.log, TestMultiQueryBasic.log, 
 TestScalarAliasesLocal.log


 Pig tez local mode tests hang under some scenarios. I attached several stack 
 traces of the hanging tests.
 By setting tez.am.inline.task.execution.max-tasks, the tests do not hang. 
 However, we cannot make this general since the Pig backend code is not 
 designed to be multithread-safe.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1928) Tez local mode hang in Pig tez local mode

2015-01-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268920#comment-14268920
 ] 

Siddharth Seth commented on TEZ-1928:
-

[~daijy] - if you're running with Tez 0.5.3 or higher, can you try setting the 
following, either programmatically or via tez-site:

{code}
  <property>
    <name>tez.am.dag.scheduler.class</name>
    <value>org.apache.tez.dag.app.dag.impl.DAGSchedulerNaturalOrderControlled</value>
  </property>
{code}
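For the programmatic route, something along these lines should work (a sketch 
using the public TezConfiguration/TezClient API; the session name is arbitrary):
{code}
import org.apache.tez.client.TezClient;
import org.apache.tez.dag.api.TezConfiguration;

// Sketch: set the DAG scheduler class in code instead of tez-site.xml.
TezConfiguration tezConf = new TezConfiguration();
tezConf.set("tez.am.dag.scheduler.class",
    "org.apache.tez.dag.app.dag.impl.DAGSchedulerNaturalOrderControlled");
TezClient tezClient = TezClient.create("pig-local-session", tezConf);
{code}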


 Tez local mode hang in Pig tez local mode
 -

 Key: TEZ-1928
 URL: https://issues.apache.org/jira/browse/TEZ-1928
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
 Attachments: TestMultiQuery.log, TestMultiQueryBasic.log, 
 TestScalarAliasesLocal.log


 Pig tez local mode tests hang under some scenarios. I attached several stack 
 traces of the hanging tests.
 By setting tez.am.inline.task.execution.max-tasks, the tests do not hang. 
 However, we cannot make this general since the Pig backend code is not 
 designed to be multithread-safe.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1904) Fix findbugs warnings in tez-runtime-library

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1904:

Attachment: TEZ-1904.1.txt

[~hitesh], [~rajesh.balamohan] - please review.

 Fix findbugs warnings in tez-runtime-library
 

 Key: TEZ-1904
 URL: https://issues.apache.org/jira/browse/TEZ-1904
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Siddharth Seth
 Attachments: TEZ-1904.1.txt


 https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TEZ-1904) Fix findbugs warnings in tez-runtime-library

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reassigned TEZ-1904:
---

Assignee: Siddharth Seth

 Fix findbugs warnings in tez-runtime-library
 

 Key: TEZ-1904
 URL: https://issues.apache.org/jira/browse/TEZ-1904
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Siddharth Seth
 Attachments: TEZ-1904.1.txt


 https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1925) Remove npm WARN messages from the Tez UI build process.

2015-01-07 Thread Prakash Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268798#comment-14268798
 ] 

Prakash Ramachandran commented on TEZ-1925:
---

+1 lgtm

 Remove npm WARN messages from the Tez UI build process.
 ---

 Key: TEZ-1925
 URL: https://issues.apache.org/jira/browse/TEZ-1925
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Critical
 Attachments: TEZ-1925-v1.patch


 The Tez UI currently has these npm WARN messages.
 [INFO] npm WARN package.json tez-ui@0.0.1 No description
 [INFO] npm WARN package.json tez-ui@0.0.1 No repository field.
 [INFO] npm WARN package.json tez-ui@0.0.1 No README data



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1923) FetcherOrderedGrouped gets into infinite loop due to memory pressure

2015-01-07 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1923:
--
Attachment: TEZ-1923.2.patch


bq. Rajesh, I believe this will be triggered if there are parallel chunks being 
fetched without enough completing to hit the current merge condition?

Yes, this is very easily reproduced with mem-to-mem merging (even though such 
tight loops are possible without mem-to-mem merging).

Agreed that the initial patch can end up triggering more spills to disk. 
Uploading a refined patch to address the following (a rough sketch of the 
resulting conditions follows the list):
1. Fetchers would wait instead of getting into a tight loop.
2. IntermediateMemoryToMemoryMerger would start merging only when there is 
enough memory available.
3. When mem-to-mem merging is enabled, it would additionally check for 
(usedMemory > memoryLimit). If so, it would kick off mem-to-disk merging to 
release the memory pressure and to avoid fetchers waiting indefinitely.
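As a rough sketch of how the refined triggers in points 2 and 3 might be shaped 
(condition shape only, reusing the illustrative field names from the bug 
description; the actual patch may differ):
{code}
// Illustrative only - not the actual TEZ-1923.2.patch logic.
boolean canStartMemToMemMerge =
    memToMemEnabled
    && !memToMemMergerInProgress
    && usedMemory <= memoryLimit;          // point 2: only with memory to spare

boolean shouldStartMemToDiskMerge =
    !inMemoryMergerInProgress
    && (commitMemory >= mergeThreshold     // original trigger
        || (memToMemEnabled && usedMemory > memoryLimit));  // point 3: relieve pressure
{code}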

[~hitesh] - I can backport to 0.5.4 after review.

 FetcherOrderedGrouped gets into infinite loop due to memory pressure
 

 Key: TEZ-1923
 URL: https://issues.apache.org/jira/browse/TEZ-1923
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-1923.1.patch, TEZ-1923.2.patch


 - Ran a comparatively large job (temp table creation) at 10 TB scale.
 - Turned on intermediate mem-to-mem 
 (tez.runtime.shuffle.memory-to-memory.enable=true and 
 tez.runtime.shuffle.memory-to-memory.segments=4)
 - Some reducers get lots of data and quickly get into an infinite loop
 {code}
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 3ms
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 1ms
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 2ms
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 5ms
 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 {code}
 Additional debug/patch statements revealed that InMemoryMerge is not invoked 
 appropriately and so does not release memory back for the fetchers to proceed. 
 Example debug/patch messages are given below.
 {code}
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:48,332 INFO 
 [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
 Patch..usedMemory=1551867234, memoryLimit=1073741824, commitMemory=883028388, 
 mergeThreshold=708669632 ===> InMemoryMerge would be started in this case 
 as commitMemory >= mergeThreshold
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:52,900 INFO 
 [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
 Patch..usedMemory=1273349784, memoryLimit=1073741824, commitMemory=347296632, 
 mergeThreshold=708669632 ===> InMemoryMerge would *NOT* be started in this 
 case as commitMemory < mergeThreshold.  But the usedMemory is higher than 
 memoryLimit.  Fetchers would keep waiting indefinitely until memory is 
 

[jira] [Assigned] (TEZ-1274) Remove Key/Value type checks in IFile

2015-01-07 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned TEZ-1274:
-

Assignee: Rajesh Balamohan

 Remove Key/Value type checks in IFile
 -

 Key: TEZ-1274
 URL: https://issues.apache.org/jira/browse/TEZ-1274
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Rajesh Balamohan

 We check key and value types for each record - this should be removed from 
 the tight loop. Maybe an assertion.
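In other words, something like the following shape (illustrative only; the real 
check lives in IFile's writer, and the exact message may differ):
{code}
// Illustrative sketch. Today: a per-record branch in the hot loop.
if (key.getClass() != keyClass) {
  throw new IOException("wrong key class: " + key.getClass()
      + ", expected: " + keyClass);
}
// Possible replacement: an assertion, which costs nothing unless the
// JVM is run with -ea (assertions enabled).
assert key.getClass() == keyClass
    : "wrong key class: " + key.getClass() + ", expected: " + keyClass;
{code}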



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1928) Tez local mode hang in Pig tez local mode

2015-01-07 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated TEZ-1928:

Attachment: TestScalarAliasesLocal.log

 Tez local mode hang in Pig tez local mode
 -

 Key: TEZ-1928
 URL: https://issues.apache.org/jira/browse/TEZ-1928
 Project: Apache Tez
  Issue Type: Bug
Reporter: Daniel Dai
 Attachments: TestScalarAliasesLocal.log


 Pig tez local mode tests hang under some scenario. I attached several stack 
 trace of hanging tests.
 By setting tez.am.inline.task.execution.max-tasks, the test does not hang. 
 However, we cannot make it general since Pig backend code is not designed to 
 be multithread-safe. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TEZ-669) [Umbrella] Security in Tez

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved TEZ-669.

Resolution: Fixed

 [Umbrella] Security in Tez
 --

 Key: TEZ-669
 URL: https://issues.apache.org/jira/browse/TEZ-669
 Project: Apache Tez
  Issue Type: Task
Reporter: Siddharth Seth





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TEZ-785) Vertex.checkVertexForCompletion needs to handle additional VertexTerminationCauses

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved TEZ-785.

Resolution: Done

I believe this is fixed as part of diagnostic improvements done elsewhere.

 Vertex.checkVertexForCompletion needs to handle additional 
 VertexTerminationCauses
 --

 Key: TEZ-785
 URL: https://issues.apache.org/jira/browse/TEZ-785
 Project: Apache Tez
  Issue Type: Bug
Reporter: Siddharth Seth





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-912) Change ScatterGatherShuffle and BroadcastShuffle to use the same code path

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-912:
---
Target Version/s: 0.7.0

 Change ScatterGatherShuffle and BroadcastShuffle to use the same code path
 --

 Key: TEZ-912
 URL: https://issues.apache.org/jira/browse/TEZ-912
 Project: Apache Tez
  Issue Type: Task
Reporter: Siddharth Seth

 Currently there are 2 shuffle schedulers, 2 fetchers, etc. - a maintenance 
 headache. Merging the two together is a decent amount of work though, 
 considering how Merge, Shuffle and Fetch are tied together in the case of 
 ShuffledMergedInput.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-924) InputFailedEvent handling for Shuffle

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-924:
---
Target Version/s: 0.7.0

 InputFailedEvent handling for Shuffle
 -

 Key: TEZ-924
 URL: https://issues.apache.org/jira/browse/TEZ-924
 Project: Apache Tez
  Issue Type: Bug
Reporter: Siddharth Seth
Priority: Critical

 Shuffle receives batches of Events to process from the AM. Given the way these 
 events are sent over to the ShuffleHandlers and the way they're processed, 
 it's possible that Shuffle will start fetching data from an Event which is 
 subsequently marked as failed (via an InputFailedEvent):
 1) The AM sends events in batches. An InputFailedEvent for a specific Input 
 may not be part of the same batch which contained the original event which is 
 being marked bad.
 2) The ShuffleEventHandler processes the events in each batch one event at a 
 time - so even if the InputFailedEvent follows - it's possible for Shuffle to 
 start fetching data from a Failed Input.
 The AM needs to change to invalidate Inputs up front - so that related events 
 don't span batches. Alternately, it needs to apply the InputFailedEvent to 
 the original event being sent.
 The Shuffle itself should process a batch update as a batch - that would 
 prevent fetchers from starting early even though there may be additional 
 events for the same host.
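A rough sketch of the "process a batch as a batch" idea (handler and accessor 
names such as getTargetInput and scheduleFetch are hypothetical, not the 
current ShuffleEventHandler API): first collect every input failed anywhere in 
the batch, then schedule fetches only for the survivors.
{code}
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only - hypothetical names throughout.
void handleEventBatch(List<Event> batch) {
  Set<InputIdentifier> failedInBatch = new HashSet<>();
  for (Event e : batch) {                       // pass 1: collect failures
    if (e instanceof InputFailedEvent) {
      failedInBatch.add(((InputFailedEvent) e).getTargetInput());
    }
  }
  for (Event e : batch) {                       // pass 2: schedule fetches
    if (e instanceof DataMovementEvent) {
      DataMovementEvent dme = (DataMovementEvent) e;
      if (!failedInBatch.contains(dme.getTargetInput())) {
        scheduleFetch(dme);                     // not failed within this batch
      }
    }
  }
}
{code}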



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-941) Avoid writing out empty partitions in Sorter implementations

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-941:
---
Target Version/s: 0.7.0
  Labels:   (was: 0.4)

 Avoid writing out empty partitions in Sorter implementations
 

 Key: TEZ-941
 URL: https://issues.apache.org/jira/browse/TEZ-941
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1529) ATS and TezClient integration in secure kerberos enabled cluster

2015-01-07 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268470#comment-14268470
 ] 

Jonathan Eagles commented on TEZ-1529:
--

[~pramachandran], I don't have much context on this issue, but can it 
be re-targeted for the Tez 0.6.1 or 0.7.0 release?

 ATS and TezClient integration in secure kerberos enabled cluster
 -

 Key: TEZ-1529
 URL: https://issues.apache.org/jira/browse/TEZ-1529
 Project: Apache Tez
  Issue Type: Bug
Reporter: Prakash Ramachandran
Assignee: Prakash Ramachandran
Priority: Blocker

 This is a follow-up to TEZ-1495, which addresses ATS - TezClient integration; 
 however, it does not enable it in a secure Kerberos-enabled cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1095) Enable tez.runtime.shuffle.memory-to-memory.enable for mem-to-mem merging

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1095:

Target Version/s: 0.7.0

 Enable tez.runtime.shuffle.memory-to-memory.enable for mem-to-mem merging
 -

 Key: TEZ-1095
 URL: https://issues.apache.org/jira/browse/TEZ-1095
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan

 Currently tez.runtime.shuffle.memory-to-memory.enable is set to false by 
 default.  We need to evaluate the usefulness of this parameter and enable it 
 by default if it provides good perf boosts.  
 There is also a possibility that waitForInMemoryMerge() will return in 
 sub-millisecond time when this parameter is enabled, causing pressure on 
 network resources.  Related JIRA: https://issues.apache.org/jira/browse/TEZ-1091



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1094) Support pipelined data transfer for Unordered Output

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1094:

Target Version/s: 0.7.0

 Support pipelined data transfer for Unordered Output
 

 Key: TEZ-1094
 URL: https://issues.apache.org/jira/browse/TEZ-1094
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth

 For unsorted output (and possibly for sorted output), it should be possible 
 to send data in small batches instead of waiting for everything to be 
 generated before transmitting. For now, planning on getting started with 
 UnsortedOutput / Input pairs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1212) Remove synchronization on the write method in OnFileSortedOutput

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1212:

Target Version/s: 0.7.0

 Remove synchronization on the write method in OnFileSortedOutput
 

 Key: TEZ-1212
 URL: https://issues.apache.org/jira/browse/TEZ-1212
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1211) Remove synchronization on the write method in OnFileUnorderedPartitionedOutput

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1211:

Target Version/s: 0.7.0

 Remove synchronization on the write method in OnFileUnorderedPartitionedOutput
 --

 Key: TEZ-1211
 URL: https://issues.apache.org/jira/browse/TEZ-1211
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1274) Remove Key/Value type checks in IFile

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1274:

Target Version/s: 0.7.0

 Remove Key/Value type checks in IFile
 -

 Key: TEZ-1274
 URL: https://issues.apache.org/jira/browse/TEZ-1274
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth

 We check key and value types for each record - this should be removed from 
 the tight loop. Maybe an assertion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1491) Tez reducer-side merge's counter update is slow

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1491:

Target Version/s: 0.7.0

 Tez reducer-side merge's counter update is slow
 ---

 Key: TEZ-1491
 URL: https://issues.apache.org/jira/browse/TEZ-1491
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: perf-top-counters.png


 TezMerger$MergeQueue::next() shows up in profiles due to a synchronized block 
 in a tight loop.
 Part of the slow operation was due to DataInputBuffer issues identified 
 earlier in HADOOP-10694, but along with that approximately 11% of my 
 lock-prefix calls were originating from the following line.
 {code}
   mergeProgress.set(totalBytesProcessed * progPerByte);
 {code}
 in two places within the core loop.
 !perf-top-counters.png!
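One common way to take such an update out of the hot path, shown below purely 
as an illustration (the interval value is arbitrary, and this is not 
necessarily what the eventual fix does), is to publish progress every N 
records instead of on every record:
{code}
// Illustrative sketch only.
private static final int PROGRESS_RECORD_INTERVAL = 10000; // arbitrary
private int recordsSinceUpdate = 0;

private void maybeUpdateProgress() {
  if (++recordsSinceUpdate >= PROGRESS_RECORD_INTERVAL) {
    mergeProgress.set(totalBytesProcessed * progPerByte);  // the contended call
    recordsSinceUpdate = 0;
  }
}
{code}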



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1526) LoadingCache for TezTaskID slow for large jobs

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1526:

Target Version/s: 0.6.0, 0.7.0  (was: 0.6.0)

 LoadingCache for TezTaskID slow for large jobs
 --

 Key: TEZ-1526
 URL: https://issues.apache.org/jira/browse/TEZ-1526
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
  Labels: performance
 Attachments: 10-TezTaskIDs.patch, TEZ-1526-v1.patch, 
 TEZ-1526-v2.patch


 Using the LoadingCache with default builder settings, 100,000 TezTaskIDs are 
 created in 10 seconds on my setup. With a LoadingCache initialCapacity of 
 10,000 they are created in 300 ms. With no LoadingCache, they are created in 
 10 ms. A test case is attached to illustrate the condition I would like to be 
 sped up.
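For context, the initialCapacity difference comes down to a single builder 
setting (a sketch using Guava's CacheBuilder; the identity loader body is 
illustrative, not the actual Tez code):
{code}
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

// Sketch: pre-sizing avoids repeated rehash/resize while task IDs pour in.
LoadingCache<TezTaskID, TezTaskID> cache = CacheBuilder.newBuilder()
    .initialCapacity(10000)                 // vs. the small default capacity
    .build(new CacheLoader<TezTaskID, TezTaskID>() {
      @Override
      public TezTaskID load(TezTaskID key) {
        return key;                         // illustrative identity loader
      }
    });
{code}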



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1573) Exception from InputInitializer and VertexManagerPlugin is not propogated to client

2015-01-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268515#comment-14268515
 ] 

Siddharth Seth commented on TEZ-1573:
-

[~jeffzhang] - is this fixed as part of 1267 and the related jiras?

 Exception from InputInitializer and VertexManagerPlugin is not propogated to 
 client
 ---

 Key: TEZ-1573
 URL: https://issues.apache.org/jira/browse/TEZ-1573
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Jeff Zhang
Assignee: Jeff Zhang





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TEZ-1921) Intermediate data cleanup for long running sessions

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved TEZ-1921.
-
Resolution: Duplicate

 Intermediate data cleanup for long running sessions
 ---

 Key: TEZ-1921
 URL: https://issues.apache.org/jira/browse/TEZ-1921
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Bikas Saha

 Intermediate data for a DAG could be deleted after a DAG has completed. Else 
 it accumulates until the session completes and could unnecessarily fill up 
 the local disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-776) Reduce AM mem usage caused by storing TezEvents

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-776:
---
Target Version/s: 0.7.0

 Reduce AM mem usage caused by storing TezEvents
 ---

 Key: TEZ-776
 URL: https://issues.apache.org/jira/browse/TEZ-776
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth

 This is open-ended at the moment.
 A fair chunk of the AM heap is taken up by TezEvents (specifically 
 DataMovementEvents - 64 bytes per event).
 Depending on the connection pattern, this puts limits on the number of tasks 
 that can be processed.
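For a sense of scale (back-of-envelope only, ignoring per-object JVM overhead): a fully connected edge between two 10,000-task vertices implies 10,000 x 10,000 = 10^8 DataMovementEvents, which at 64 bytes each is roughly 6.4 GB of heap for the events alone.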



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TEZ-705) Add a helper which can be used to obtain credentials, setup MRInput parameters

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved TEZ-705.

Resolution: Won't Fix

MRInput / MROutput modified to make this simpler.

 Add a helper which can be used to obtain credentials, setup MRInput parameters
 --

 Key: TEZ-705
 URL: https://issues.apache.org/jira/browse/TEZ-705
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-690) Tez API Ease of Use

2015-01-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268452#comment-14268452
 ] 

Siddharth Seth commented on TEZ-690:


Can this be closed ?

 Tez API Ease of Use
 ---

 Key: TEZ-690
 URL: https://issues.apache.org/jira/browse/TEZ-690
 Project: Apache Tez
  Issue Type: Task
Reporter: Bikas Saha
Assignee: Bikas Saha

 Recently we wrote the wordcount example from scratch using Tez API's in 
 TEZ-689. The code shows some room for improvement in making the Tez API's 
 more concise and less error prone. This jira tracks some of those changes. 
 The improvements in this jira will be reflected in the cleanliness and 
 conciseness of the word count example job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TEZ-770) Remove SessionLocalResources

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved TEZ-770.

Resolution: Done

Resolved elsewhere.

 Remove SessionLocalResources
 

 Key: TEZ-770
 URL: https://issues.apache.org/jira/browse/TEZ-770
 Project: Apache Tez
  Issue Type: Task
Reporter: Siddharth Seth
Assignee: Siddharth Seth

 These are currently not used, or exposed to users. For now - they just end up 
 adding additional steps when running on a secure cluster. Can be 
 re-introduced if we need them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-946) Tez loses buffer-cache performance by running interleaved vertexes

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-946:
---
Target Version/s: 0.7.0

 Tez loses buffer-cache performance by running interleaved vertexes
 --

 Key: TEZ-946
 URL: https://issues.apache.org/jira/browse/TEZ-946
 Project: Apache Tez
  Issue Type: Bug
Reporter: Gopal V
 Attachments: union-10.svg


 For a task which has multiple reduce vertexes running to generate UNION ops, 
 the current Tez behaviour causes bad cache performance as well as bad perf 
 with the object registry.
 The map spill files got paged in and out of the cache when I was running a 
 large query which had multiple reducers pulling data off different shuffle 
 edges at the same time.
 Along with this, whenever a map-join vertex is interleaved with a reducer 
 vertex, the map-join hashtable gets dropped in the transition.
 It would be beneficial to schedule the vertexes at the same level with some 
 priority so that we finish them faster through better buffer-cache hit-rate 
 and object-registry hit-rate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-943) Potential memory leaks caused by holding on ot TaskAttemptIDs and Containers

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-943:
---
Target Version/s: 0.7.0

 Potential memory leaks caused by holding on ot TaskAttemptIDs and Containers
 

 Key: TEZ-943
 URL: https://issues.apache.org/jira/browse/TEZ-943
 Project: Apache Tez
  Issue Type: Bug
Reporter: Siddharth Seth

 Details at 
 https://issues.apache.org/jira/browse/TEZ-940?focusedCommentId=13938870page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13938870



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-485) Get rid of TezTaskStatus

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-485:
---
Target Version/s: 0.7.0

 Get rid of TezTaskStatus
 

 Key: TEZ-485
 URL: https://issues.apache.org/jira/browse/TEZ-485
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Priority: Minor
 Attachments: TEZ-485.1.txt


 TezTaskStatus is used by the MR Reporter only. We should be able to get rid 
 of this interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TEZ-485) Get rid of TezTaskStatus

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reassigned TEZ-485:
--

Assignee: Siddharth Seth

 Get rid of TezTaskStatus
 

 Key: TEZ-485
 URL: https://issues.apache.org/jira/browse/TEZ-485
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Priority: Minor
 Attachments: TEZ-485.1.txt


 TezTaskStatus is used by the MR Reporter only. We should be able to get rid 
 of this interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-965) Tez needs a circuit-breaker to avoid mistaking network blips to task/node failures

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-965:
---
Target Version/s: 0.7.0

 Tez needs a circuit-breaker to avoid mistaking network blips to task/node 
 failures
 

 Key: TEZ-965
 URL: https://issues.apache.org/jira/browse/TEZ-965
 Project: Apache Tez
  Issue Type: Bug
 Environment: Flaky DNS cluster
Reporter: Gopal V

 If DNS resolution fails for a period of 5-10 seconds, Tez restarts 
 contra-flows in the query, triggering recovery of nearly everything it has run.
 Nodes are getting marked as bad because they can't shuffle (dns resolution 
 failed for all NMs), which results in log lines like 
 {code}
 attempt_1394928384313_0234_1_25_000654_0 blamed for read error from 
 attempt_1394928384313_0234_1_24_000366_0 
 {code}
 And the tasks restart from an earlier vertex.
 When a large number of such failures happen, the tasks shouldn't restart 
 previous vertexes, but instead should flip a circuit & back off till the 
 network blip disappears.
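As an illustration of the circuit-breaker shape being asked for (a generic 
sketch, not a proposed Tez API; the threshold and back-off values are 
arbitrary):
{code}
// Illustrative sketch only.
class ShuffleCircuitBreaker {
  private static final int FAILURE_THRESHOLD = 100;    // arbitrary
  private static final long BACKOFF_MILLIS = 30_000;   // arbitrary

  private int recentFailures;
  private long openUntilMillis;

  synchronized void recordFetchFailure() {
    if (++recentFailures >= FAILURE_THRESHOLD) {
      openUntilMillis = System.currentTimeMillis() + BACKOFF_MILLIS;  // trip
      recentFailures = 0;
    }
  }

  // While the breaker is open, suppress "blamed for read error" escalation
  // (and the resulting vertex restarts) and simply retry later.
  synchronized boolean allowBlame() {
    return System.currentTimeMillis() >= openUntilMillis;
  }
}
{code}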



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-967) Expose list of running tasks along with meta information

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-967:
---
Target Version/s: 0.7.0

 Expose list of running tasks along with meta information
 

 Key: TEZ-967
 URL: https://issues.apache.org/jira/browse/TEZ-967
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth

 Useful to figure out what is running while executing a DAG - especially while 
 debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1078) ValuesIterator does not need to deserialize keys for comparison

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1078:

Target Version/s: 0.7.0

 ValuesIterator does not need to deserialize keys for comparison
 ---

 Key: TEZ-1078
 URL: https://issues.apache.org/jira/browse/TEZ-1078
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth

 ValuesIterator - which provides a Key, Values view - ends up deserializing 
 each key before comparing it to the previous key when trying to determine 
 whether a new key has been found or the next K-V pair in the IFile belongs to 
 the same key.
 It should be possible to use the compare(byte[], ...) method from the 
 RawComparator interface.
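Concretely, the raw path would compare serialized bytes with RawComparator's 
existing byte-array overload (a sketch; the buffer plumbing is simplified):
{code}
import org.apache.hadoop.io.RawComparator;

// Sketch: detect a key change without deserializing either key.
// RawComparator declares:
//   int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
static <K> boolean isNewKey(RawComparator<K> comparator,
                            byte[] prevKey, int prevLen,
                            byte[] nextKey, int nextLen) {
  return comparator.compare(prevKey, 0, prevLen, nextKey, 0, nextLen) != 0;
}
{code}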



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-485) Get rid of TezTaskStatus

2015-01-07 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268487#comment-14268487
 ] 

Hitesh Shah commented on TEZ-485:
-

+1

 Get rid of TezTaskStatus
 

 Key: TEZ-485
 URL: https://issues.apache.org/jira/browse/TEZ-485
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Priority: Minor
 Attachments: TEZ-485.1.txt


 TezTaskStatus is used by the MR Reporter only. We should be able to get rid 
 of this interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1275) Add an append method to IFile which does not check for RLE

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1275:

Target Version/s: 0.7.0

 Add an append method to IFile which does not check for RLE
 --

 Key: TEZ-1275
 URL: https://issues.apache.org/jira/browse/TEZ-1275
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth

 The RLE and same key checks are primarily required for sorted output.
 For the unordered case - these checks should not be hit (and will almost 
 always return false).
 I believe longer term, the plan is to have only a single method - which does 
 not have the checks - and move all the key comparison and equality logic over 
 to users of IFile, which would end up calling appendKV on new keys, and 
 append(V / List<V>) for repeated values.
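That split might leave a writer surface roughly like this (hypothetical method 
shapes sketched from the description above, not the current IFile.Writer API):
{code}
// Hypothetical sketch only.
void appendKV(DataInputBuffer key, DataInputBuffer value);  // new key, no RLE/same-key check
void appendValue(DataInputBuffer value);                    // repeat the previous key
void appendValues(List<DataInputBuffer> values);            // batch of repeated values
{code}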



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1363) Make use of the regular scheduler when running in LocalMode

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1363:

Target Version/s: 0.7.0

 Make use of the regular scheduler when running in LocalMode
 ---

 Key: TEZ-1363
 URL: https://issues.apache.org/jira/browse/TEZ-1363
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Jonathan Eagles
 Attachments: TEZ-1363-v1.patch, TEZ-1363-v2.patch, TEZ-1363-v3.patch


 In TEZ-708, we decided to introduce a new scheduler for local mode - to keep 
 things simple initially, and get local mode working.
 Eventually, however, scheduling should go through the regular task scheduler 
 - which should be able to get containers from YARN / LocalAllocator / other 
 sources - and treat them as a regular container for scheduling purposes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1518) Clean up ID caches on DAG completion

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1518:

Target Version/s: 0.6.0, 0.7.0  (was: 0.6.0)

 Clean up ID caches on DAG completion
 

 Key: TEZ-1518
 URL: https://issues.apache.org/jira/browse/TEZ-1518
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TEZ-1546) Change InputInitializerContext.registerForVertexStateUpdates to return a list of pending state changes

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved TEZ-1546.
-
Resolution: Won't Fix

 Change InputInitializerContext.registerForVertexStateUpdates to return a list 
 of pending state changes
 --

 Key: TEZ-1546
 URL: https://issues.apache.org/jira/browse/TEZ-1546
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Priority: Critical

 Sending pending events via the stateChange on the InputInitializer can be 
 confusing - since multiple calls will be made back to back, without knowing 
 how many events are coming in, and which one is the last.
 Returning all past state changes via register ensures invocations of 
 onStateChanged are current events.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TEZ-485) Get rid of TezTaskStatus

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved TEZ-485.

   Resolution: Fixed
Fix Version/s: 0.7.0

Committed to master.

 Get rid of TezTaskStatus
 

 Key: TEZ-485
 URL: https://issues.apache.org/jira/browse/TEZ-485
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Priority: Minor
 Fix For: 0.7.0

 Attachments: TEZ-485.1.txt


 TezTaskStatus is used by the MR Reporter only. We should be able to get rid 
 of this interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1910) Build fails against hadoop-2.2.0

2015-01-07 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1910:
-
Attachment: TEZ-1910.2.patch

Modified patch to add relevant comments.

 Build fails against hadoop-2.2.0
 

 Key: TEZ-1910
 URL: https://issues.apache.org/jira/browse/TEZ-1910
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Blocker
 Attachments: TEZ-1910.1.patch, TEZ-1910.2.patch


 https://builds.apache.org/job/Tez-Build-Hadoop-2.2/2/console



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1910) Build fails against hadoop-2.2.0

2015-01-07 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268549#comment-14268549
 ] 

Jonathan Eagles commented on TEZ-1910:
--

+1. Will commit shortly. Thanks, [~hitesh].

 Build fails against hadoop-2.2.0
 

 Key: TEZ-1910
 URL: https://issues.apache.org/jira/browse/TEZ-1910
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
Priority: Blocker
 Attachments: TEZ-1910.1.patch, TEZ-1910.2.patch


 https://builds.apache.org/job/Tez-Build-Hadoop-2.2/2/console



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1925) Remove npm WARN messages from the Tez UI build process.

2015-01-07 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created TEZ-1925:


 Summary: Remove npm WARN messages from the Tez UI build process.
 Key: TEZ-1925
 URL: https://issues.apache.org/jira/browse/TEZ-1925
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1925) Remove npm WARN messages from the Tez UI build process.

2015-01-07 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-1925:
-
Attachment: TEZ-1925-v1.patch

[~pramachandran], [~hitesh], can you review?

 Remove npm WARN messages from the Tez UI build process.
 ---

 Key: TEZ-1925
 URL: https://issues.apache.org/jira/browse/TEZ-1925
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: TEZ-1925-v1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1925) Remove npm WARN messages from the Tez UI build process.

2015-01-07 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-1925:
-
Description: 
The Tez UI currently has these npm WARN messages.
[INFO] npm WARN package.json tez-ui@0.0.1 No description
[INFO] npm WARN package.json tez-ui@0.0.1 No repository field.
[INFO] npm WARN package.json tez-ui@0.0.1 No README data


 Remove npm WARN messages from the Tez UI build process.
 ---

 Key: TEZ-1925
 URL: https://issues.apache.org/jira/browse/TEZ-1925
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: TEZ-1925-v1.patch


 The Tez UI currently has these npm WARN messages.
 [INFO] npm WARN package.json tez-ui@0.0.1 No description
 [INFO] npm WARN package.json tez-ui@0.0.1 No repository field.
 [INFO] npm WARN package.json tez-ui@0.0.1 No README data



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TEZ-1900) Fix findbugs warnings in tez-dag

2015-01-07 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah reassigned TEZ-1900:


Assignee: Hitesh Shah

 Fix findbugs warnings in tez-dag
 

 Key: TEZ-1900
 URL: https://issues.apache.org/jira/browse/TEZ-1900
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Hitesh Shah

 Might need to be split out more. 
 https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-dag.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1926) fatalError reported in LogicalIOProcessorRuntimeTask isn't reported to AM

2015-01-07 Thread Siddharth Seth (JIRA)
Siddharth Seth created TEZ-1926:
---

 Summary: fatalError reported in LogicalIOProcessorRuntimeTask 
isn't reported to AM
 Key: TEZ-1926
 URL: https://issues.apache.org/jira/browse/TEZ-1926
 Project: Apache Tez
  Issue Type: Bug
Reporter: Siddharth Seth


It may not need to be reported - but this needs to be looked at.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1923) FetcherOrderedGrouped gets into infinite loop due to memory pressure

2015-01-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268186#comment-14268186
 ] 

Siddharth Seth commented on TEZ-1923:
-

Scratch that. This is linked to the MemoryToDisk merger and happens when data 
comes in parallel and takes time to complete - large chunks, for example. 
commitMemory is only accounted for after a fetch completes; usedMemory is 
accounted on each reserve. Rajesh, I believe this will be triggered if there 
are parallel chunks being fetched without enough of them completing to hit the 
current merge condition?

Even with this patch, I think it's possible to hit the WAIT loop in the 
following situations. 1) closeInMemoryFile is not invoked - i.e. all fetches 
are still in progress and the merge doesn't get triggered - fetchers would end 
up in the WAIT loop. 2) If a single merge were to complete and trigger the new 
condition - the merge essentially writes a single segment, clears up some 
memory, and goes back to condition 1.

Also, this would imply writing out more files to disk than we would want to - 
since the merger will be triggered more often.

One option would be to check and wait on usedMemory in the fetchers - instead 
of just relying on the merger to be running.
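
As a rough illustration of that option - a minimal Java sketch, with 
hypothetical names rather than the actual MergeManager code, where the 
reservation itself waits on usedMemory instead of returning WAIT:

{code}
class MemoryGate {
  private final long memoryLimit;
  private long usedMemory = 0;

  MemoryGate(long memoryLimit) {
    this.memoryLimit = memoryLimit;
  }

  // Blocks the fetcher until the reservation fits under memoryLimit,
  // rather than spinning on a WAIT status.
  synchronized void reserve(long requested) throws InterruptedException {
    while (usedMemory + requested > memoryLimit) {
      wait(); // woken by release() when a merge returns memory
    }
    usedMemory += requested;
  }

  // Called when a merge (or a failed fetch) returns memory.
  synchronized void release(long size) {
    usedMemory -= size;
    notifyAll();
  }
}
{code}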

 FetcherOrderedGrouped gets into infinite loop due to memory pressure
 

 Key: TEZ-1923
 URL: https://issues.apache.org/jira/browse/TEZ-1923
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-1923.1.patch


 - Ran a comparatively large job (temp table creation) at 10 TB scale.
 - Turned on intermediate mem-to-mem 
 (tez.runtime.shuffle.memory-to-memory.enable=true and 
 tez.runtime.shuffle.memory-to-memory.segments=4)
 - Some reducers get lots of data and quickly get into an infinite loop
 {code}
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 3ms
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 1ms
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 2ms
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 5ms
 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 {code}
 Additional debug/patch statements revealed that InMemoryMerge is not invoked 
 appropriately and does not release the memory back for fetchers to proceed; 
 example debug/patch messages are given below
 {code}
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:48,332 INFO 
 [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
 Patch..usedMemory=1551867234, memoryLimit=1073741824, commitMemory=883028388, 
 mergeThreshold=708669632  === InMemoryMerge would be started in this case 
 as commitMemory >= mergeThreshold
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:52,900 INFO 
 [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
 Patch..usedMemory=1273349784, memoryLimit=1073741824, 

[jira] [Comment Edited] (TEZ-1915) Add public key to KEYS

2015-01-07 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268172#comment-14268172
 ] 

Jonathan Eagles edited comment on TEZ-1915 at 1/7/15 8:42 PM:
--

Simple release prep fix for 0.6. Committed to branch-0.6 and master.

Also, published this key to http://pgp.mit.edu/


was (Author: jeagles):
Simple release prep fix for 0.6. Committed to branch-0.6 and master

 Add public key to KEYS
 --

 Key: TEZ-1915
 URL: https://issues.apache.org/jira/browse/TEZ-1915
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Blocker
 Fix For: 0.6.0

 Attachments: TEZ-1915-v1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1923) FetcherOrderedGrouped gets into infinite loop due to memory pressure

2015-01-07 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated TEZ-1923:
--
Summary: FetcherOrderedGrouped gets into infinite loop due to memory 
pressure  (was: FetcherOrderedGrouped can get into infinite loop due to memory 
pressure)

 FetcherOrderedGrouped gets into infinite loop due to memory pressure
 

 Key: TEZ-1923
 URL: https://issues.apache.org/jira/browse/TEZ-1923
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
 Attachments: TEZ-1923.1.patch


 - Ran a comparatively large job (temp table creation) at 10 TB scale.
 - Turned on intermediate mem-to-mem 
 (tez.runtime.shuffle.memory-to-memory.enable=true and 
 tez.runtime.shuffle.memory-to-memory.segments=4)
 - Some reducers get lots of data and quickly get into an infinite loop
 {code}
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 3ms
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 1ms
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 2ms
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 5ms
 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 {code}
 Additional debug/patch statements revealed that InMemoryMerge is not invoked 
 appropriately and does not release the memory back for fetchers to proceed; 
 example debug/patch messages are given below
 {code}
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:48,332 INFO 
 [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
 Patch..usedMemory=1551867234, memoryLimit=1073741824, commitMemory=883028388, 
 mergeThreshold=708669632  === InMemoryMerge would be started in this case 
 as commitMemory >= mergeThreshold
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:52,900 INFO 
 [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
 Patch..usedMemory=1273349784, memoryLimit=1073741824, commitMemory=347296632, 
 mergeThreshold=708669632 === InMemoryMerge would *NOT* be started in this 
 case as commitMemory < mergeThreshold.  But the usedMemory is higher than 
 memoryLimit.  Fetchers would keep waiting indefinitely until memory is 
 released. InMemoryMerge will not kick in and not release memory.
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:53,163 INFO 
 [fetcher [Map_1] #1] orderedgrouped.MergeManager: 
 Patch..usedMemory=1191994052, memoryLimit=1073741824, commitMemory=523155206, 
 mergeThreshold=708669632 === InMemoryMerge would *NOT* be started in this 
 case as commitMemory < mergeThreshold.  But the usedMemory is higher than 
 memoryLimit.  Fetchers would keep waiting indefinitely until memory is 
 released.  InMemoryMerge will not kick in and not release memory.
 {code}
 In MergeManager, in memory merging is invoked under the following condition
 {code}
 if (!inMemoryMerger.isInProgress() && commitMemory >= mergeThreshold)
 {code}
 Attaching the sample hive 

[jira] [Assigned] (TEZ-1923) FetcherOrderedGrouped gets into infinite loop due to memory pressure

2015-01-07 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned TEZ-1923:
-

Assignee: Rajesh Balamohan

 FetcherOrderedGrouped gets into infinite loop due to memory pressure
 

 Key: TEZ-1923
 URL: https://issues.apache.org/jira/browse/TEZ-1923
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-1923.1.patch


 - Ran a comparatively large job (temp table creation) at 10 TB scale.
 - Turned on intermediate mem-to-mem 
 (tez.runtime.shuffle.memory-to-memory.enable=true and 
 tez.runtime.shuffle.memory-to-memory.segments=4)
 - Some reducers get lots of data and quickly get into an infinite loop
 {code}
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 3ms
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 1ms
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 2ms
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 5ms
 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 {code}
 Additional debug/patch statements revealed that InMemoryMerge is not invoked 
 appropriately and does not release the memory back for fetchers to proceed; 
 example debug/patch messages are given below
 {code}
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:48,332 INFO 
 [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
 Patch..usedMemory=1551867234, memoryLimit=1073741824, commitMemory=883028388, 
 mergeThreshold=708669632  === InMemoryMerge would be started in this case 
 as commitMemory >= mergeThreshold
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:52,900 INFO 
 [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
 Patch..usedMemory=1273349784, memoryLimit=1073741824, commitMemory=347296632, 
 mergeThreshold=708669632 === InMemoryMerge would *NOT* be started in this 
 case as commitMemory < mergeThreshold.  But the usedMemory is higher than 
 memoryLimit.  Fetchers would keep waiting indefinitely until memory is 
 released. InMemoryMerge will not kick in and not release memory.
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:53,163 INFO 
 [fetcher [Map_1] #1] orderedgrouped.MergeManager: 
 Patch..usedMemory=1191994052, memoryLimit=1073741824, commitMemory=523155206, 
 mergeThreshold=708669632 === InMemoryMerge would *NOT* be started in this 
 case as commitMemory < mergeThreshold.  But the usedMemory is higher than 
 memoryLimit.  Fetchers would keep waiting indefinitely until memory is 
 released.  InMemoryMerge will not kick in and not release memory.
 {code}
 In MergeManager, in memory merging is invoked under the following condition
 {code}
 if (!inMemoryMerger.isInProgress() && commitMemory >= mergeThreshold)
 {code}
 Attaching the sample hive command just for reference
 {code}
 $HIVE_HOME/bin/hive -hiveconf 

[jira] [Commented] (TEZ-1913) Reduce deserialize cost in ValuesIterator

2015-01-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268331#comment-14268331
 ] 

Siddharth Seth commented on TEZ-1913:
-

Questions/comments on the patch.
- The ValuesIterator is used by the Combiners as well. I'm not sure the result 
of a merge (a RawKVIterator which supports isSameKey) is the only iterator that 
will be used in these cases. From the PipelinedSorter, there were some 
RawKVIterators which don't implement the method.
- EmptyIterator.isSameKey isn't implemented - I don't think we'll ever hit 
this, but the Merger can return an instance of it. Should probably change it to 
return false (see the sketch below).
- Add a test in TestValuesIterator to validate that same-key handling works.
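
For illustration only - a minimal Java sketch of the isSameKey contract being 
discussed, with hypothetical names rather than the actual Tez interfaces:

{code}
// Hypothetical, simplified shape of a key-value iterator with isSameKey().
// Iterators that cannot cheaply answer the question (such as an empty
// iterator returned by the Merger) should return false, forcing callers
// to fall back to deserializing and comparing keys themselves.
interface KVIterator {
  boolean next() throws java.io.IOException;

  // true only when the current key is known to equal the previous key
  default boolean isSameKey() throws java.io.IOException {
    return false; // safe default: never claims equality it cannot prove
  }
}

class EmptyIterator implements KVIterator {
  public boolean next() {
    return false; // no data, so isSameKey() == false is always correct
  }
}
{code}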

 Reduce deserialize cost in ValuesIterator
 -

 Key: TEZ-1913
 URL: https://issues.apache.org/jira/browse/TEZ-1913
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: perfomance
 Attachments: TEZ-1913.1.patch


 When TezRawKeyValueIterator->isSameKey() is added, it should be possible to 
 reduce the number of deserializations in ValuesIterator->readNextKey().
 Creating this ticket to track the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-15) Support for DAG AM recovery

2015-01-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268339#comment-14268339
 ] 

Siddharth Seth commented on TEZ-15:
---

Can this be closed, since recovery is already supported?

 Support for DAG AM recovery
 ---

 Key: TEZ-15
 URL: https://issues.apache.org/jira/browse/TEZ-15
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Bikas Saha





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TEZ-1924) Tez AM does not register with AM with full FQDN causing jobs to fail in some environments

2015-01-07 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah resolved TEZ-1924.
--
   Resolution: Fixed
Fix Version/s: 0.5.4

Committed to master, branch 0.5 and branch 0.6. Thanks for your contribution, 
[~ivanmi].

 Tez AM does not register with AM with full FQDN causing jobs to fail in some 
 environments
 -

 Key: TEZ-1924
 URL: https://issues.apache.org/jira/browse/TEZ-1924
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Fix For: 0.5.4

 Attachments: TEZ-1924.2.patch, TEZ-20.patch


 Issue originally reported by [~Karam Singh].
 All OrderedWordCount, WordCount and Tez faultTolerance system tests 
 failed due to java.net.UnknownHostException.
 Interestingly, other tez examples such as mrrsleep, randomwriter, 
 randomtextwriter, sort, join_inner, join_outer, terasort, and 
 groupbyorderbymrrtest ran fine.
 One such example follows:
 {code}
 RUNNING: /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/tez/tez-mapreduce-examples-0.4.0.2.1.7.0-784.jar orderedwordcount 
 -DUSE_TEZ_SESSION=true -Dmapreduce.map.memory.mb=2048 
 -Dtez.am.shuffle-vertex-manager.max-src-fraction=0 
 -Dmapreduce.reduce.memory.mb=2048 -Dmapreduce.framework.name=yarn-tez 
 -Dtez.am.container.reuse.enabled=false -Dtez.am.log.level=DEBUG 
 -Dmapreduce.map.java.opts=-Xmx1024m 
 -Dtez.am.shuffle-vertex-manager.min-src-fraction=0 
 -Dmapreduce.job.reduce.slowstart.completedmaps=0.01 
 -Dmapreduce.reduce.java.opts=-Xmx1024m 
 -Dtez.am.container.session.delay-allocation-millis=12 
 /user/hrt_qa/Tez_CR_1/TestContainerReuse1 /user/hrt_qa/Tez_CROutput_1 
 /user/hrt_qa/Tez_CR_2/TestContainerReuse2 /user/hrt_qa/Tez_CROutput_2 
 -generateSplitsInClient true
 14/12/19 09:20:05 INFO impl.TimelineClientImpl: Timeline service address: 
 http://0.0.0.0:8188/ws/v1/timeline/
 14/12/19 09:20:05 INFO client.RMProxy: Connecting to ResourceManager at 
 headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
 14/12/19 09:20:05 INFO client.AHSProxy: Connecting to Application History 
 server at /0.0.0.0:10200
 14/12/19 09:20:06 INFO impl.MetricsConfig: loaded properties from 
 hadoop-metrics2.properties
 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 
 60 second(s).
 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: azure-file-system metrics 
 system started
 14/12/19 09:20:07 INFO client.TezClientUtils: Permissions on staging 
 directory 
 wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
  are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx--
 14/12/19 09:20:07 INFO examples.OrderedWordCount: Creating Tez Session
 14/12/19 09:20:07 INFO impl.TimelineClientImpl: Timeline service address: 
 http://0.0.0.0:8188/ws/v1/timeline/
 14/12/19 09:20:07 INFO client.RMProxy: Connecting to ResourceManager at 
 headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
 14/12/19 09:20:07 INFO client.AHSProxy: Connecting to Application History 
 server at /0.0.0.0:10200
 14/12/19 09:20:09 INFO impl.YarnClientImpl: Submitted application 
 application_1418977790315_0016
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Created Tez Session
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Running OrderedWordCount 
 DAG, dagIndex=1, inputPath=/user/hrt_qa/Tez_CR_1/TestContainerReuse1, 
 outputPath=/user/hrt_qa/Tez_CROutput_1
 14/12/19 09:20:09 INFO hadoop.MRHelpers: Generating new input splits, 
 splitsDir=wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
 14/12/19 09:20:09 INFO input.FileInputFormat: Total input paths to process : 
 20
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Waiting for TezSession to 
 get into ready state
 14/12/19 09:20:14 INFO client.TezSession: Failed to retrieve AM Status via 
 proxy
 org.apache.tez.dag.api.TezException: com.google.protobuf.ServiceException: 
 java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
 destination host is: workernode1:59575; java.net.UnknownHostException; For 
 more details see:  http://wiki.apache.org/hadoop/UnknownHost
   at 
 org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:351)
   at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:538)
   at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at 

[jira] [Commented] (TEZ-1924) Tez AM does not register with AM with full FQDN causing jobs to fail in some environments

2015-01-07 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268280#comment-14268280
 ] 

Hitesh Shah commented on TEZ-1924:
--

Thanks for filing the issue [~ivanmi] and also for providing a patch. 

Some general comments:
  - It is usually better if the patch file is named the same as the jira (with 
a version number for multiple iterations on the patch).
  - With respect to using the NM hostname, would it be better to extract the 
FQDN from the server object itself, if possible? (See the sketch below.)
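
For reference, a minimal Java sketch of the two approaches mentioned above 
(illustrative only; the actual Tez code may obtain the address differently):

{code}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

class AmHostname {
  // "Server object" approach: derive the FQDN from the address the RPC
  // server actually bound to (assumes the address is resolved).
  static String fromServer(InetSocketAddress boundAddress) {
    return boundAddress.getAddress().getCanonicalHostName();
  }

  // Local-host approach: ask the JVM for the canonical local hostname.
  static String fromLocalHost() throws UnknownHostException {
    return InetAddress.getLocalHost().getCanonicalHostName();
  }
}
{code}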

 Tez AM does not register with AM with full FQDN causing jobs to fail in some 
 environments
 -

 Key: TEZ-1924
 URL: https://issues.apache.org/jira/browse/TEZ-1924
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Ivan Mitic
 Attachments: TEZ-20.patch


 Issue originally reported by [~Karam Singh].
 All OrderedWordCount, WordCount and Tez faultTolerance system tests 
 failed due to java.net.UnknownHostException.
 Interestingly, other tez examples such as mrrsleep, randomwriter, 
 randomtextwriter, sort, join_inner, join_outer, terasort, and 
 groupbyorderbymrrtest ran fine.
 One such example follows:
 {code}
 RUNNING: /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/tez/tez-mapreduce-examples-0.4.0.2.1.7.0-784.jar orderedwordcount 
 -DUSE_TEZ_SESSION=true -Dmapreduce.map.memory.mb=2048 
 -Dtez.am.shuffle-vertex-manager.max-src-fraction=0 
 -Dmapreduce.reduce.memory.mb=2048 -Dmapreduce.framework.name=yarn-tez 
 -Dtez.am.container.reuse.enabled=false -Dtez.am.log.level=DEBUG 
 -Dmapreduce.map.java.opts=-Xmx1024m 
 -Dtez.am.shuffle-vertex-manager.min-src-fraction=0 
 -Dmapreduce.job.reduce.slowstart.completedmaps=0.01 
 -Dmapreduce.reduce.java.opts=-Xmx1024m 
 -Dtez.am.container.session.delay-allocation-millis=12 
 /user/hrt_qa/Tez_CR_1/TestContainerReuse1 /user/hrt_qa/Tez_CROutput_1 
 /user/hrt_qa/Tez_CR_2/TestContainerReuse2 /user/hrt_qa/Tez_CROutput_2 
 -generateSplitsInClient true
 14/12/19 09:20:05 INFO impl.TimelineClientImpl: Timeline service address: 
 http://0.0.0.0:8188/ws/v1/timeline/
 14/12/19 09:20:05 INFO client.RMProxy: Connecting to ResourceManager at 
 headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
 14/12/19 09:20:05 INFO client.AHSProxy: Connecting to Application History 
 server at /0.0.0.0:10200
 14/12/19 09:20:06 INFO impl.MetricsConfig: loaded properties from 
 hadoop-metrics2.properties
 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 
 60 second(s).
 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: azure-file-system metrics 
 system started
 14/12/19 09:20:07 INFO client.TezClientUtils: Permissions on staging 
 directory 
 wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
  are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx--
 14/12/19 09:20:07 INFO examples.OrderedWordCount: Creating Tez Session
 14/12/19 09:20:07 INFO impl.TimelineClientImpl: Timeline service address: 
 http://0.0.0.0:8188/ws/v1/timeline/
 14/12/19 09:20:07 INFO client.RMProxy: Connecting to ResourceManager at 
 headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
 14/12/19 09:20:07 INFO client.AHSProxy: Connecting to Application History 
 server at /0.0.0.0:10200
 14/12/19 09:20:09 INFO impl.YarnClientImpl: Submitted application 
 application_1418977790315_0016
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Created Tez Session
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Running OrderedWordCount 
 DAG, dagIndex=1, inputPath=/user/hrt_qa/Tez_CR_1/TestContainerReuse1, 
 outputPath=/user/hrt_qa/Tez_CROutput_1
 14/12/19 09:20:09 INFO hadoop.MRHelpers: Generating new input splits, 
 splitsDir=wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
 14/12/19 09:20:09 INFO input.FileInputFormat: Total input paths to process : 
 20
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Waiting for TezSession to 
 get into ready state
 14/12/19 09:20:14 INFO client.TezSession: Failed to retrieve AM Status via 
 proxy
 org.apache.tez.dag.api.TezException: com.google.protobuf.ServiceException: 
 java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
 destination host is: workernode1:59575; java.net.UnknownHostException; For 
 more details see:  http://wiki.apache.org/hadoop/UnknownHost
   at 
 org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:351)
   at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:538)
   at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   

[jira] [Commented] (TEZ-1882) Tez UI build does not work on Windows

2015-01-07 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268283#comment-14268283
 ] 

Bikas Saha commented on TEZ-1882:
-

Branch-0.6 - commit 30d485dd41e83cb70b27195e4fa986b8bb586933
Author: Bikas Saha bi...@apache.org
Date:   Wed Jan 7 13:25:32 2015 -0800

TEZ-1882. Tez UI build does not work on Windows (Prakash Ramachandran via 
bikas)
(cherry picked from commit b9c834a283c0711655f84f51570e70ac38753426)


 Tez UI build does not work on Windows
 -

 Key: TEZ-1882
 URL: https://issues.apache.org/jira/browse/TEZ-1882
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Prakash Ramachandran
Priority: Blocker
 Fix For: 0.6.0

 Attachments: TEZ-1882.1.patch, TEZ-1882.2.patch


 It fails during Bower install because it cannot launch node/node. After 
 working around that, the bower script itself fails because it's a bash script 
 and will not run on Windows. Specifically, the following command fails in 
 node_modules\.bin\bower
 basedir=`dirname $0`



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-305) Convert umbilical objects to PB serialization

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-305:
---
Target Version/s: 0.7.0

 Convert umbilical objects to PB serialization
 -

 Key: TEZ-305
 URL: https://issues.apache.org/jira/browse/TEZ-305
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Bikas Saha
  Labels: TEZ-0.2.0, engine





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (TEZ-516) Add a join example using the Broadcast edge

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved TEZ-516.

Resolution: Done

Join/Intersect example was added elsewhere.

 Add a join example using the Broadcast edge
 ---

 Key: TEZ-516
 URL: https://issues.apache.org/jira/browse/TEZ-516
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-485) Get rid of TezTaskStatus

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-485:
---
Priority: Minor  (was: Major)

 Get rid of TezTaskStatus
 

 Key: TEZ-485
 URL: https://issues.apache.org/jira/browse/TEZ-485
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Siddharth Seth
Priority: Minor

 TezTaskStatus is used by the MR Reporter only. We should be able to get rid 
 of this interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1924) Tez AM does not register with AM with full FQDN causing jobs to fail in some environments

2015-01-07 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-1924:
-
Target Version/s: 0.5.4

 Tez AM does not register with AM with full FQDN causing jobs to fail in some 
 environments
 -

 Key: TEZ-1924
 URL: https://issues.apache.org/jira/browse/TEZ-1924
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: TEZ-20.patch


 Issue originally reported by [~Karam Singh].
 All OrderedWordCount, WordCount and Tez faultTolerance system tests 
 failed due to java.net.UnknownHostException.
 Interestingly, other tez examples such as mrrsleep, randomwriter, 
 randomtextwriter, sort, join_inner, join_outer, terasort, and 
 groupbyorderbymrrtest ran fine.
 One such example follows:
 {code}
 RUNNING: /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/tez/tez-mapreduce-examples-0.4.0.2.1.7.0-784.jar orderedwordcount 
 -DUSE_TEZ_SESSION=true -Dmapreduce.map.memory.mb=2048 
 -Dtez.am.shuffle-vertex-manager.max-src-fraction=0 
 -Dmapreduce.reduce.memory.mb=2048 -Dmapreduce.framework.name=yarn-tez 
 -Dtez.am.container.reuse.enabled=false -Dtez.am.log.level=DEBUG 
 -Dmapreduce.map.java.opts=-Xmx1024m 
 -Dtez.am.shuffle-vertex-manager.min-src-fraction=0 
 -Dmapreduce.job.reduce.slowstart.completedmaps=0.01 
 -Dmapreduce.reduce.java.opts=-Xmx1024m 
 -Dtez.am.container.session.delay-allocation-millis=12 
 /user/hrt_qa/Tez_CR_1/TestContainerReuse1 /user/hrt_qa/Tez_CROutput_1 
 /user/hrt_qa/Tez_CR_2/TestContainerReuse2 /user/hrt_qa/Tez_CROutput_2 
 -generateSplitsInClient true
 14/12/19 09:20:05 INFO impl.TimelineClientImpl: Timeline service address: 
 http://0.0.0.0:8188/ws/v1/timeline/
 14/12/19 09:20:05 INFO client.RMProxy: Connecting to ResourceManager at 
 headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
 14/12/19 09:20:05 INFO client.AHSProxy: Connecting to Application History 
 server at /0.0.0.0:10200
 14/12/19 09:20:06 INFO impl.MetricsConfig: loaded properties from 
 hadoop-metrics2.properties
 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 
 60 second(s).
 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: azure-file-system metrics 
 system started
 14/12/19 09:20:07 INFO client.TezClientUtils: Permissions on staging 
 directory 
 wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
  are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx--
 14/12/19 09:20:07 INFO examples.OrderedWordCount: Creating Tez Session
 14/12/19 09:20:07 INFO impl.TimelineClientImpl: Timeline service address: 
 http://0.0.0.0:8188/ws/v1/timeline/
 14/12/19 09:20:07 INFO client.RMProxy: Connecting to ResourceManager at 
 headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
 14/12/19 09:20:07 INFO client.AHSProxy: Connecting to Application History 
 server at /0.0.0.0:10200
 14/12/19 09:20:09 INFO impl.YarnClientImpl: Submitted application 
 application_1418977790315_0016
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Created Tez Session
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Running OrderedWordCount 
 DAG, dagIndex=1, inputPath=/user/hrt_qa/Tez_CR_1/TestContainerReuse1, 
 outputPath=/user/hrt_qa/Tez_CROutput_1
 14/12/19 09:20:09 INFO hadoop.MRHelpers: Generating new input splits, 
 splitsDir=wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
 14/12/19 09:20:09 INFO input.FileInputFormat: Total input paths to process : 
 20
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Waiting for TezSession to 
 get into ready state
 14/12/19 09:20:14 INFO client.TezSession: Failed to retrieve AM Status via 
 proxy
 org.apache.tez.dag.api.TezException: com.google.protobuf.ServiceException: 
 java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
 destination host is: workernode1:59575; java.net.UnknownHostException; For 
 more details see:  http://wiki.apache.org/hadoop/UnknownHost
   at 
 org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:351)
   at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:538)
   at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
   at 

[jira] [Updated] (TEZ-1924) Tez AM does not register with AM with full FQDN causing jobs to fail in some environments

2015-01-07 Thread Ivan Mitic (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Mitic updated TEZ-1924:

Attachment: TEZ-1924.2.patch

Thanks Hitesh for the quick review!

Attaching the updated patch addressing your comments.

 Tez AM does not register with AM with full FQDN causing jobs to fail in some 
 environments
 -

 Key: TEZ-1924
 URL: https://issues.apache.org/jira/browse/TEZ-1924
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: TEZ-1924.2.patch, TEZ-20.patch


 Issue originally reported by [~Karam Singh].
 All OrderedWordCount, WordCount and Tez faultTolerance system tests 
 failed due to java.net.UnknownHostException.
 Interestingly, other tez examples such as mrrsleep, randomwriter, 
 randomtextwriter, sort, join_inner, join_outer, terasort, and 
 groupbyorderbymrrtest ran fine.
 One such example follows:
 {code}
 RUNNING: /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/tez/tez-mapreduce-examples-0.4.0.2.1.7.0-784.jar orderedwordcount 
 -DUSE_TEZ_SESSION=true -Dmapreduce.map.memory.mb=2048 
 -Dtez.am.shuffle-vertex-manager.max-src-fraction=0 
 -Dmapreduce.reduce.memory.mb=2048 -Dmapreduce.framework.name=yarn-tez 
 -Dtez.am.container.reuse.enabled=false -Dtez.am.log.level=DEBUG 
 -Dmapreduce.map.java.opts=-Xmx1024m 
 -Dtez.am.shuffle-vertex-manager.min-src-fraction=0 
 -Dmapreduce.job.reduce.slowstart.completedmaps=0.01 
 -Dmapreduce.reduce.java.opts=-Xmx1024m 
 -Dtez.am.container.session.delay-allocation-millis=12 
 /user/hrt_qa/Tez_CR_1/TestContainerReuse1 /user/hrt_qa/Tez_CROutput_1 
 /user/hrt_qa/Tez_CR_2/TestContainerReuse2 /user/hrt_qa/Tez_CROutput_2 
 -generateSplitsInClient true
 14/12/19 09:20:05 INFO impl.TimelineClientImpl: Timeline service address: 
 http://0.0.0.0:8188/ws/v1/timeline/
 14/12/19 09:20:05 INFO client.RMProxy: Connecting to ResourceManager at 
 headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
 14/12/19 09:20:05 INFO client.AHSProxy: Connecting to Application History 
 server at /0.0.0.0:10200
 14/12/19 09:20:06 INFO impl.MetricsConfig: loaded properties from 
 hadoop-metrics2.properties
 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 
 60 second(s).
 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: azure-file-system metrics 
 system started
 14/12/19 09:20:07 INFO client.TezClientUtils: Permissions on staging 
 directory 
 wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
  are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx--
 14/12/19 09:20:07 INFO examples.OrderedWordCount: Creating Tez Session
 14/12/19 09:20:07 INFO impl.TimelineClientImpl: Timeline service address: 
 http://0.0.0.0:8188/ws/v1/timeline/
 14/12/19 09:20:07 INFO client.RMProxy: Connecting to ResourceManager at 
 headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
 14/12/19 09:20:07 INFO client.AHSProxy: Connecting to Application History 
 server at /0.0.0.0:10200
 14/12/19 09:20:09 INFO impl.YarnClientImpl: Submitted application 
 application_1418977790315_0016
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Created Tez Session
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Running OrderedWordCount 
 DAG, dagIndex=1, inputPath=/user/hrt_qa/Tez_CR_1/TestContainerReuse1, 
 outputPath=/user/hrt_qa/Tez_CROutput_1
 14/12/19 09:20:09 INFO hadoop.MRHelpers: Generating new input splits, 
 splitsDir=wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
 14/12/19 09:20:09 INFO input.FileInputFormat: Total input paths to process : 
 20
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Waiting for TezSession to 
 get into ready state
 14/12/19 09:20:14 INFO client.TezSession: Failed to retrieve AM Status via 
 proxy
 org.apache.tez.dag.api.TezException: com.google.protobuf.ServiceException: 
 java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
 destination host is: workernode1:59575; java.net.UnknownHostException; For 
 more details see:  http://wiki.apache.org/hadoop/UnknownHost
   at 
 org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:351)
   at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:538)
   at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 

[jira] [Commented] (TEZ-1924) Tez AM does not register with AM with full FQDN causing jobs to fail in some environments

2015-01-07 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268328#comment-14268328
 ] 

Hitesh Shah commented on TEZ-1924:
--

+1. Looks good. Committing shortly. 

 Tez AM does not register with AM with full FQDN causing jobs to fail in some 
 environments
 -

 Key: TEZ-1924
 URL: https://issues.apache.org/jira/browse/TEZ-1924
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Attachments: TEZ-1924.2.patch, TEZ-20.patch


 Issue originally reported by [~Karam Singh].
 All OrderedWordCount, WordCount and Tez faultTolerance system tests 
 failed due to java.net.UnknownHostException.
 Interestingly, other tez examples such as mrrsleep, randomwriter, 
 randomtextwriter, sort, join_inner, join_outer, terasort, and 
 groupbyorderbymrrtest ran fine.
 One such example follows:
 {code}
 RUNNING: /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/tez/tez-mapreduce-examples-0.4.0.2.1.7.0-784.jar orderedwordcount 
 -DUSE_TEZ_SESSION=true -Dmapreduce.map.memory.mb=2048 
 -Dtez.am.shuffle-vertex-manager.max-src-fraction=0 
 -Dmapreduce.reduce.memory.mb=2048 -Dmapreduce.framework.name=yarn-tez 
 -Dtez.am.container.reuse.enabled=false -Dtez.am.log.level=DEBUG 
 -Dmapreduce.map.java.opts=-Xmx1024m 
 -Dtez.am.shuffle-vertex-manager.min-src-fraction=0 
 -Dmapreduce.job.reduce.slowstart.completedmaps=0.01 
 -Dmapreduce.reduce.java.opts=-Xmx1024m 
 -Dtez.am.container.session.delay-allocation-millis=12 
 /user/hrt_qa/Tez_CR_1/TestContainerReuse1 /user/hrt_qa/Tez_CROutput_1 
 /user/hrt_qa/Tez_CR_2/TestContainerReuse2 /user/hrt_qa/Tez_CROutput_2 
 -generateSplitsInClient true
 14/12/19 09:20:05 INFO impl.TimelineClientImpl: Timeline service address: 
 http://0.0.0.0:8188/ws/v1/timeline/
 14/12/19 09:20:05 INFO client.RMProxy: Connecting to ResourceManager at 
 headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
 14/12/19 09:20:05 INFO client.AHSProxy: Connecting to Application History 
 server at /0.0.0.0:10200
 14/12/19 09:20:06 INFO impl.MetricsConfig: loaded properties from 
 hadoop-metrics2.properties
 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 
 60 second(s).
 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: azure-file-system metrics 
 system started
 14/12/19 09:20:07 INFO client.TezClientUtils: Permissions on staging 
 directory 
 wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
  are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx--
 14/12/19 09:20:07 INFO examples.OrderedWordCount: Creating Tez Session
 14/12/19 09:20:07 INFO impl.TimelineClientImpl: Timeline service address: 
 http://0.0.0.0:8188/ws/v1/timeline/
 14/12/19 09:20:07 INFO client.RMProxy: Connecting to ResourceManager at 
 headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
 14/12/19 09:20:07 INFO client.AHSProxy: Connecting to Application History 
 server at /0.0.0.0:10200
 14/12/19 09:20:09 INFO impl.YarnClientImpl: Submitted application 
 application_1418977790315_0016
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Created Tez Session
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Running OrderedWordCount 
 DAG, dagIndex=1, inputPath=/user/hrt_qa/Tez_CR_1/TestContainerReuse1, 
 outputPath=/user/hrt_qa/Tez_CROutput_1
 14/12/19 09:20:09 INFO hadoop.MRHelpers: Generating new input splits, 
 splitsDir=wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
 14/12/19 09:20:09 INFO input.FileInputFormat: Total input paths to process : 
 20
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Waiting for TezSession to 
 get into ready state
 14/12/19 09:20:14 INFO client.TezSession: Failed to retrieve AM Status via 
 proxy
 org.apache.tez.dag.api.TezException: com.google.protobuf.ServiceException: 
 java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
 destination host is: workernode1:59575; java.net.UnknownHostException; For 
 more details see:  http://wiki.apache.org/hadoop/UnknownHost
   at 
 org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:351)
   at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:538)
   at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 

[jira] [Commented] (TEZ-519) Misleading stack trace when using sessions

2015-01-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268365#comment-14268365
 ] 

Siddharth Seth commented on TEZ-519:


Is this still a problem?

 Misleading stack trace when using sessions
 --

 Key: TEZ-519
 URL: https://issues.apache.org/jira/browse/TEZ-519
 Project: Apache Tez
  Issue Type: Bug
Reporter: Bikas Saha

 13/09/27 12:43:00 INFO client.RMProxy: Connecting to ResourceManager at 
 /0.0.0.0:54311  
 13/09/27 12:43:01 INFO examples.OrderedWordCount: Creating Tez Session
   
 13/09/27 12:43:01 INFO client.RMProxy: Connecting to ResourceManager at 
 /0.0.0.0:54311  
 13/09/27 12:43:03 INFO impl.YarnClientImpl: Submitted application 
 application_1380218649569_0047 to ResourceManager at /0.0.0.0:54311   
   
   
 13/09/27 12:43:03 INFO examples.OrderedWordCount: Created Tez Session 
   
 13/09/27 12:43:03 INFO client.TezSession: Shutting down Tez Session, 
 sessionName=OrderedWordCountSession, 
 applicationId=application_1380218649569_0047  
   
   
 13/09/27 12:43:03 INFO client.TezSession: Could not connect to AM, killing 
 session via YARN, sessionName=OrderedWordCountSession, 
 applicationId=application_1380218649569_0047  
 
 13/09/27 12:43:03 INFO impl.YarnClientImpl: Killing application 
 application_1380218649569_0047  
 org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /out 
 already exists   
 at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:357)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 

 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 
 at java.lang.reflect.Method.invoke(Method.java:597)   
   
 at 
 org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
 
 at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)   
   
 at 
 org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:79)   

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 

 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 
 at java.lang.reflect.Method.invoke(Method.java:597)   
   
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1924) Tez AM does not register with AM with full FQDN causing jobs to fail in some environments

2015-01-07 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268394#comment-14268394
 ] 

Ivan Mitic commented on TEZ-1924:
-

Thanks for the quick turnaround [~Hitesh]!

 Tez AM does not register with AM with full FQDN causing jobs to fail in some 
 environments
 -

 Key: TEZ-1924
 URL: https://issues.apache.org/jira/browse/TEZ-1924
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Ivan Mitic
Assignee: Ivan Mitic
 Fix For: 0.5.4

 Attachments: TEZ-1924.2.patch, TEZ-20.patch


 Issue originally reported by [~Karam Singh].
 All OrderedWordCount, WordCount and Tez faultTolerance system tests 
 failed due to java.net.UnknownHostException.
 Interestingly, other tez examples such as mrrsleep, randomwriter, 
 randomtextwriter, sort, join_inner, join_outer, terasort, and 
 groupbyorderbymrrtest ran fine.
 One such example follows:
 {code}
 RUNNING: /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/tez/tez-mapreduce-examples-0.4.0.2.1.7.0-784.jar orderedwordcount 
 -DUSE_TEZ_SESSION=true -Dmapreduce.map.memory.mb=2048 
 -Dtez.am.shuffle-vertex-manager.max-src-fraction=0 
 -Dmapreduce.reduce.memory.mb=2048 -Dmapreduce.framework.name=yarn-tez 
 -Dtez.am.container.reuse.enabled=false -Dtez.am.log.level=DEBUG 
 -Dmapreduce.map.java.opts=-Xmx1024m 
 -Dtez.am.shuffle-vertex-manager.min-src-fraction=0 
 -Dmapreduce.job.reduce.slowstart.completedmaps=0.01 
 -Dmapreduce.reduce.java.opts=-Xmx1024m 
 -Dtez.am.container.session.delay-allocation-millis=12 
 /user/hrt_qa/Tez_CR_1/TestContainerReuse1 /user/hrt_qa/Tez_CROutput_1 
 /user/hrt_qa/Tez_CR_2/TestContainerReuse2 /user/hrt_qa/Tez_CROutput_2 
 -generateSplitsInClient true
 14/12/19 09:20:05 INFO impl.TimelineClientImpl: Timeline service address: 
 http://0.0.0.0:8188/ws/v1/timeline/
 14/12/19 09:20:05 INFO client.RMProxy: Connecting to ResourceManager at 
 headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
 14/12/19 09:20:05 INFO client.AHSProxy: Connecting to Application History 
 server at /0.0.0.0:10200
 14/12/19 09:20:06 INFO impl.MetricsConfig: loaded properties from 
 hadoop-metrics2.properties
 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 
 60 second(s).
 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: azure-file-system metrics 
 system started
 14/12/19 09:20:07 INFO client.TezClientUtils: Permissions on staging 
 directory 
 wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
  are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx--
 14/12/19 09:20:07 INFO examples.OrderedWordCount: Creating Tez Session
 14/12/19 09:20:07 INFO impl.TimelineClientImpl: Timeline service address: 
 http://0.0.0.0:8188/ws/v1/timeline/
 14/12/19 09:20:07 INFO client.RMProxy: Connecting to ResourceManager at 
 headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
 14/12/19 09:20:07 INFO client.AHSProxy: Connecting to Application History 
 server at /0.0.0.0:10200
 14/12/19 09:20:09 INFO impl.YarnClientImpl: Submitted application 
 application_1418977790315_0016
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Created Tez Session
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Running OrderedWordCount 
 DAG, dagIndex=1, inputPath=/user/hrt_qa/Tez_CR_1/TestContainerReuse1, 
 outputPath=/user/hrt_qa/Tez_CROutput_1
 14/12/19 09:20:09 INFO hadoop.MRHelpers: Generating new input splits, 
 splitsDir=wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
 14/12/19 09:20:09 INFO input.FileInputFormat: Total input paths to process : 
 20
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Waiting for TezSession to 
 get into ready state
 14/12/19 09:20:14 INFO client.TezSession: Failed to retrieve AM Status via 
 proxy
 org.apache.tez.dag.api.TezException: com.google.protobuf.ServiceException: 
 java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
 destination host is: workernode1:59575; java.net.UnknownHostException; For 
 more details see:  http://wiki.apache.org/hadoop/UnknownHost
   at 
 org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:351)
   at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:538)
   at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
 

[jira] [Updated] (TEZ-1903) Fix findbugs warnings in tez-runtime-internals

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1903:

Attachment: TEZ-1903.1.txt

[~hitesh] - please review.

 Fix findbugs warnings in tez-runtime-internals
 --

 Key: TEZ-1903
 URL: https://issues.apache.org/jira/browse/TEZ-1903
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Siddharth Seth
 Attachments: TEZ-1903.1.txt


 https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-1903) Fix findbugs warnings in tez-runtime-internals

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated TEZ-1903:

Target Version/s: 0.7.0

 Fix findbugs warnings in tez-runtime-internals
 --

 Key: TEZ-1903
 URL: https://issues.apache.org/jira/browse/TEZ-1903
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Siddharth Seth
 Attachments: TEZ-1903.1.txt


 https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1923) FetcherOrderedGrouped gets into infinite loop due to memory pressure

2015-01-07 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267859#comment-14267859
 ] 

Hitesh Shah commented on TEZ-1923:
--

[~rajesh.balamohan] This seems like a critical issue. Any reason why it is not 
targeted to 0.5.4? 

 FetcherOrderedGrouped gets into infinite loop due to memory pressure
 

 Key: TEZ-1923
 URL: https://issues.apache.org/jira/browse/TEZ-1923
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-1923.1.patch


 - Ran a comparatively large job (temp table creation) at 10 TB scale.
 - Turned on intermediate mem-to-mem 
 (tez.runtime.shuffle.memory-to-memory.enable=true and 
 tez.runtime.shuffle.memory-to-memory.segments=4)
 - Some reducers get lots of data and quickly get into an infinite loop
 {code}
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 3ms
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 1ms
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 2ms
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 5ms
 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 {code}
 Additional debug/patch statements revealed that InMemoryMerge is not invoked 
 appropriately and therefore does not release memory back for the fetchers to 
 proceed, e.g. the debug/patch messages given below:
 {code}
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:48,332 INFO 
 [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
 Patch..usedMemory=1551867234, memoryLimit=1073741824, commitMemory=883028388, 
 mergeThreshold=708669632  <=== InMemoryMerge would be started in this case 
 as commitMemory >= mergeThreshold
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:52,900 INFO 
 [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
 Patch..usedMemory=1273349784, memoryLimit=1073741824, commitMemory=347296632, 
 mergeThreshold=708669632 <=== InMemoryMerge would *NOT* be started in this 
 case as commitMemory < mergeThreshold.  But the usedMemory is higher than 
 memoryLimit.  Fetchers would keep waiting indefinitely until memory is 
 released. InMemoryMerge will not kick in and not release memory.
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:53,163 INFO 
 [fetcher [Map_1] #1] orderedgrouped.MergeManager: 
 Patch..usedMemory=1191994052, memoryLimit=1073741824, commitMemory=523155206, 
 mergeThreshold=708669632 <=== InMemoryMerge would *NOT* be started in this 
 case as commitMemory < mergeThreshold.  But the usedMemory is higher than 
 memoryLimit.  Fetchers would keep waiting indefinitely until memory is 
 released.  InMemoryMerge will not kick in and not release memory.
 {code}
 In MergeManager, in-memory merging is invoked under the following condition:
 {code}
 if (!inMemoryMerger.isInProgress() && commitMemory >= mergeThreshold)
 {code}
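 
 To make the stall concrete, here is a small self-contained Java sketch 
 (illustrative only, not Tez code; the constants and the two cases are copied 
 from the debug logs above) showing that the existing trigger never fires in 
 the second logged case even though usedMemory already exceeds memoryLimit:
 {code}
 public class MergeTriggerDemo {
   // Values copied from the debug logs above.
   static final long MEMORY_LIMIT = 1073741824L;
   static final long MERGE_THRESHOLD = 708669632L;
 
   // The existing trigger: merge starts only once enough memory is committed.
   static boolean mergeStarts(long commitMemory) {
     return commitMemory >= MERGE_THRESHOLD;
   }
 
   // A fetcher stalls when memory is exhausted but the merge never triggers.
   static boolean stalled(long usedMemory, long commitMemory) {
     return usedMemory > MEMORY_LIMIT && !mergeStarts(commitMemory);
   }
 
   public static void main(String[] args) {
     // Case from 02:05:48 -- merge starts and memory is released.
     System.out.println(stalled(1551867234L, 883028388L)); // false
     // Case from 02:05:52 -- usedMemory exceeds the limit but commitMemory
     // stays below the threshold, so fetchers loop on Status.WAIT forever.
     System.out.println(stalled(1273349784L, 347296632L)); // true
   }
 }
 {code}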
 

[jira] [Commented] (TEZ-1922) Fix comments: add UNSORTED_OUTPUT to TEZ_TASK_SCALE_MEMORY_WEIGHTED_RATIOS

2015-01-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268058#comment-14268058
 ] 

Siddharth Seth commented on TEZ-1922:
-

+1.

 Fix comments: add UNSORTED_OUTPUT to TEZ_TASK_SCALE_MEMORY_WEIGHTED_RATIOS
 --

 Key: TEZ-1922
 URL: https://issues.apache.org/jira/browse/TEZ-1922
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
Priority: Minor
 Attachments: TEZ-1922.1.patch


 The example provided for TEZ_TASK_SCALE_MEMORY_WEIGHTED_RATIOS in 
 TezConfiguration is missing UNSORTED_OUTPUT:
 
 PARTITIONED_UNSORTED_OUTPUT:0,UNSORTED_INPUT:1,SORTED_OUTPUT:2,SORTED_MERGED_INPUT:3,PROCESSOR:1,OTHER:1
 
 If a user sets the value by referring to this example, it ends up throwing 
 exceptions in 
 org.apache.tez.runtime.library.resources.WeightedScalingMemoryDistributor.populateTypeScaleMap().
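 
 A minimal, self-contained sketch of why the incomplete example fails. The 
 parsing scheme and completeness check below are assumptions for illustration 
 (they are not the actual WeightedScalingMemoryDistributor code), and the 
 Type enum is a hypothetical stand-in for the Tez request types:
 {code}
 import java.util.EnumMap;
 import java.util.Map;
 
 public class RatioParseDemo {
   // Hypothetical stand-in for the request types named in the example.
   enum Type { PARTITIONED_UNSORTED_OUTPUT, UNSORTED_INPUT, UNSORTED_OUTPUT,
               SORTED_OUTPUT, SORTED_MERGED_INPUT, PROCESSOR, OTHER }
 
   // Assumed scheme: comma-separated "NAME:weight" tokens, and every Type
   // must be present -- mirroring the failure mode described above.
   static Map<Type, Integer> parse(String ratios) {
     Map<Type, Integer> map = new EnumMap<>(Type.class);
     for (String token : ratios.split(",")) {
       String[] kv = token.split(":");
       map.put(Type.valueOf(kv[0]), Integer.parseInt(kv[1]));
     }
     for (Type t : Type.values()) {
       if (!map.containsKey(t)) {
         throw new IllegalArgumentException("Missing ratio for " + t);
       }
     }
     return map;
   }
 
   public static void main(String[] args) {
     // The documented example omits UNSORTED_OUTPUT -> throws here.
     parse("PARTITIONED_UNSORTED_OUTPUT:0,UNSORTED_INPUT:1,SORTED_OUTPUT:2,"
         + "SORTED_MERGED_INPUT:3,PROCESSOR:1,OTHER:1");
   }
 }
 {code}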



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (TEZ-1903) Fix findbugs warnings in tez-runtime-internals

2015-01-07 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reassigned TEZ-1903:
---

Assignee: Siddharth Seth

 Fix findbugs warnings in tez-runtime-internals
 --

 Key: TEZ-1903
 URL: https://issues.apache.org/jira/browse/TEZ-1903
 Project: Apache Tez
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Siddharth Seth

 https://builds.apache.org/job/PreCommit-Tez-Build/8/artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-1912) Merge exceptions are thrown when enabling tez.runtime.shuffle.memory-to-memory.enable & tez.runtime.shuffle.memory-to-memory.segments

2015-01-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268082#comment-14268082
 ] 

Siddharth Seth commented on TEZ-1912:
-

+1. Looks good.

 Merge exceptions are thrown when enabling 
 tez.runtime.shuffle.memory-to-memory.enable & 
 tez.runtime.shuffle.memory-to-memory.segments
 --

 Key: TEZ-1912
 URL: https://issues.apache.org/jira/browse/TEZ-1912
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
 Attachments: TEZ-1912.1.patch


 Merge exceptions are thrown when running a hive query on tez with the 
 following settings.  It works fine without the mem-to-mem merge settings.
 {code}
 2015-01-04 20:04:01,371 ERROR [ShuffleAndMergeRunner [Map_1]] 
 orderedgrouped.Shuffle: ShuffleRunner failed with error
 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError:
  Error while doing final merge
 at 
 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:364)
 at 
 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:327)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: Rec# 22630125: Negative value-length: -1
 at 
 org.apache.tez.runtime.library.common.sort.impl.IFile$Reader.positionToNextRecord(IFile.java:720)
 at 
 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.InMemoryReader.readRawKey(InMemoryReader.java:104)
 at 
 org.apache.tez.runtime.library.common.sort.impl.TezMerger$Segment.readRawKey(TezMerger.java:329)
 at 
 org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.adjustPriorityQueue(TezMerger.java:500)
 at 
 org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.next(TezMerger.java:545)
 at 
 org.apache.tez.runtime.library.common.sort.impl.TezMerger.writeFile(TezMerger.java:204)
 at 
 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:862)
 at 
 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:473)
 at 
 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.call(Shuffle.java:362)
 ... 5 more
 {code}
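 
 For reference, a simplified stand-in (an assumption for illustration, not 
 the actual IFile reader code) for the kind of validation that raises the 
 Negative value-length error above: a negative length that is not part of the 
 end-of-stream sentinel pair indicates a corrupt merged segment, so the 
 reader bails out.
 {code}
 import java.io.IOException;
 
 public class RecordLengthCheckDemo {
   static final int EOF_MARKER = -1; // assumed end-of-stream sentinel
 
   // Simplified sketch of the reader-side length validation.
   static void positionToNextRecord(int recNo, int keyLength, int valueLength)
       throws IOException {
     if (keyLength == EOF_MARKER && valueLength == EOF_MARKER) {
       return; // clean end of segment
     }
     if (keyLength < 0 || valueLength < 0) {
       throw new IOException("Rec# " + recNo
           + ": Negative value-length: " + valueLength);
     }
   }
 
   public static void main(String[] args) throws IOException {
     // Reproduces the exception text from the stack trace above.
     positionToNextRecord(22630125, 8, -1);
   }
 }
 {code}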
 {code}
 $HIVE_HOME/bin/hive -hiveconf tez.runtime.io.sort.factor=200 --hiveconf 
 tez.shuffle-vertex-manager.min-src-fraction=1.0 --hiveconf 
 tez.shuffle-vertex-manager.max-src-fraction=1.0 --hiveconf 
 hive.tez.auto.reducer.parallelism=false --hiveconf 
 tez.am.heartbeat.interval-ms.max=20 --hiveconf tez.runtime.io.sort.mb=1200 
 --hiveconf tez.runtime.sort.threads=2 --hiveconf 
 tez.history.logging.service.class=org.apache.tez.dag.history.logging.impl.SimpleHistoryLoggingService
  --hiveconf hive.tez.container.size=4096 --hiveconf 
 tez.runtime.shuffle.memory-to-memory.enable=true --hiveconf 
 tez.runtime.shuffle.memory-to-memory.segments=4
 --10 TB dataset
 use tpcds4_bin_partitioned_orc_1;
 drop table testData;
 create table testData as select 
 ss_sold_date_sk,ss_sold_time_sk,ss_item_sk,ss_customer_sk,ss_quantity,ss_sold_date
  from store_sales distribute by ss_sold_date;
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-1924) Tez AM does not register with RM with full FQDN causing jobs to fail in some environments

2015-01-07 Thread Ivan Mitic (JIRA)
Ivan Mitic created TEZ-1924:
---

 Summary: Tez AM does not register with RM with full FQDN causing 
jobs to fail in some environments
 Key: TEZ-1924
 URL: https://issues.apache.org/jira/browse/TEZ-1924
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Ivan Mitic


Issue originally reported by [~Karam Singh].

All OrderedWordCount, WordCount and Tez faultTolerance system tests failed 
due to java.net.UnknownHostException.
Interestingly, other tez examples such as mrrsleep, randomwriter, 
randomtextwriter, sort, join_inner, join_outer, terasort and 
groupbyorderbymrrtest ran fine.
One such example follows:
{code}
RUNNING: /usr/lib/hadoop/bin/hadoop jar 
/usr/lib/tez/tez-mapreduce-examples-0.4.0.2.1.7.0-784.jar orderedwordcount 
-DUSE_TEZ_SESSION=true -Dmapreduce.map.memory.mb=2048 
-Dtez.am.shuffle-vertex-manager.max-src-fraction=0 
-Dmapreduce.reduce.memory.mb=2048 -Dmapreduce.framework.name=yarn-tez 
-Dtez.am.container.reuse.enabled=false -Dtez.am.log.level=DEBUG 
-Dmapreduce.map.java.opts=-Xmx1024m 
-Dtez.am.shuffle-vertex-manager.min-src-fraction=0 
-Dmapreduce.job.reduce.slowstart.completedmaps=0.01 
-Dmapreduce.reduce.java.opts=-Xmx1024m 
-Dtez.am.container.session.delay-allocation-millis=12 
/user/hrt_qa/Tez_CR_1/TestContainerReuse1 /user/hrt_qa/Tez_CROutput_1 
/user/hrt_qa/Tez_CR_2/TestContainerReuse2 /user/hrt_qa/Tez_CROutput_2 
-generateSplitsInClient true
14/12/19 09:20:05 INFO impl.TimelineClientImpl: Timeline service address: 
http://0.0.0.0:8188/ws/v1/timeline/
14/12/19 09:20:05 INFO client.RMProxy: Connecting to ResourceManager at 
headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
14/12/19 09:20:05 INFO client.AHSProxy: Connecting to Application History 
server at /0.0.0.0:10200
14/12/19 09:20:06 INFO impl.MetricsConfig: loaded properties from 
hadoop-metrics2.properties
14/12/19 09:20:06 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 60 
second(s).
14/12/19 09:20:06 INFO impl.MetricsSystemImpl: azure-file-system metrics system 
started
14/12/19 09:20:07 INFO client.TezClientUtils: Permissions on staging directory 
wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
 are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx------
14/12/19 09:20:07 INFO examples.OrderedWordCount: Creating Tez Session
14/12/19 09:20:07 INFO impl.TimelineClientImpl: Timeline service address: 
http://0.0.0.0:8188/ws/v1/timeline/
14/12/19 09:20:07 INFO client.RMProxy: Connecting to ResourceManager at 
headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
14/12/19 09:20:07 INFO client.AHSProxy: Connecting to Application History 
server at /0.0.0.0:10200
14/12/19 09:20:09 INFO impl.YarnClientImpl: Submitted application 
application_1418977790315_0016
14/12/19 09:20:09 INFO examples.OrderedWordCount: Created Tez Session
14/12/19 09:20:09 INFO examples.OrderedWordCount: Running OrderedWordCount DAG, 
dagIndex=1, inputPath=/user/hrt_qa/Tez_CR_1/TestContainerReuse1, 
outputPath=/user/hrt_qa/Tez_CROutput_1
14/12/19 09:20:09 INFO hadoop.MRHelpers: Generating new input splits, 
splitsDir=wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
14/12/19 09:20:09 INFO input.FileInputFormat: Total input paths to process : 20
14/12/19 09:20:09 INFO examples.OrderedWordCount: Waiting for TezSession to get 
into ready state
14/12/19 09:20:14 INFO client.TezSession: Failed to retrieve AM Status via proxy
org.apache.tez.dag.api.TezException: com.google.protobuf.ServiceException: 
java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
destination host is: workernode1:59575; java.net.UnknownHostException; For 
more details see:  http://wiki.apache.org/hadoop/UnknownHost
at 
org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:351)
at 
org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:538)
at 
org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
at 
org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 

[jira] [Commented] (TEZ-1923) FetcherOrderedGrouped gets into infinite loop due to memory pressure

2015-01-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268089#comment-14268089
 ] 

Siddharth Seth commented on TEZ-1923:
-

This seems to affect the MemoryToMemory merger only. That should not be enabled 
in 0.5 since it hasn't been tested much.

 FetcherOrderedGrouped gets into infinite loop due to memory pressure
 

 Key: TEZ-1923
 URL: https://issues.apache.org/jira/browse/TEZ-1923
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
 Attachments: TEZ-1923.1.patch


 - Ran a comparatively large job (temp table creation) at 10 TB scale.
 - Turned on intermediate mem-to-mem 
 (tez.runtime.shuffle.memory-to-memory.enable=true and 
 tez.runtime.shuffle.memory-to-memory.segments=4)
 - Some reducers get lots of data and quickly get into an infinite loop
 {code}
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 3ms
 2015-01-07 02:36:56,644 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 1ms
 2015-01-07 02:36:56,645 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,647 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 2ms
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 2015-01-07 02:36:56,653 INFO [fetcher [Map_1] #2] 
 orderedgrouped.ShuffleScheduler: m1:13562 freed by fetcher [Map_1] #2 in 5ms
 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] shuffle.HttpConnection: for 
 url=http://m1:13562/mapOutput?job=job_142126204_0201&reduce=34&map=attempt_142126204_0201_1_00_000420_0_10027&keepAlive=true
  sent hash and receievd reply 0 ms
 2015-01-07 02:36:56,654 INFO [fetcher [Map_1] #2] 
 orderedgrouped.FetcherOrderedGrouped: fetcher#2 - MergerManager returned 
 Status.WAIT ...
 {code}
 Additional debug/patch statements revealed that InMemoryMerge is not invoked 
 appropriately and therefore does not release memory back for the fetchers to 
 proceed, e.g. the debug/patch messages given below:
 {code}
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:48,332 INFO 
 [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
 Patch..usedMemory=1551867234, memoryLimit=1073741824, commitMemory=883028388, 
 mergeThreshold=708669632  <=== InMemoryMerge would be started in this case 
 as commitMemory >= mergeThreshold
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:52,900 INFO 
 [fetcher [Map_1] #2] orderedgrouped.MergeManager: 
 Patch..usedMemory=1273349784, memoryLimit=1073741824, commitMemory=347296632, 
 mergeThreshold=708669632 <=== InMemoryMerge would *NOT* be started in this 
 case as commitMemory < mergeThreshold.  But the usedMemory is higher than 
 memoryLimit.  Fetchers would keep waiting indefinitely until memory is 
 released. InMemoryMerge will not kick in and not release memory.
 syslog_attempt_142126204_0201_1_01_34_0:2015-01-07 02:05:53,163 INFO 
 [fetcher [Map_1] #1] orderedgrouped.MergeManager: 
 Patch..usedMemory=1191994052, memoryLimit=1073741824, commitMemory=523155206, 
 mergeThreshold=708669632 <=== InMemoryMerge would *NOT* be started in this 
 case as commitMemory < mergeThreshold.  But the usedMemory is higher than 
 memoryLimit.  Fetchers would keep waiting indefinitely until memory is 
 released.  InMemoryMerge will not kick in and not release memory.
 {code}
 In MergeManager, in-memory merging is invoked under the following condition:
 {code}
 if (!inMemoryMerger.isInProgress() && commitMemory >= mergeThreshold)
 {code}

[jira] [Commented] (TEZ-1924) Tez AM does not register with RM with full FQDN causing jobs to fail in some environments

2015-01-07 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268091#comment-14268091
 ] 

Ivan Mitic commented on TEZ-1924:
-

I think I have the root cause at this point. The Tez client is trying to talk 
to its AM, and given that the AM is registered with a short host name 
(workernode0), the client is failing to reach it. If the Tez AM registered 
with the RM using an FQDN, we would not have this problem.

 Tez AM does not register with RM with full FQDN causing jobs to fail in some 
 environments
 -

 Key: TEZ-1924
 URL: https://issues.apache.org/jira/browse/TEZ-1924
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Ivan Mitic

 Issue originally reported by [~Karam Singh].
 All OrderedWordCount, WordCount and Tez faultTolerance system tests failed 
 due to java.net.UnknownHostException.
 Interestingly, other tez examples such as mrrsleep, randomwriter, 
 randomtextwriter, sort, join_inner, join_outer, terasort and 
 groupbyorderbymrrtest ran fine.
 One such example follows:
 {code}
 RUNNING: /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/tez/tez-mapreduce-examples-0.4.0.2.1.7.0-784.jar orderedwordcount 
 -DUSE_TEZ_SESSION=true -Dmapreduce.map.memory.mb=2048 
 -Dtez.am.shuffle-vertex-manager.max-src-fraction=0 
 -Dmapreduce.reduce.memory.mb=2048 -Dmapreduce.framework.name=yarn-tez 
 -Dtez.am.container.reuse.enabled=false -Dtez.am.log.level=DEBUG 
 -Dmapreduce.map.java.opts=-Xmx1024m 
 -Dtez.am.shuffle-vertex-manager.min-src-fraction=0 
 -Dmapreduce.job.reduce.slowstart.completedmaps=0.01 
 -Dmapreduce.reduce.java.opts=-Xmx1024m 
 -Dtez.am.container.session.delay-allocation-millis=12 
 /user/hrt_qa/Tez_CR_1/TestContainerReuse1 /user/hrt_qa/Tez_CROutput_1 
 /user/hrt_qa/Tez_CR_2/TestContainerReuse2 /user/hrt_qa/Tez_CROutput_2 
 -generateSplitsInClient true
 14/12/19 09:20:05 INFO impl.TimelineClientImpl: Timeline service address: 
 http://0.0.0.0:8188/ws/v1/timeline/
 14/12/19 09:20:05 INFO client.RMProxy: Connecting to ResourceManager at 
 headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
 14/12/19 09:20:05 INFO client.AHSProxy: Connecting to Application History 
 server at /0.0.0.0:10200
 14/12/19 09:20:06 INFO impl.MetricsConfig: loaded properties from 
 hadoop-metrics2.properties
 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 
 60 second(s).
 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: azure-file-system metrics 
 system started
 14/12/19 09:20:07 INFO client.TezClientUtils: Permissions on staging 
 directory 
 wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
  are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx------
 14/12/19 09:20:07 INFO examples.OrderedWordCount: Creating Tez Session
 14/12/19 09:20:07 INFO impl.TimelineClientImpl: Timeline service address: 
 http://0.0.0.0:8188/ws/v1/timeline/
 14/12/19 09:20:07 INFO client.RMProxy: Connecting to ResourceManager at 
 headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
 14/12/19 09:20:07 INFO client.AHSProxy: Connecting to Application History 
 server at /0.0.0.0:10200
 14/12/19 09:20:09 INFO impl.YarnClientImpl: Submitted application 
 application_1418977790315_0016
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Created Tez Session
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Running OrderedWordCount 
 DAG, dagIndex=1, inputPath=/user/hrt_qa/Tez_CR_1/TestContainerReuse1, 
 outputPath=/user/hrt_qa/Tez_CROutput_1
 14/12/19 09:20:09 INFO hadoop.MRHelpers: Generating new input splits, 
 splitsDir=wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
 14/12/19 09:20:09 INFO input.FileInputFormat: Total input paths to process : 
 20
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Waiting for TezSession to 
 get into ready state
 14/12/19 09:20:14 INFO client.TezSession: Failed to retrieve AM Status via 
 proxy
 org.apache.tez.dag.api.TezException: com.google.protobuf.ServiceException: 
 java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
 destination host is: workernode1:59575; java.net.UnknownHostException; For 
 more details see:  http://wiki.apache.org/hadoop/UnknownHost
   at 
 org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:351)
   at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:538)
   at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 

[jira] [Updated] (TEZ-1924) Tez AM does not register with RM with full FQDN causing jobs to fail in some environments

2015-01-07 Thread Ivan Mitic (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Mitic updated TEZ-1924:

Attachment: TEZ-20.patch

Attaching the patch.

The patch is modeled on what the MRv2 AM does: the Tez AM should use the 
NodeManager-supplied hostname when it registers with the RM.
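
A rough sketch of that approach (my reading of the description above, not the 
attached patch): YARN's NodeManager exports its own hostname into every 
container's environment as NM_HOST, so the AM can register with the RM under 
that name instead of whatever the local resolver returns.
{code}
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment;

public class AmHostnameSketch {
  // Requires hadoop-yarn-api on the classpath. NM_HOST is set by the
  // NodeManager when it launches the AM container.
  static String appMasterHostname() {
    return System.getenv(Environment.NM_HOST.name());
  }
}
{code}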

 Tez AM does not register with RM with full FQDN causing jobs to fail in some 
 environments
 -

 Key: TEZ-1924
 URL: https://issues.apache.org/jira/browse/TEZ-1924
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.5.2
Reporter: Ivan Mitic
 Attachments: TEZ-20.patch


 Issue originally reported by [~Karam Singh].
 All OrderedWordCount, WordCount and Tez faultTolerance system tests failed 
 due to java.net.UnknownHostException.
 Interestingly, other tez examples such as mrrsleep, randomwriter, 
 randomtextwriter, sort, join_inner, join_outer, terasort and 
 groupbyorderbymrrtest ran fine.
 One such example follows:
 {code}
 RUNNING: /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/tez/tez-mapreduce-examples-0.4.0.2.1.7.0-784.jar orderedwordcount 
 -DUSE_TEZ_SESSION=true -Dmapreduce.map.memory.mb=2048 
 -Dtez.am.shuffle-vertex-manager.max-src-fraction=0 
 -Dmapreduce.reduce.memory.mb=2048 -Dmapreduce.framework.name=yarn-tez 
 -Dtez.am.container.reuse.enabled=false -Dtez.am.log.level=DEBUG 
 -Dmapreduce.map.java.opts=-Xmx1024m 
 -Dtez.am.shuffle-vertex-manager.min-src-fraction=0 
 -Dmapreduce.job.reduce.slowstart.completedmaps=0.01 
 -Dmapreduce.reduce.java.opts=-Xmx1024m 
 -Dtez.am.container.session.delay-allocation-millis=12 
 /user/hrt_qa/Tez_CR_1/TestContainerReuse1 /user/hrt_qa/Tez_CROutput_1 
 /user/hrt_qa/Tez_CR_2/TestContainerReuse2 /user/hrt_qa/Tez_CROutput_2 
 -generateSplitsInClient true
 14/12/19 09:20:05 INFO impl.TimelineClientImpl: Timeline service address: 
 http://0.0.0.0:8188/ws/v1/timeline/
 14/12/19 09:20:05 INFO client.RMProxy: Connecting to ResourceManager at 
 headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
 14/12/19 09:20:05 INFO client.AHSProxy: Connecting to Application History 
 server at /0.0.0.0:10200
 14/12/19 09:20:06 INFO impl.MetricsConfig: loaded properties from 
 hadoop-metrics2.properties
 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 
 60 second(s).
 14/12/19 09:20:06 INFO impl.MetricsSystemImpl: azure-file-system metrics 
 system started
 14/12/19 09:20:07 INFO client.TezClientUtils: Permissions on staging 
 directory 
 wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
  are incorrect: rwxr-xr-x. Fixing permissions to correct value rwx------
 14/12/19 09:20:07 INFO examples.OrderedWordCount: Creating Tez Session
 14/12/19 09:20:07 INFO impl.TimelineClientImpl: Timeline service address: 
 http://0.0.0.0:8188/ws/v1/timeline/
 14/12/19 09:20:07 INFO client.RMProxy: Connecting to ResourceManager at 
 headnode0.humb-tez1-ssh.d5.internal.cloudapp.net/10.0.0.87:8050
 14/12/19 09:20:07 INFO client.AHSProxy: Connecting to Application History 
 server at /0.0.0.0:10200
 14/12/19 09:20:09 INFO impl.YarnClientImpl: Submitted application 
 application_1418977790315_0016
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Created Tez Session
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Running OrderedWordCount 
 DAG, dagIndex=1, inputPath=/user/hrt_qa/Tez_CR_1/TestContainerReuse1, 
 outputPath=/user/hrt_qa/Tez_CROutput_1
 14/12/19 09:20:09 INFO hadoop.MRHelpers: Generating new input splits, 
 splitsDir=wasb://humb-t...@humboldttesting.blob.core.windows.net/user/hrt_qa/.staging/application_1418977790315_0016
 14/12/19 09:20:09 INFO input.FileInputFormat: Total input paths to process : 
 20
 14/12/19 09:20:09 INFO examples.OrderedWordCount: Waiting for TezSession to 
 get into ready state
 14/12/19 09:20:14 INFO client.TezSession: Failed to retrieve AM Status via 
 proxy
 org.apache.tez.dag.api.TezException: com.google.protobuf.ServiceException: 
 java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
 destination host is: workernode1:59575; java.net.UnknownHostException; For 
 more details see:  http://wiki.apache.org/hadoop/UnknownHost
   at 
 org.apache.tez.client.TezSession.getSessionStatus(TezSession.java:351)
   at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.waitForTezSessionReady(OrderedWordCount.java:538)
   at 
 org.apache.tez.mapreduce.examples.OrderedWordCount.main(OrderedWordCount.java:461)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at