[jira] [Updated] (TEZ-2775) Reduce logging in runtime components

2015-09-20 Thread Rajesh Balamohan (JIRA)

[ https://issues.apache.org/jira/browse/TEZ-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated TEZ-2775:
--
Attachment: TEZ-2775.5.txt

lgtm. +1. 

Made a minor change to ShuffleUtils.logIndividualFetchComplete (it no longer logs the 
spill type and spill id when FINAL_MERGE_ENABLED is true).
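A minimal sketch of that conditional, assuming the fetch-complete line is built as a plain string. The class FetchCompleteLog, the method fetchCompleteMessage, its parameters and the message format are all hypothetical and do not reproduce the real ShuffleUtils.logIndividualFetchComplete signature. The presumed rationale is that with final merge enabled each fetch covers one merged output rather than an individual spill, so the spill fields add little.

```java
/**
 * Hypothetical sketch: build a per-fetch completion line, appending the
 * spill type and spill id only when final merge is disabled.
 * Names and format are illustrative, not Tez's actual API.
 */
public class FetchCompleteLog {

    public static String fetchCompleteMessage(String srcAttempt, long bytes, long millis,
                                              boolean finalMergeEnabled,
                                              String spillType, int spillId) {
        StringBuilder sb = new StringBuilder("Completed fetch for attempt: ")
                .append(srcAttempt)
                .append(" bytes=").append(bytes)
                .append(" timeMs=").append(millis);
        // Spill details are only emitted when final merge is off.
        if (!finalMergeEnabled) {
            sb.append(" spillType=").append(spillType)
              .append(" spillId=").append(spillId);
        }
        return sb.toString();
    }
}
```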

> Reduce logging in runtime components
> 
>
> Key: TEZ-2775
> URL: https://issues.apache.org/jira/browse/TEZ-2775
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-2775.1.txt, TEZ-2775.3.txt, TEZ-2775.4.txt, 
> TEZ-2775.5.txt
>
>
> Specifically Shuffle, which logs a lot for each event being processed and 
> data being fetched.
> Also PipelinedShuffle is fairly noisy - some of the information from here 
> could be consolidated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TEZ-2775) Reduce logging in runtime components

2015-09-18 Thread Siddharth Seth (JIRA)

[ https://issues.apache.org/jira/browse/TEZ-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-2775:

Attachment: TEZ-2775.4.txt

Thanks for the updated patch [~rajesh.balamohan]. Have made a couple of small 
changes.
- logProgress logs a different message when invoked via close (as opposed to 
when processing an event).
- shuffleToDisk / shuffleToMemory: in case of an IOException, logs the 
compressed and decompressed sizes along with the exception message.
- Reverted the changes in TaskReporter, to match the logging of each individual 
event batch in the AM.
- Removed the INFO-level log message for a successful shuffleToDisk (this was 
already done earlier for shuffleToMemory).
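A minimal sketch of the IOException handling described in the second bullet, assuming java.util.logging in place of the project's logging framework. ShuffleFailureLogging, failureMessage, CopyStep and copyWithContext are hypothetical names; the real fetcher code is not reproduced. The idea is that a single WARNING line carries the sizes and the exception message, and the exception is still rethrown.

```java
import java.io.IOException;
import java.util.logging.Logger;

/** Hypothetical sketch: log shuffle sizes together with the IOException message. */
public class ShuffleFailureLogging {
    private static final Logger LOG = Logger.getLogger(ShuffleFailureLogging.class.getName());

    /** A copy step that may fail with an IOException (stand-in for the real copy). */
    public interface CopyStep {
        void run() throws IOException;
    }

    /** Builds one line containing both sizes and the exception message. */
    public static String failureMessage(String target, long compressedLength,
                                        long decompressedLength, IOException e) {
        return "Failed to shuffle to " + target
                + ": compressedLength=" + compressedLength
                + ", decompressedLength=" + decompressedLength
                + ", error=" + e.getMessage();
    }

    /** Runs the step; on failure logs the sizes once, then rethrows the original exception. */
    public static void copyWithContext(String target, long compressedLength,
                                       long decompressedLength, CopyStep step) throws IOException {
        try {
            step.run();
        } catch (IOException e) {
            LOG.warning(failureMessage(target, compressedLength, decompressedLength, e));
            throw e;
        }
    }
}
```

Rethrowing keeps the existing failure path unchanged; only the log line gains context.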

Please take another look when you get a chance, and make any other changes 
required. Will commit after that.

> Reduce logging in runtime components
> 
>
> Key: TEZ-2775
> URL: https://issues.apache.org/jira/browse/TEZ-2775
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-2775.1.txt, TEZ-2775.3.txt, TEZ-2775.4.txt
>
>
> Specifically Shuffle, which logs a lot for each event being processed and 
> data being fetched.
> Also PipelinedShuffle is fairly noisy - some of the information from here 
> could be consolidated.





[jira] [Updated] (TEZ-2775) Reduce logging in runtime components

2015-09-17 Thread Rajesh Balamohan (JIRA)

[ https://issues.apache.org/jira/browse/TEZ-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated TEZ-2775:
--
Attachment: TEZ-2775.3.txt

- Added logProgress() to ShuffleEventHandler, and invoked it from 
UnorderedKVInput.close(), ShuffleEventHandlerImpl.handleEvent and 
ShuffleInputEventHandlerOrderedGrouped.handleEvent. By default it logs once 
every 50 invocations, and once more before closing.
- Added logProgress() to ShuffleScheduler's close() call as well, so that after 
an abrupt failure the log still records how many inputs had been copied.
- Retained the logs in MergeManager.closeInMemoryFile.
- Left "FetcherOrderedGrouped for decomp len" unchanged: it takes up a lot of 
log space, but it is useful for debugging corner-case issues when multiple 
attempts are downloaded over the same connection. Since that scenario is 
uncommon, it stays at DEBUG level.
- No changes to ShuffleUtils.logIndividualFetchComplete() (the perf tool needs 
to be changed later).
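A minimal sketch of the throttled-progress pattern from the first two bullets, assuming java.util.logging and an interval of 50. ProgressLogger, onEvent and close are hypothetical names, not Tez's actual logProgress implementation. It also uses a distinct message prefix on close, in the spirit of the later .4 change that distinguishes close-time progress lines from event-time ones.

```java
import java.util.logging.Logger;

/**
 * Hypothetical sketch: one INFO line every `interval` events, plus a final
 * line on close() so progress is recorded even if the input stops early.
 */
public class ProgressLogger {
    private static final Logger LOG = Logger.getLogger(ProgressLogger.class.getName());

    private final int interval;
    private final int totalInputs;
    private int eventsSeen = 0;
    private int inputsCopied = 0;

    public ProgressLogger(int interval, int totalInputs) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
        this.totalInputs = totalInputs;
    }

    /** Called once per event; returns true if a progress line was emitted. */
    public boolean onEvent(boolean inputCopied) {
        eventsSeen++;
        if (inputCopied) {
            inputsCopied++;
        }
        if (eventsSeen % interval == 0) {
            LOG.info(progressMessage("Processing"));
            return true;
        }
        return false;
    }

    /** Always logs, with a different prefix than the per-event lines. */
    public String close() {
        String msg = progressMessage("Closing");
        LOG.info(msg);
        return msg;
    }

    private String progressMessage(String phase) {
        return phase + ": events=" + eventsSeen + ", copied=" + inputsCopied + "/" + totalInputs;
    }
}
```

With 120 events and an interval of 50, this emits two progress lines (at events 50 and 100) and a final one on close, instead of one line per event.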


Without Patch: Query_75 @ 10 TB scale (with hive patch for l4j in SMB)
application_1439860407967_1259: 418,061,352 (compressed) : 6,375,796,132 (uncompressed)

With .1 patch:
application_1439860407967_1260: 142,492,745 (compressed) : 1,365,908,923 (uncompressed)

With .3 patch:
application_1439860407967_1280: 219,133,295 (compressed) : 2,410,892,797 (uncompressed)
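The relative savings can be worked out directly from the sizes above. This small calculation uses only the reported numbers; the class LogReduction is illustrative.

```java
/** Computes percentage reductions from the sizes reported for Query_75. */
public class LogReduction {

    /** Percentage reduction going from `before` to `after`. */
    public static double reductionPercent(long before, long after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        long compressedBase = 418_061_352L;
        long uncompressedBase = 6_375_796_132L;
        System.out.printf(".1 patch: %.1f%% compressed, %.1f%% uncompressed%n",
                reductionPercent(compressedBase, 142_492_745L),
                reductionPercent(uncompressedBase, 1_365_908_923L));
        System.out.printf(".3 patch: %.1f%% compressed, %.1f%% uncompressed%n",
                reductionPercent(compressedBase, 219_133_295L),
                reductionPercent(uncompressedBase, 2_410_892_797L));
    }
}
```

This gives roughly 65.9% / 78.6% smaller logs for the .1 patch and 47.6% / 62.2% for the .3 patch (compressed / uncompressed). The .3 patch saves less than .1, consistent with it retaining some of the logging that .1 removed.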


> Reduce logging in runtime components
> 
>
> Key: TEZ-2775
> URL: https://issues.apache.org/jira/browse/TEZ-2775
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: TEZ-2775.1.txt, TEZ-2775.3.txt
>
>
> Specifically Shuffle, which logs a lot for each event being processed and 
> data being fetched.
> Also PipelinedShuffle is fairly noisy - some of the information from here 
> could be consolidated.





[jira] [Updated] (TEZ-2775) Reduce logging in runtime components

2015-09-09 Thread Siddharth Seth (JIRA)

[ https://issues.apache.org/jira/browse/TEZ-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-2775:

Attachment: TEZ-2775.1.txt

First cut of the patch.

[~rajesh.balamohan] - please review.
This retains a log message for each successful fetch, which makes the logs 
quite large. It also retains the HttpConnection string used for fetches. Can 
either of these be removed? Also, does this remove any information that is 
critical for debugging?
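One common way to address the per-fetch question is to keep the per-fetch detail (including the connection string) only at a debug level and emit an aggregated summary at INFO. This is a hedged sketch of that general pattern, not the approach taken in any of the attached patches. It assumes java.util.logging, where FINE roughly corresponds to DEBUG; FetchLogging, onFetchComplete and summary are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical sketch: per-fetch lines only at FINE (roughly DEBUG),
 * with a single aggregated INFO summary.
 */
public class FetchLogging {
    private static final Logger LOG = Logger.getLogger(FetchLogging.class.getName());

    private long fetches = 0;
    private long bytes = 0;

    /** Records a fetch; the detail string is only built when FINE is enabled. */
    public void onFetchComplete(String connection, long fetchedBytes) {
        fetches++;
        bytes += fetchedBytes;
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Fetched " + fetchedBytes + " bytes via " + connection);
        }
    }

    /** Emits and returns the aggregated summary line. */
    public String summary() {
        String msg = "Completed " + fetches + " fetches, " + bytes + " bytes total";
        LOG.info(msg);
        return msg;
    }
}
```

The isLoggable guard avoids building the detail string at all when debug logging is off, so the per-fetch cost in production is only the two counter updates.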

> Reduce logging in runtime components
> 
>
> Key: TEZ-2775
> URL: https://issues.apache.org/jira/browse/TEZ-2775
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
> Attachments: TEZ-2775.1.txt
>
>
> Specifically Shuffle, which logs a lot for each event being processed and 
> data being fetched.
> Also PipelinedShuffle is fairly noisy - some of the information from here 
> could be consolidated.





[jira] [Updated] (TEZ-2775) Reduce logging in runtime components

2015-09-03 Thread Siddharth Seth (JIRA)

[ https://issues.apache.org/jira/browse/TEZ-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-2775:

Target Version/s: 0.5.5

> Reduce logging in runtime components
> 
>
> Key: TEZ-2775
> URL: https://issues.apache.org/jira/browse/TEZ-2775
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>
> Specifically Shuffle, which logs a lot for each event being processed and 
> data being fetched.
> Also PipelinedShuffle is fairly noisy - some of the information from here 
> could be consolidated.


