[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531183#comment-14531183 ]

Siddharth Seth commented on TEZ-776:
------------------------------------

bq. prepareForRouting is guarded by synchronized in Edge, which creates a read-write barrier.
Is that enough to guarantee correct reads without a lock?
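To illustrate the concern (a hypothetical sketch, not the actual Edge code): a synchronized writer only establishes a happens-before edge with readers that acquire the same monitor; a lock-free reader gets no ordering guarantee from the writer's barrier alone.

{code:java}
// Hypothetical sketch, not Tez code: a synchronized writer by itself does not
// make unlocked reads safe; the reader must synchronize on the same monitor.
class RoutingSketch {
    private int[] routingTable;              // published by a prepareForRouting-like method

    synchronized void prepare() {            // writer: monitor release provides the barrier
        routingTable = new int[] {1, 2, 3};
    }

    int readWithoutLock(int i) {             // reader without the lock: may still see
        return routingTable[i];              // null or a stale reference
    }

    synchronized int readWithLock(int i) {   // reader under the same monitor: safe
        return routingTable[i];
    }
}
{code}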

bq. Agree about duplication, but each case has minor differences in which 
indices to use or which events to create, and hence is hard to merge.
The only difference I can see is the event type. In any case, I'm ignoring this 
since it seems consistent and correct.

bq. The ArrayList size read is thread safe. There is only one writer, which 
prevents concurrent modification. The size in an array/linked list is an int 
that is atomically modified. There have been no issues in numerous stress 
simulations and large jobs.
I believe [~hitesh] is planning to take a look. It does not seem correct to me 
to read from a non-thread-safe structure from multiple threads without a lock, 
given that insertions happen in a separate thread.
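As a minimal sketch of the visibility problem (hypothetical class, not the Vertex/Edge code): one writer appending to an ArrayList while another thread polls size() without a lock. The single writer avoids ConcurrentModificationException, but the memory model gives the unsynchronized reader no guarantee of ever observing the new size, nor a safely published backing array after a resize.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one writer thread appends events, a reader polls
// size() with no lock. Nothing orders the reader after the writer's stores,
// so it may see a stale size indefinitely or a half-published backing array.
public class UnsyncedSizeRead {
    private final List<Integer> events = new ArrayList<>();

    void writer() {
        for (int i = 0; i < 1_000_000; i++) {
            events.add(i);                   // single writer, no lock
        }
    }

    void reader() {
        int seen = 0;
        while (seen < 1_000_000) {
            int size = events.size();        // unsynchronized read
            if (size > seen) {
                seen = size;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        UnsyncedSizeRead d = new UnsyncedSizeRead();
        Thread w = new Thread(d::writer);
        Thread r = new Thread(d::reader);
        r.start();
        w.start();
        w.join();
        r.join(5_000);                       // reader may never terminate on its own
    }
}
{code}

Guarding both sides with the same lock, or using a concurrent collection, restores the happens-before edge.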

bq. Broadcast edge manager cannot continue to use legacy routing since every 
consumer task needs events from every producer task, leading to memory reference 
overhead proportional to MxN, which is large for large jobs.
Did something change here? We've never had issues with the Broadcast edge, 
given that TezEvents (and the underlying DMEs) are shared between all tasks. 
There is a reference overhead, but it should not be very large.
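For scale, a back-of-envelope estimate with assumed figures (not numbers from any actual job here): even when the TezEvent objects themselves are shared, the reference fan-out to consumers grows as MxN.

{code:java}
// Back-of-envelope sketch with assumed figures: reference overhead of handing
// every consumer a reference to every producer's shared event.
public class BroadcastRefEstimate {
    public static void main(String[] args) {
        long producers = 1_000;     // M source tasks (assumed)
        long consumers = 10_000;    // N destination tasks (assumed)
        long refBytes = 8;          // bytes per object reference (assumed 64-bit JVM, no compressed oops)
        long totalRefs = producers * consumers;
        System.out.printf("refs=%,d  approx=%,d MB%n",
                totalRefs, totalRefs * refBytes / (1024 * 1024));
        // ~10M references, roughly 76 MB of pointers, before any per-list overhead.
    }
}
{code}

Whether that counts as significant depends on the AM heap size and the actual M and N.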

bq. I wish I could share your optimism about TEZ-2409 being 10 lines of code, but 
I am afraid I have tried to do it and found it to be a little more involved than 
that. Besides, 10 lines of code would need many more lines of new tests. This 
does not have to be a blocker for 0.7.0 since it's an internal framework change 
and can be done in 0.7.1.
I can make this change here, if you don't mind. I don't think the patch is 
complete without OneToOne (and possibly Broadcast) going through the regular path, 
so as not to introduce a regression in CPU usage.

> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>            Priority: Blocker
>         Attachments: TEZ-776.1.patch, TEZ-776.10.patch, TEZ-776.11.patch, 
> TEZ-776.12.patch, TEZ-776.13.patch, TEZ-776.2.patch, TEZ-776.3.patch, 
> TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch, 
> TEZ-776.7.patch, TEZ-776.8.patch, TEZ-776.9.patch, TEZ-776.ondemand.1.patch, 
> TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch, 
> TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch, 
> TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png, 
> With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png, 
> events-problem-solutions.txt, with_patch_jmc_output_of_AM.png, 
> without_patch_jmc_output_of_AM.png
>
>
> This is open-ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically 
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern, this puts a limit on the number of tasks 
> that can be processed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
