[ https://issues.apache.org/jira/browse/TEZ-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531183#comment-14531183 ]
Siddharth Seth commented on TEZ-776:
------------------------------------

bq. prepareForRouting is guarded by synchronized in Edge which creates a read-write barrier.

Is that enough to guarantee correct reads without a lock?

bq. Agree about duplication, but each case has minor differences in which indices to use or which events to create and hence is hard to merge.

The only difference I can tell is in the event type. Anyway, ignoring this since it seems consistent and correct.

bq. The array list size read is thread safe. There is only 1 writer which prevents concurrent modification. The size in an array/linked list is an int that is atomically modified. There have been no issues in numerous stress simulations and large jobs.

I believe [~hitesh] is planning to take a look. It does not seem correct to me to read from a non-thread-safe structure from multiple threads without a lock, given that insertions happen in a separate thread.

bq. Broadcast edge manager cannot continue to use legacy routing since every consumer task needs events from every producer task, leading to memory reference overhead proportional to MxN, which is large for large jobs.

Did something change here? We've never had issues with the Broadcast edge, given that TezEvents (and the underlying DME) are shared between all tasks. There's a reference overhead, which should not be very large.

bq. I wish I could share your optimism on TEZ-2409 being 10 lines of code, but I am afraid I have tried to do it and found it to be a little more involved than that. Besides, 10 lines of code would need many more lines of new tests. This does not have to be a blocker for 0.7.0 since it's an internal framework change and can be done in 0.7.1.

I can make this change here, if you don't mind. I don't think the patch is complete without OneToOne (and possibly Broadcast) going via the regular means, so as not to introduce a regression in CPU usage.
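To illustrate the visibility concern raised above: under the Java Memory Model, atomicity of an int write (such as ArrayList's size field) does not imply visibility to other threads; a reader needs a happens-before edge with the writer, typically via a common lock. The sketch below is purely illustrative (the class and method names are hypothetical, not Tez's actual Edge code), assuming the single-writer/multi-reader shape described in the quoted comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a single-writer event list read by multiple
// routing threads. Without a shared lock (or another happens-before
// edge such as a volatile field), readers may observe a stale size or
// a partially published backing array -- int writes are atomic, but
// atomicity alone gives no visibility guarantee.
public class EventBufferSketch {
    private final List<Object> events = new ArrayList<>();

    // Single writer thread appends under the lock.
    public synchronized void add(Object event) {
        events.add(event);
    }

    // Readers take the same lock: this establishes happens-before with
    // the writer, so both the size and the contents are safely visible.
    public synchronized int safeSize() {
        return events.size();
    }

    public static void main(String[] args) {
        EventBufferSketch buf = new EventBufferSketch();
        for (int i = 0; i < 3; i++) {
            buf.add(new Object());
        }
        System.out.println(buf.safeSize()); // prints 3
    }
}
```

The lock here is coarse; a `ReadWriteLock` or a snapshot-on-write structure would reduce reader contention, but the point is only that some happens-before edge is required for the unlocked-read pattern to be correct.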
> Reduce AM mem usage caused by storing TezEvents
> -----------------------------------------------
>
>                 Key: TEZ-776
>                 URL: https://issues.apache.org/jira/browse/TEZ-776
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>            Priority: Blocker
>         Attachments: TEZ-776.1.patch, TEZ-776.10.patch, TEZ-776.11.patch,
> TEZ-776.12.patch, TEZ-776.13.patch, TEZ-776.2.patch, TEZ-776.3.patch,
> TEZ-776.4.patch, TEZ-776.5.patch, TEZ-776.6.A.patch, TEZ-776.6.B.patch,
> TEZ-776.7.patch, TEZ-776.8.patch, TEZ-776.9.patch, TEZ-776.ondemand.1.patch,
> TEZ-776.ondemand.2.patch, TEZ-776.ondemand.3.patch, TEZ-776.ondemand.4.patch,
> TEZ-776.ondemand.5.patch, TEZ-776.ondemand.6.patch, TEZ-776.ondemand.7.patch,
> TEZ-776.ondemand.patch, With_Patch_AM_hotspots.png,
> With_Patch_AM_profile.png, Without_patch_AM_CPU_Usage.png,
> events-problem-solutions.txt, with_patch_jmc_output_of_AM.png,
> without_patch_jmc_output_of_AM.png
>
>
> This is open ended at the moment.
> A fair chunk of the AM heap is taken up by TezEvents (specifically
> DataMovementEvents - 64 bytes per event).
> Depending on the connection pattern - this puts limits on the number of tasks
> that can be processed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
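To put a rough number on the "proportional to MxN" reference overhead debated in the comment above: even when the DataMovementEvent objects themselves are shared, legacy routing keeps one reference per (producer, consumer) pair. The figures below are illustrative assumptions, not measurements from this issue (the 8-byte reference size assumes a 64-bit JVM without compressed oops; with compressed oops it would be 4).

```java
// Back-of-the-envelope sketch of per-reference overhead for a broadcast
// edge with M producers and N consumers. Job sizes are hypothetical.
public class BroadcastOverhead {

    // References alone cost M * N * refBytes, independent of the shared
    // event objects' own size.
    static long referenceOverheadBytes(long producers, long consumers, long refBytes) {
        return producers * consumers * refBytes;
    }

    public static void main(String[] args) {
        long m = 10_000;     // producer tasks (assumed)
        long n = 10_000;     // consumer tasks (assumed)
        long refBytes = 8;   // assumed reference size on a 64-bit JVM

        // 10,000 * 10,000 * 8 bytes = 800,000,000 bytes (~763 MiB)
        System.out.println(referenceOverheadBytes(m, n, refBytes));
    }
}
```

At these assumed sizes the references alone approach a gigabyte of AM heap, which is the scale of concern behind moving Broadcast off legacy routing; for small jobs the same product is negligible, which matches the "should not be very large" observation in the comment.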