[ https://issues.apache.org/jira/browse/PHOENIX-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294492#comment-15294492 ]
Mujtaba Chohan commented on PHOENIX-2929: ----------------------------------------- Sure [~poornachandra]. I've restarted txn-manager. I'll get the heap dumps when it gets back in slow state. > Transactional writes get slow over time due to possible memory leak > ------------------------------------------------------------------- > > Key: PHOENIX-2929 > URL: https://issues.apache.org/jira/browse/PHOENIX-2929 > Project: Phoenix > Issue Type: Bug > Reporter: Mujtaba Chohan > Attachments: txn-manager-log-during-slow-writes.log > > > Write to transactional table get up to ~3X slower compared to > non-transactional table. > Details: > * Transaction Manager (TM) is *always* kept up and running between all data > loads > * JVM for data loader is stopped after each 1M rows upsert > * Data loader uses 10 threads to write in parallel with 10K batch size > * Data load time is comparable for first few runs and it gets ~3X slow after > transaction manager is up for 5-10 hours > Data loader client details for txn-table: ~100sec GC pause with transactional > table. Total time to load 1M rows ~4mins. ~3.2G physical mem used. Java XMX > param was set to 3G. > {code} > JSTAT for data loader - 10 sec interval > S0C S1C S0U S1U EC EU OC OU MC > MU CCSC CCSU YGC YGCT FGC FGCT GCT > 40448.0 40960.0 0.0 21920.0 571904.0 0.0 349696.0 87959.7 > 42024.0 41469.1 4904.0 4739.9 13 0.174 2 0.072 0.246 > 333824.0 333824.0 0.0 121135.1 380928.0 84826.0 2097152.0 2009035.3 > 42280.0 41652.1 4904.0 4740.4 29 1.172 6 5.231 6.403 > 349184.0 349184.0 0.0 0.0 350208.0 350205.8 2097152.0 1952578.1 > 42536.0 41758.6 4904.0 4742.5 36 1.353 12 9.882 11.234 > 322560.0 336896.0 214566.6 104273.9 344576.0 344561.2 2097152.0 1859305.5 > 42536.0 41774.5 4904.0 4742.5 45 1.643 17 14.684 16.327 > 349184.0 349184.0 41351.1 0.0 350208.0 48147.1 2097152.0 1935605.0 > 42536.0 41811.3 4904.0 4742.5 54 2.081 24 19.862 21.943 > 349184.0 349184.0 156685.4 0.0 350208.0 0.0 2097152.0 2073856.9 > 42536.0 41817.5 4904.0 4742.5 68 2.632 30 22.649 25.281 > 344576.0 52224.0 0.0 51789.6 350208.0 0.0 2097152.0 1871862.2 > 42536.0 41823.4 4904.0 4742.5 79 2.991 37 27.764 30.756 > 349184.0 349184.0 302724.6 0.0 350208.0 0.0 2097152.0 2023258.1 > 42536.0 41830.3 4904.0 4742.5 92 3.559 43 30.363 33.922 > 305664.0 104448.0 0.0 104434.1 358400.0 114713.9 2097152.0 1857545.7 > 42536.0 41846.0 4904.0 4742.5 101 3.804 50 36.314 40.118 > 335872.0 66048.0 0.0 65680.0 350208.0 247000.5 2097152.0 1923473.1 > 42536.0 41852.3 4904.0 4742.5 115 4.364 56 39.622 43.986 > 349184.0 349184.0 0.0 349163.6 350208.0 0.0 2097152.0 1975419.5 > 42536.0 41861.6 4904.0 4742.5 125 4.863 60 44.782 49.645 > 349184.0 349184.0 0.0 17375.8 350208.0 0.0 2097152.0 1999279.6 > 42536.0 41863.4 4904.0 4742.5 133 5.091 66 50.602 55.693 > 340992.0 345088.0 0.0 0.0 350208.0 350208.0 2097152.0 1977601.3 > 42536.0 41894.1 4904.0 4742.5 146 5.566 73 54.388 59.953 > 349184.0 349184.0 0.0 0.0 350208.0 350208.0 2097152.0 1963150.5 > 42536.0 41906.5 4904.0 4742.5 158 6.036 80 58.024 64.060 > 349184.0 349184.0 0.0 63008.0 350208.0 56631.7 2097152.0 1942708.7 > 42536.0 41916.8 4904.0 4742.5 177 6.751 84 60.799 67.550 > 345600.0 349184.0 263005.5 0.0 350208.0 55075.8 2097152.0 1883022.2 > 42536.0 41925.8 4904.0 4742.5 192 7.295 88 65.144 72.439 > 349184.0 344064.0 139656.9 0.0 355328.0 0.0 2097152.0 2070558.8 > 42536.0 41935.4 4904.0 4742.5 204 7.803 95 68.618 76.421 > 349184.0 349184.0 63008.0 0.0 350208.0 0.0 2097152.0 1963035.3 > 42536.0 41971.9 4904.0 4742.5 218 8.339 102 72.534 80.873 > 349184.0 349184.0 0.0 17984.0 350208.0 253707.1 2097152.0 1848396.0 > 42536.0 41996.1 4904.0 4742.5 229 8.686 108 76.526 85.212 > 349184.0 324608.0 0.0 284036.3 351744.0 0.0 2097152.0 1776761.8 > 42536.0 42021.6 4904.0 4742.5 247 9.320 113 79.669 88.989 > 349184.0 349184.0 38510.2 0.0 350208.0 160454.7 2097152.0 1896405.6 > 42536.0 42036.0 4904.0 4742.5 256 9.555 115 80.723 90.278 > 342528.0 333824.0 147271.1 0.0 358912.0 312219.3 2097152.0 1882525.2 > 42792.0 42065.0 4904.0 4742.5 278 10.620 116 81.143 91.763 > 349184.0 283648.0 0.0 283645.6 370176.0 249240.7 2097152.0 1893727.0 > 42792.0 42066.5 4904.0 4742.5 303 11.738 117 81.684 93.422 > 349184.0 349184.0 49417.0 0.0 350208.0 172262.6 2097152.0 1127855.3 > 42792.0 42071.0 4904.0 4742.5 326 12.787 119 82.895 95.682 > 349184.0 349184.0 0.0 131245.0 350208.0 133019.3 2097152.0 1074752.6 > 42792.0 42081.4 4904.0 4742.5 349 13.943 120 83.448 97.391 > {code} > Data loader client details for non-txn table: ~12sec GC pause with > non-transactional table. Total time to load 1M rows ~1mins. ~2G physical mem > used. Java XMX param was set to 3G. > {code} > JSTAT for data loader - 10 sec interval > S0C S1C S0U S1U EC EU OC OU MC > MU CCSC CCSU YGC YGCT FGC FGCT GCT > 24576.0 28160.0 0.0 0.0 195584.0 28008.1 349696.0 25257.3 35496.0 > 35020.3 4264.0 4037.5 6 0.056 2 0.071 0.128 > 79872.0 79872.0 0.0 79860.8 888832.0 0.0 753664.0 753092.4 > 41000.0 40361.6 4648.0 4563.6 17 0.616 3 0.071 0.688 > 259584.0 260096.0 129075.4 161535.6 528384.0 528384.0 937472.0 767859.8 > 41000.0 40458.7 4648.0 4564.6 31 1.319 4 1.622 2.941 > 283136.0 278016.0 23716.8 0.0 487424.0 143734.6 1498624.0 662740.5 > 41000.0 40479.8 4648.0 4564.6 48 2.459 7 3.122 5.582 > 240640.0 251392.0 162230.5 31674.1 545792.0 545792.0 1530368.0 642312.2 > 41000.0 40531.4 4648.0 4567.6 69 3.733 9 4.069 7.802 > 289280.0 282624.0 19730.4 0.0 468480.0 403700.7 1786880.0 672352.5 > 41256.0 40645.2 4648.0 4567.6 86 4.974 11 5.160 10.134 > 282624.0 290816.0 163586.1 0.0 466944.0 268299.4 1950208.0 1491203.8 > 41256.0 40667.0 4648.0 4567.6 108 6.288 12 5.723 12.011 > {code} > * TM uses ~4G physical memory (there's no GC pause for TM) > * After restarting TM, TM memory usage drops to 0.5G and data loader GC > pauses decrease to ~15sec for transactional table and 1M gets loaded in ~1min. > This hints to possible leak in TM. Since this test was on a remote secure > cluster, I haven't been able to profile it so I'll try to repro it locally to > get more info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)