[ https://issues.apache.org/jira/browse/HBASE-24380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125928#comment-17125928 ]
Viraj Jasani edited comment on HBASE-24380 at 6/11/20, 11:23 AM: ----------------------------------------------------------------- Tried adding some recovered edit write info locally. *Sample output:* {code:java} Journal Log: null at 1591291865775 Opening log file hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/WALs/testsplitlogfilemultipleregions,16010,1591291865558/wal.dat.0 at 1591291865776 Finishing writing output logs and closing down at 1591291865801 Creating recovered edits writer path=hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/data/default/t1/ccc/recovered.edits/0000000000000000002-wal.dat.0.temp at 1591291866091 Creating recovered edits writer path=hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/data/default/t1/bbb/recovered.edits/0000000000000000001-wal.dat.0.temp at 1591291866091 3 split writer threads finished at 1591291866118 Closed recovered edits writer path=hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/data/default/t1/ccc/recovered.edits/0000000000000000002-wal.dat.0.temp (wrote 10 edits, skipped 0 edits in 25 ms) at 1591291866137 Closed recovered edits writer path=hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/data/default/t1/bbb/recovered.edits/0000000000000000001-wal.dat.0.temp (wrote 10 edits, skipped 0 edits in 25 ms) at 1591291866137 Rename recovered edits hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/data/default/t1/ccc/recovered.edits/0000000000000000002-wal.dat.0.temp to hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/data/default/t1/ccc/recovered.edits/0000000000000000020 at 1591291866139 Rename recovered edits hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/data/default/t1/bbb/recovered.edits/0000000000000000001-wal.dat.0.temp to hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/data/default/t1/bbb/recovered.edits/0000000000000000019 at 1591291866139 Processed 20 edits across 2 regions cost 354 ms; edits skipped=7; WAL=hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/WALs/testsplitlogfilemultipleregions,16010,1591291865558/wal.dat.0, size=2.3 K, length=2322, corrupted=false, progress failed=false at 1591291866139 {code} was (Author: vjasani): Tried adding some recovered edit write info locally. *Sample output:* Journal Log: null at 1591291865775 Opening log file hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/WALs/testsplitlogfilemultipleregions,16010,1591291865558/wal.dat.0 at 1591291865776 Finishing writing output logs and closing down at 1591291865801 Creating recovered edits writer path=hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/data/default/t1/ccc/recovered.edits/0000000000000000002-wal.dat.0.temp at 1591291866091 Creating recovered edits writer path=hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/data/default/t1/bbb/recovered.edits/0000000000000000001-wal.dat.0.temp at 1591291866091 3 split writer threads finished at 1591291866118 Closed recovered edits writer path=hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/data/default/t1/ccc/recovered.edits/0000000000000000002-wal.dat.0.temp (wrote 10 edits, skipped 0 edits in 25 ms) at 1591291866137 Closed recovered edits writer path=hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/data/default/t1/bbb/recovered.edits/0000000000000000001-wal.dat.0.temp (wrote 10 edits, skipped 0 edits in 25 ms) at 1591291866137 Rename recovered edits hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/data/default/t1/ccc/recovered.edits/0000000000000000002-wal.dat.0.temp to hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/data/default/t1/ccc/recovered.edits/0000000000000000020 at 1591291866139 Rename recovered edits hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/data/default/t1/bbb/recovered.edits/0000000000000000001-wal.dat.0.temp to hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/data/default/t1/bbb/recovered.edits/0000000000000000019 at 1591291866139 Processed 20 edits across 2 regions cost 354 ms; edits skipped=7; WAL=hdfs://localhost:64541/user/vjasani/test-data/9cad028f-1b56-2eeb-fec5-c83773b30f41/WALs/testsplitlogfilemultipleregions,16010,1591291865558/wal.dat.0, size=2.3 K, length=2322, corrupted=false, progress failed=false at 1591291866139 > Improve WAL splitting log lines to enable sessionization > -------------------------------------------------------- > > Key: HBASE-24380 > URL: https://issues.apache.org/jira/browse/HBASE-24380 > Project: HBase > Issue Type: Improvement > Components: logging, Operability, wal > Reporter: Andrew Kyle Purtell > Assignee: Viraj Jasani > Priority: Minor > > Looking to reconstruct a timeline from write of recovered.edits file back to > start of WAL file split, with a bunch of unrelated activity in the meantime, > there isn't a consistent token that links split file write messages (which > include store path including region hash) to beginning of WAL splitting > activity. Sessonizing by host doesn't work because work can bounce around > through retries. Thread context names in the logs vary and can be like > [nds1-225-fra:60020-7] or [fb472085572ba72e96f1] (trailing digits of region > hash) or [splits-1589016325868] . > We could have WALSplitter get the current time when starting the split of a > WAL file and have it log this timestamp in every line as a splitting session > identifier. > Related, we should track the time of split task execution end to end and > export a metric that captures it. > It might also be worthwhile to wire up more of WAL splitting to TaskMonitor > status logging. If we do this we can also enable status journal logging, so > when a WAL split has completed, a line will appear in the log that has the > list of all status messages recorded during splitting and the time delta in > milliseconds between them. -- This message was sent by Atlassian Jira (v8.3.4#803005)