jrmcclurg opened a new pull request, #18910: URL: https://github.com/apache/kafka/pull/18910
Currently the output of the `DumpLogSegments` tool is quite tricky to parse, making it difficult to use as part of disaster-recovery tooling. Here is an example output to demonstrate some of the issues:

```
Dumping 2.log
Log starting offset: 2
baseOffset: 0 lastOffset: 0 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false deleteHorizonMs: OptionalLong.empty position: 0 CreateTime: 1739569505569 size: 131 magic: 2 compresscodec: none crc: 90285099 isvalid: true
| offset: 0 CreateTime: 1739569505569 keySize: -1 valueSize: 14 sequence: -1 headerKeys: [myheader,myotherheader,mythird:header] payload: This is a test
baseOffset: 1 lastOffset: 1 count: 1 baseSequence: -1 lastSequence: -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional: false isControl: false deleteHorizonMs: OptionalLong.empty position: 131 CreateTime: 1739569599085 size: 149 magic: 2 compresscodec: none crc: 3989822952 isvalid: true
| offset: 1 CreateTime: 1739569599085 keySize: -1 valueSize: 14 sequence: -1 headerKeys: [myheader,myotherheader,mythird:header,fourth,header] payload: This is a test
```

Note that the actual stored header key/values look like this:

- `myheader` -> `yes`
- `myotherheader` -> `no`
- `mythird:header` -> `ok`
- `fourth,header` -> `wow`

Key issues:

1. The printed fields are _space_-separated, meaning a context-sensitive parser needs to be used (normally I would just split on a comma or newline to parse such fields).
2. Header keys that contain commas cause ambiguity in the output (e.g., in the above printout, it looks like there are 5 keys rather than 4).
3. Values for the header keys are not shown in the output.
4. There is no clear designation of where one printed record ends and the next begins. In the above case, `baseOffset` marks the beginning of a new record, but when dumping indexes etc., different field names would need to be matched.

I have added four command-line options to address these issues:

```
/opt/kafka/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --print-data-log --files 2.log --field-sep "; " --entry-caption "ENTRY\n" --record-caption "\nRECORD\n" --print-key-values
```

This gives the following output for the above example:

```
Dumping 2.log
Log starting offset: 2
ENTRY
baseOffset: 0; lastOffset: 0; count: 1; baseSequence: -1; lastSequence: -1; producerId: -1; producerEpoch: -1; partitionLeaderEpoch: 0; isTransactional: false; isControl: false; deleteHorizonMs: OptionalLong.empty; position: 0; CreateTime: 1739569505569; size: 131; magic: 2; compresscodec: none; crc: 90285099; isvalid: true
RECORD
offset: 0; CreateTime: 1739569505569; keySize: -1; valueSize: 14; sequence: -1; numHeaders: 3; headerKey(8): myheader; headerVal(3): yes; headerKey(13): myotherheader; headerVal(2): no; headerKey(14): mythird:header; headerVal(2): ok; payload: This is a test
ENTRY
baseOffset: 1; lastOffset: 1; count: 1; baseSequence: -1; lastSequence: -1; producerId: -1; producerEpoch: -1; partitionLeaderEpoch: 0; isTransactional: false; isControl: false; deleteHorizonMs: OptionalLong.empty; position: 131; CreateTime: 1739569599085; size: 149; magic: 2; compresscodec: none; crc: 3989822952; isvalid: true
RECORD
offset: 1; CreateTime: 1739569599085; keySize: -1; valueSize: 14; sequence: -1; numHeaders: 4; headerKey(8): myheader; headerVal(3): yes; headerKey(13): myotherheader; headerVal(2): no; headerKey(14): mythird:header; headerVal(2): ok; headerKey(13): fourth,header; headerVal(3): wow; payload: This is a test
```

Now the fields are semicolon-separated (any string can be used as a separator), and header keys/values are printed along with their lengths, so a header key or value can be read by length even when it contains the separator, which makes the output easy to parse.
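
To illustrate why the length-prefixed header fields help, here is a minimal Python sketch of a parser for a single `RECORD` line of the output above. It is not part of this PR: `parse_record` is a hypothetical helper, and it assumes the separator is `"; "` (the `--field-sep` value used above), that the number in `headerKey(N)` / `headerVal(N)` counts the characters of the value that follows, and that `payload` is always the last field on the line.

```python
import re

SEP = "; "  # assumption: must match the --field-sep value passed to the tool

# Matches length-prefixed field names such as "headerKey(13)" or "headerVal(3)".
_SIZED = re.compile(r"(headerKey|headerVal)\((\d+)\)")


def parse_record(line, sep=SEP):
    """Parse one RECORD line into (fields, headers).

    fields  -- dict of the plain "name: value" fields (values kept as strings)
    headers -- list of (key, value) tuples in the order they were printed
    """
    fields, headers = {}, []
    pending_key = None
    pos = 0
    while pos < len(line):
        colon = line.index(": ", pos)  # raises ValueError on malformed input
        name = line[pos:colon]
        start = colon + 2
        sized = _SIZED.fullmatch(name)
        if sized:
            # Read exactly N characters, so commas, colons, or even the
            # separator inside a header key/value cannot confuse the parser.
            end = start + int(sized.group(2))
            value = line[start:end]
            if sized.group(1) == "headerKey":
                pending_key = value
            else:
                headers.append((pending_key, value))
                pending_key = None
        elif name == "payload":
            # Assumption: the payload runs to the end of the line.
            end = len(line)
            fields[name] = line[start:end]
        else:
            end = line.find(sep, start)
            if end == -1:
                end = len(line)
            fields[name] = line[start:end]
        if end < len(line) and not line.startswith(sep, end):
            raise ValueError(f"expected separator at column {end}")
        pos = end + len(sep)
    return fields, headers
```

Calling `parse_record` on the second `RECORD` line above should yield four headers, with `fourth,header` kept intact as a single key. Splitting the old `headerKeys: [...]` list on commas cannot recover this.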
The default values of the new command-line arguments are set to preserve the current functionality, so no existing tests should be affected.

### Committer Checklist (excluded from commit message)

- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
