smengcl opened a new pull request, #4420:
URL: https://github.com/apache/ozone/pull/4420
## What changes were proposed in this pull request?
1. Print proper JSON **object** (`{ }`) if `--with-keys=true`.
2. Print proper JSON **array** (`[ ]`) if `--with-keys=false`.
3. `--with-keys` now defaults to `true`. Tweak some option names and improve
error messages. There should be no compatibility concern whatsoever, as this
is intended to be a debug tool.
4. Rewrite and parameterize `TestLDBCli`, adding new test cases.
5. Refactor `DBScanner` for readability and maintainability.
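For illustration, the object-vs-array framing from items 1 and 2 can be sketched in plain Java. This is a minimal sketch, not the actual `DBScanner` code: the class `JsonFraming` and its methods are hypothetical, and each value is assumed to be already serialized to JSON text (in `DBScanner` that is Gson's job).

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating the output framing; not DBScanner's API. */
final class JsonFraming {
  private JsonFraming() {}

  /** Quotes a string as a JSON string literal, escaping quotes, backslashes and control chars. */
  static String quote(String s) {
    StringBuilder sb = new StringBuilder("\"");
    for (char c : s.toCharArray()) {
      switch (c) {
        case '"':
          sb.append("\\\"");
          break;
        case '\\':
          sb.append("\\\\");
          break;
        default:
          if (c < 0x20) {
            sb.append(String.format("\\u%04x", (int) c));
          } else {
            sb.append(c);
          }
      }
    }
    return sb.append('"').toString();
  }

  /**
   * Renders entries as a JSON object ({ "k": v, ... }) when withKeys is true,
   * or as a JSON array ([ v, ... ]) when it is false.
   * Values are assumed to be already-serialized JSON text.
   */
  static String render(List<Map.Entry<String, String>> entries, boolean withKeys) {
    StringBuilder out = new StringBuilder(withKeys ? "{ " : "[ ");
    for (int i = 0; i < entries.size(); i++) {
      if (i > 0) {
        out.append(", "); // separator between entries, never trailing
      }
      Map.Entry<String, String> e = entries.get(i);
      if (withKeys) {
        out.append(quote(e.getKey())).append(": ");
      }
      out.append(e.getValue());
    }
    return out.append(withKeys ? " }" : " ]").toString();
  }
}
```

With `withKeys` set to `false` the keys are simply dropped, so the result is a JSON array of values rather than an object.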
### Note
Regarding the core serialization logic in `DBScanner`, I've chosen to stick
to the current `Gson` approach, which serializes **each** entry and prints it
immediately. This should use less memory than [gathering all
entries](https://github.com/apache/ozone/blob/04cd54ce6593024dc98a9867e1fd829c4f25f85a/hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/DBScanner.java#L129-L135)
first and then serializing and printing them all at once, which could OOM if
the batch limit is very high (e.g. a few billion entries). The current
approach should handle that case just fine.
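The streaming approach described in the note can be sketched as follows. It is a hypothetical illustration, not `DBScanner` itself: `StreamingJsonPrinter` is an invented name, the `toJson` function stands in for Gson's `toJson`, and treating a negative `limit` as "no limit" is an assumption of this sketch.

```java
import java.io.PrintWriter;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch (not the actual DBScanner code): prints entries as one
 * JSON object, serializing and printing each entry before reading the next,
 * so memory use does not grow with the number of entries.
 */
final class StreamingJsonPrinter {
  private StreamingJsonPrinter() {}

  /**
   * @param it     source of entries, e.g. a wrapper around a DB iterator
   * @param toJson serializer for keys and values (stand-in for Gson's toJson)
   * @param limit  maximum number of entries to print; negative means no limit
   * @param out    destination for the JSON text
   * @return number of entries printed
   */
  static long print(Iterator<? extends Map.Entry<?, ?>> it,
      Function<Object, String> toJson, long limit, PrintWriter out) {
    long count = 0;
    out.print("{ ");
    while (it.hasNext() && (limit < 0 || count < limit)) {
      Map.Entry<?, ?> e = it.next();
      if (count > 0) {
        out.print(", ");
      }
      // Only the current entry is serialized; nothing is accumulated.
      out.print(toJson.apply(e.getKey()));
      out.print(": ");
      out.print(toJson.apply(e.getValue()));
      count++;
    }
    out.println(" }");
    out.flush();
    return count;
  }
}
```

By contrast, collecting every entry into a `Map` and serializing that map in a single call would keep all entries, plus the resulting string, in memory at the same time, which is where the OOM risk comes from.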
## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-6064
## How was this patch tested?
- Rewrote `TestLDBCli` and added new test cases.
  - Fully migrated to JUnit 5.
  - Fully parameterized the tests, so more parameter combinations can easily
    be added in the future.
    - Used `Named.of()` to describe each parameter for maintainability.
    - Please feel free to contribute more test cases, now that adding a new
      case is just a matter of a few lines. :)
  - Rewrote the `keyTable` tests, heavily inspired by @adoroszlai's change in
    #2917 (which split `testOMDB` into multiple test cases).
  - Completely rewrote the Datanode DB `block_data` table tests for schema V3
    and V2.
    - This took me longer than fixing and refactoring `DBScanner` :D
<img width="605" alt="IntelliJ"
src="https://user-images.githubusercontent.com/50227127/226076360-2c518e58-77b6-4e83-a00b-138ec57d7c79.png">
## Potential future improvements
- [ ] For the schema V3 `block_data` table, we could further nest entries
inside another map, with `containerId` as the outermost layer.
Currently (unchanged in this PR), the JSON key for an entry in the V3
`block_data` table is `containerId: blockId`, e.g.:
```shell
{ "2: 3": {
  "blockID": {
    "containerBlockID": {
      "containerID": 2,
      "localID": 3
    },
    "blockCommitSequenceId": 0
  },
  "metadata": {},
  "size": 0
}, "2: 4": {
  "blockID": {
    "containerBlockID": {
      "containerID": 2,
      "localID": 4
    },
    "blockCommitSequenceId": 0
  },
  "metadata": {},
  "size": 0
} }
```
With the extra layer, the output would be cleaner and easier to filter with
`jq`:
```shell
{
  "2": {
    "3": {
      "blockID": {
        "containerBlockID": {
          "containerID": 2,
          "localID": 3
        },
        "blockCommitSequenceId": 0
      },
      "metadata": {},
      "size": 0
    },
    "4": {
      "blockID": {
        "containerBlockID": {
          "containerID": 2,
          "localID": 4
        },
        "blockCommitSequenceId": 0
      },
      "metadata": {},
      "size": 0
    }
  }
}
```
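One way the proposed nesting could be produced without giving up streaming is sketched below. It is hypothetical: `NestedByContainerPrinter` is an invented name, it parses the current `containerId: blockId` key format, it assumes all entries of one container arrive contiguously (the shared container ID prefix suggests they do, but this sketch does not verify it), and values are assumed to be already-serialized JSON.

```java
import java.io.PrintWriter;
import java.util.Iterator;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed nesting for schema V3 block_data:
 * split each "containerId: blockId" key and open a new inner object whenever
 * the containerId changes. Assumes entries of one container are contiguous.
 */
final class NestedByContainerPrinter {
  private NestedByContainerPrinter() {}

  static void print(Iterator<Map.Entry<String, String>> it, PrintWriter out) {
    String current = null; // containerId of the currently open inner object
    boolean firstInner = true;
    out.print("{");
    while (it.hasNext()) {
      Map.Entry<String, String> e = it.next();
      // Key format as printed today, e.g. "2: 3" -> containerId "2", blockId "3".
      String[] parts = e.getKey().split(":\\s*", 2);
      if (parts.length != 2) {
        throw new IllegalArgumentException("Unexpected key: " + e.getKey());
      }
      String containerId = parts[0];
      String blockId = parts[1];
      if (!containerId.equals(current)) {
        // Close the previous container's object before opening a new one.
        out.print(current == null ? " " : " }, ");
        // IDs are numeric, so quoting them without escaping is safe here.
        out.print("\"" + containerId + "\": {");
        current = containerId;
        firstInner = true;
      }
      out.print(firstInner ? " " : ", ");
      out.print("\"" + blockId + "\": " + e.getValue());
      firstInner = false;
    }
    out.print(current == null ? " }" : " } }");
    out.println();
    out.flush();
  }
}
```

If entries of a container were not contiguous, this approach would emit duplicate container keys, so a real implementation would need to rely on (or check) the iteration order.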
## Example output with this PR
Outputs are taken from the integration test's `stdout`, with dummy data for
demonstration purposes. Command lines are reconstructed from the test
parameters, so actual CLI output may differ.
### `ozone debug ldb --db=/data/metadata/om.db scan --column-family=keyTable --limit=1`
```shell
{ "key1": {
  "volumeName": "vol1",
  "bucketName": "buck1",
  "keyName": "key1",
  "dataSize": 1000,
  "keyLocationVersions": [
    {
      "version": 0,
      "locationVersionMap": {
        "0": []
      },
      "isMultipartKey": false
    }
  ],
  "creationTime": 1679105144793,
  "modificationTime": 1679105144793,
  "replicationConfig": {
    "replicationFactor": "ONE"
  },
  "isFile": false,
  "fileName": "key1",
  "acls": [],
  "parentObjectID": 0,
  "objectID": 0,
  "updateID": 0,
  "metadata": {}
} }
```
### `ozone debug ldb --db=/data/metadata/om.db scan --column-family=keyTable`
```shell
{ "key1": {
  "volumeName": "vol1",
  "bucketName": "buck1",
  "keyName": "key1",
  "dataSize": 1000,
  "keyLocationVersions": [
    {
      "version": 0,
      "locationVersionMap": {
        "0": []
      },
      "isMultipartKey": false
    }
  ],
  "creationTime": 1679102602165,
  "modificationTime": 1679102602166,
  "replicationConfig": {
    "replicationFactor": "ONE"
  },
  "isFile": false,
  "fileName": "key1",
  "acls": [],
  "parentObjectID": 0,
  "objectID": 0,
  "updateID": 0,
  "metadata": {}
}, "key2": {
  "volumeName": "vol1",
  "bucketName": "buck1",
  "keyName": "key2",
  "dataSize": 1000,
  "keyLocationVersions": [
    {
      "version": 0,
      "locationVersionMap": {
        "0": []
      },
      "isMultipartKey": false
    }
  ],
  "creationTime": 1679102602279,
  "modificationTime": 1679102602279,
  "replicationConfig": {
    "replicationFactor": "ONE"
  },
  "isFile": false,
  "fileName": "key2",
  "acls": [],
  "parentObjectID": 0,
  "objectID": 0,
  "updateID": 0,
  "metadata": {}
}, "key3": {
  "volumeName": "vol1",
  "bucketName": "buck1",
  "keyName": "key3",
  "dataSize": 1000,
  "keyLocationVersions": [
    {
      "version": 0,
      "locationVersionMap": {
        "0": []
      },
      "isMultipartKey": false
    }
  ],
  "creationTime": 1679102602282,
  "modificationTime": 1679102602282,
  "replicationConfig": {
    "replicationFactor": "ONE"
  },
  "isFile": false,
  "fileName": "key3",
  "acls": [],
  "parentObjectID": 0,
  "objectID": 0,
  "updateID": 0,
  "metadata": {}
}, "key4": {
  "volumeName": "vol1",
  "bucketName": "buck1",
  "keyName": "key4",
  "dataSize": 1000,
  "keyLocationVersions": [
    {
      "version": 0,
      "locationVersionMap": {
        "0": []
      },
      "isMultipartKey": false
    }
  ],
  "creationTime": 1679102602284,
  "modificationTime": 1679102602284,
  "replicationConfig": {
    "replicationFactor": "ONE"
  },
  "isFile": false,
  "fileName": "key4",
  "acls": [],
  "parentObjectID": 0,
  "objectID": 0,
  "updateID": 0,
  "metadata": {}
}, "key5": {
  "volumeName": "vol1",
  "bucketName": "buck1",
  "keyName": "key5",
  "dataSize": 1000,
  "keyLocationVersions": [
    {
      "version": 0,
      "locationVersionMap": {
        "0": []
      },
      "isMultipartKey": false
    }
  ],
  "creationTime": 1679102602286,
  "modificationTime": 1679102602286,
  "replicationConfig": {
    "replicationFactor": "ONE"
  },
  "isFile": false,
  "fileName": "key5",
  "acls": [],
  "parentObjectID": 0,
  "objectID": 0,
  "updateID": 0,
  "metadata": {}
} }
```
### `ozone debug ldb --db=/data/metadata/om.db scan --column-family=keyTable --limit=2 --with-keys=false`
```shell
```
### `ozone debug ldb --db=/data/hdds/hdds/CID-UUID1/DS-UUID2/container.db scan --column-family=block_data --dn-schema=V3 --container-id=2 --limit=2`
```shell
{ "2: 3": {
  "blockID": {
    "containerBlockID": {
      "containerID": 2,
      "localID": 3
    },
    "blockCommitSequenceId": 0
  },
  "metadata": {},
  "size": 0
}, "2: 4": {
  "blockID": {
    "containerBlockID": {
      "containerID": 2,
      "localID": 4
    },
    "blockCommitSequenceId": 0
  },
  "metadata": {},
  "size": 0
} }
```
### `ozone debug ldb --db=/data/hdds/hdds/CID-UUID1/DS-UUID2/container.db scan --column-family=block_data --dn-schema=V2 --limit=4`
```shell
{ "1": {
  "blockID": {
    "containerBlockID": {
      "containerID": 1,
      "localID": 1
    },
    "blockCommitSequenceId": 0
  },
  "metadata": {},
  "size": 0
}, "2": {
  "blockID": {
    "containerBlockID": {
      "containerID": 1,
      "localID": 2
    },
    "blockCommitSequenceId": 0
  },
  "metadata": {},
  "size": 0
}, "3": {
  "blockID": {
    "containerBlockID": {
      "containerID": 2,
      "localID": 3
    },
    "blockCommitSequenceId": 0
  },
  "metadata": {},
  "size": 0
}, "4": {
  "blockID": {
    "containerBlockID": {
      "containerID": 2,
      "localID": 4
    },
    "blockCommitSequenceId": 0
  },
  "metadata": {},
  "size": 0
} }
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]