This is an automated email from the ASF dual-hosted git repository.

warren pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-devlake-website.git
commit 5ceb5953596fb53f3153276856f50dda0765f96d
Author: Startrekzky <[email protected]>
AuthorDate: Tue Dec 6 03:27:35 2022 +0800

    docs: add data layer schema
---
 docs/DataModels/DevLakeDomainLayerSchema.md |  7 +++++--
 docs/DataModels/RawLayerSchema.md           | 29 +++++++++++++++++++++++++++++
 docs/DataModels/SystemTables.md             | 28 ++++++++++++++++++++++++++++
 docs/DataModels/ToolLayerSchema.md          | 28 ++++++++++++++++++++++++++++
 4 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/docs/DataModels/DevLakeDomainLayerSchema.md b/docs/DataModels/DevLakeDomainLayerSchema.md
index 8d27630fd..6afa9b3ce 100644
--- a/docs/DataModels/DevLakeDomainLayerSchema.md
+++ b/docs/DataModels/DevLakeDomainLayerSchema.md
@@ -1,8 +1,8 @@
 ---
 title: "Domain Layer Schema"
 description: >
-  DevLake Domain Layer Schema
-sidebar_position: 2
+  The data tables to query engineering metrics
+sidebar_position: 1
 ---
 
 ## Summary
@@ -11,6 +11,9 @@ This document describes Apache DevLake's domain layer schema.
 
 Referring to DevLake's [architecture](../Overview/Architecture.md), the data in the domain layer is transformed from the data in the tool layer. The tool layer schema is based on the data from specific tools such as Jira, GitHub, Gitlab, Jenkins, etc. The domain layer schema can be regarded as an abstraction of tool-layer schemas.
 
+<p align="center"><img src="/img/Architecture/arch-dataflow.svg" /></p>
+<p align="center">DevLake Dataflow</p>
+
 Domain layer schema itself includes 2 logical layers: a `DWD` layer and a `DWM` layer. The DWD layer stores the detailed data points, while the DWM is the slight aggregation and operation of DWD to store more organized details or middle-level metrics.
 
 ## Use Cases

diff --git a/docs/DataModels/RawLayerSchema.md b/docs/DataModels/RawLayerSchema.md
new file mode 100644
index 000000000..07b094dea
--- /dev/null
+++ b/docs/DataModels/RawLayerSchema.md
@@ -0,0 +1,29 @@
+---
+title: "Raw Layer Schema"
+description: >
+  Caches raw API responses from data source plugins
+sidebar_position: 3
+---
+
+## Summary
+
+This document describes Apache DevLake's raw layer schema.
+
+Referring to DevLake's [architecture](../Overview/Architecture.md), the raw layer stores the API responses from data sources (DevOps tools) as JSON. Because communicating with data sources' APIs is usually the most time-consuming step, caching the raw responses saves developers time if the raw data needs to be transformed differently later on.
+
+
+## Use Cases
+
+1. As a user, you can check the raw data tables to verify data quality if you have concerns about the [domain layer data](DevLakeDomainLayerSchema.md).
+2. As a developer, you can customize the domain layer schema based on the raw data tables via the [customize](Plugins/customize.md) plugin.
+
+
+## Data Models
+
+Raw layer tables start with the prefix `_raw_`. Each plugin contains multiple raw data tables; the naming convention of these tables is `_raw_{plugin}_{entity}`. For instance,
+- _raw_jira_issues
+- _raw_jira_boards
+- _raw_jira_board_issues
+- ...
+
+Normally, you do not need to use these tables unless you have one of the use cases above.
diff --git a/docs/DataModels/SystemTables.md b/docs/DataModels/SystemTables.md
new file mode 100644
index 000000000..9b128e42f
--- /dev/null
+++ b/docs/DataModels/SystemTables.md
@@ -0,0 +1,28 @@
+---
+title: "System Tables"
+description: >
+  Stores DevLake's own entities
+sidebar_position: 4
+---
+
+## Summary
+
+This document describes the data models of Apache DevLake's own entities.
+
+
+## Use Cases
+
+1. As a user, you can check `_devlake_blueprints` and `_devlake_pipelines` when data collection via a DevLake blueprint fails.
+2. As a contributor, you can check these tables to debug task concurrency or data migration features.
+
+
+## Data Models
+
+These tables start with the prefix `_devlake_`. Unlike raw or tool layer tables, which exist per plugin, DevLake has only one set of system tables. The naming convention of these tables is `_devlake_{entity}`, such as
+- _devlake_blueprints
+- _devlake_pipelines
+- _devlake_tasks
+- _devlake_subtasks
+- ...
+
+Normally, you do not need to use these tables unless you have one of the use cases above.
diff --git a/docs/DataModels/ToolLayerSchema.md b/docs/DataModels/ToolLayerSchema.md
new file mode 100644
index 000000000..17c442502
--- /dev/null
+++ b/docs/DataModels/ToolLayerSchema.md
@@ -0,0 +1,28 @@
+---
+title: "Tool Layer Schema"
+description: >
+  Extracts raw data into a relational schema for each specific tool
+sidebar_position: 2
+---
+
+## Summary
+
+This document describes Apache DevLake's tool layer schema.
+
+Referring to DevLake's [architecture](../Overview/Architecture.md), the tool layer extracts raw data from JSON into a relational schema that is easier for analytical tasks to consume. Each DevOps tool has a schema tailored to its own data structure, hence the name "tool layer".
+
+
+## Use Cases
+
+As a user, you can check the tool data tables to verify data quality if you have concerns about the [domain layer data](DevLakeDomainLayerSchema.md).
+
+
+## Data Models
+
+Tool layer tables start with the prefix `_tool_`. Each plugin contains multiple tool data tables; the naming convention of these tables is `_tool_{plugin}_{entity}`. For instance,
+- _tool_jira_issues
+- _tool_jira_boards
+- _tool_jira_board_issues
+- ...
+
+Normally, you do not need to use tool layer tables unless you have one of the use cases above.
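The new pages define three table-name prefixes: `_raw_`, `_tool_` and `_devlake_`. The Python below is a minimal sketch of those conventions. It is a hypothetical helper and is not part of DevLake's codebase; `table_name`, `layer_of` and `LAYER_PREFIXES` are illustrative names. It also assumes that any table without one of the three documented prefixes belongs to the domain layer.

```python
from typing import Optional

# Hypothetical helper (not part of DevLake): encodes only the prefixes
# documented in RawLayerSchema.md, ToolLayerSchema.md and SystemTables.md.
LAYER_PREFIXES = {
    "_raw_": "raw",
    "_tool_": "tool",
    "_devlake_": "system",
}


def table_name(layer: str, entity: str, plugin: Optional[str] = None) -> str:
    """Build a table name from the documented naming conventions."""
    if layer == "system":
        # DevLake has a single shared set of system tables: _devlake_{entity}
        return f"_devlake_{entity}"
    if layer in ("raw", "tool"):
        # Raw and tool tables are per plugin: _{layer}_{plugin}_{entity}
        if not plugin:
            raise ValueError(f"{layer} tables need a plugin name")
        return f"_{layer}_{plugin}_{entity}"
    raise ValueError(f"unknown layer: {layer}")


def layer_of(table: str) -> str:
    """Classify a table by its prefix.

    Assumption: a table with none of the documented prefixes is treated
    as a domain layer table.
    """
    for prefix, layer in LAYER_PREFIXES.items():
        if table.startswith(prefix):
            return layer
    return "domain"


if __name__ == "__main__":
    print(table_name("raw", "issues", plugin="jira"))  # _raw_jira_issues
    print(layer_of("_tool_jira_boards"))  # tool
```

Returning "domain" as the fallback keeps the classifier total over arbitrary table names. A stricter variant could raise an error for unknown names instead.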
