[incubator-devlake-website] 01/02: docs: add data layer schema

warren Wed, 07 Dec 2022 19:07:00 -0800

This is an automated email from the ASF dual-hosted git repository.

warren pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-devlake-website.git


commit 5ceb5953596fb53f3153276856f50dda0765f96d
Author: Startrekzky <[email protected]>
AuthorDate: Tue Dec 6 03:27:35 2022 +0800

    docs: add data layer schema
---
 docs/DataModels/DevLakeDomainLayerSchema.md |  7 +++++--
 docs/DataModels/RawLayerSchema.md           | 29 +++++++++++++++++++++++++++++
 docs/DataModels/SystemTables.md             | 28 ++++++++++++++++++++++++++++
 docs/DataModels/ToolLayerSchema.md          | 28 ++++++++++++++++++++++++++++
 4 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/docs/DataModels/DevLakeDomainLayerSchema.md 
b/docs/DataModels/DevLakeDomainLayerSchema.md
index 8d27630fd..6afa9b3ce 100644
--- a/docs/DataModels/DevLakeDomainLayerSchema.md
+++ b/docs/DataModels/DevLakeDomainLayerSchema.md
@@ -1,8 +1,8 @@
 ---
 title: "Domain Layer Schema"
 description: >
-  DevLake Domain Layer Schema
-sidebar_position: 2
+  The data tables to query engineering metrics
+sidebar_position: 1
 ---
 
 ## Summary
@@ -11,6 +11,9 @@ This document describes Apache DevLake's domain layer schema.
 
 Referring to DevLake's [architecture](../Overview/Architecture.md), the data 
in the domain layer is transformed from the data in the tool layer. The tool 
layer schema is based on the data from specific tools such as Jira, GitHub, 
Gitlab, Jenkins, etc. The domain layer schema can be regarded as an abstraction 
of tool-layer schemas.
 
+<p align="center"><img src="/img/Architecture/arch-dataflow.svg" /></p>
+<p align="center">DevLake Dataflow</p>
+
 Domain layer schema itself includes 2 logical layers: a `DWD` layer and a 
`DWM` layer. The DWD layer stores the detailed data points, while the DWM is 
the slight aggregation and operation of DWD to store more organized details or 
middle-level metrics.
 
 ## Use Cases
diff --git a/docs/DataModels/RawLayerSchema.md 
b/docs/DataModels/RawLayerSchema.md
new file mode 100644
index 000000000..07b094dea
--- /dev/null
+++ b/docs/DataModels/RawLayerSchema.md
@@ -0,0 +1,29 @@
+---
+title: "Raw Layer Schema"
+description: >
+   Caches raw API responses from data source plugins
+sidebar_position: 3
+---
+
+## Summary
+
+This document describes Apache DevLake's raw layer schema.
+
+Referring to DevLake's [architecture](../Overview/Architecture.md), the raw 
layer stores the API responses from data sources (DevOps tools) in JSON. This 
saves developers' time if the raw data is to be transformed differently later 
on. Please note that communicating with data sources' APIs is usually the most 
time-consuming step.
+
+
+## Use Cases
+
+1. As a user, you can check raw data tables to verify data quality if you have 
concerns about the [domain layer data](DevLakeDomainLayerSchema.md).
+2. As a developer, you can customize domain layer schema based on raw data 
tables via [customize](Plugins/customize.md).
+
+
+## Data Models
+
+Raw layer tables start with the prefix `_raw_`. Each plugin contains multiple 
raw data tables; the naming convention of these tables is 
`_raw_{plugin}_{entity}`. For instance,
+- _raw_jira_issues
+- _raw_jira_boards
+- _raw_jira_board_issues
+- ...
+
+Normally, you do not need to use these tables unless one of the use cases above applies.
diff --git a/docs/DataModels/SystemTables.md b/docs/DataModels/SystemTables.md
new file mode 100644
index 000000000..9b128e42f
--- /dev/null
+++ b/docs/DataModels/SystemTables.md
@@ -0,0 +1,28 @@
+---
+title: "System Tables"
+description: >
+   Stores DevLake's own entities
+sidebar_position: 4
+---
+
+## Summary
+
+This document describes Apache DevLake's data models of its own entities.
+
+
+## Use Cases
+
+1. As a user, you can check `_devlake_blueprints` and `_devlake_pipelines` 
when data collection via a DevLake blueprint fails.
+2. As a contributor, you can check these tables to debug task concurrency or 
data migration features.
+
+
+## Data Models
+
+These tables start with the prefix `_devlake_`. Unlike raw or tool data tables, 
DevLake contains only one set of system tables. The naming convention of these 
tables is `_devlake_{entity}`, such as 
+- _devlake_blueprints
+- _devlake_pipelines
+- _devlake_tasks
+- _devlake_subtasks
+- ...
+
+Normally, you do not need to use these tables unless one of the use cases above applies.
diff --git a/docs/DataModels/ToolLayerSchema.md 
b/docs/DataModels/ToolLayerSchema.md
new file mode 100644
index 000000000..17c442502
--- /dev/null
+++ b/docs/DataModels/ToolLayerSchema.md
@@ -0,0 +1,28 @@
+---
+title: "Tool Layer Schema"
+description: >
+   Extract raw data into a relational schema for each specific tool
+sidebar_position: 2
+---
+
+## Summary
+
+This document describes Apache DevLake's tool layer schema.
+
+Referring to DevLake's [architecture](../Overview/Architecture.md), the tool 
layer extracts raw data from JSON into a relational schema that is easier for 
analytical tasks to consume. Each DevOps tool has a schema tailored to its 
data structure, hence the name "tool layer".
+
+
+## Use Cases
+
+As a user, you can check tool data tables to verify data quality if you have 
concerns about the [domain layer data](DevLakeDomainLayerSchema.md).
+
+
+## Data Models
+
+Tool layer tables start with the prefix `_tool_`. Each plugin contains multiple 
tool data tables; the naming convention of these tables is 
`_tool_{plugin}_{entity}`. For instance,
+- _tool_jira_issues
+- _tool_jira_boards
+- _tool_jira_board_issues
+- ...
+
+Normally, you do not need to use tool layer tables unless one of the use cases 
above applies.
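
The table naming conventions described in the three new documents can be sketched in Python as follows. This is a minimal illustration, not code from DevLake: `table_name` and `LAYER_PREFIXES` are hypothetical names, and the `_tool_{plugin}_{entity}` and `_devlake_{entity}` patterns are assumptions inferred from the example tables listed in the documents.

```python
from typing import Optional

# Prefix per layer, taken from the example tables in the schema documents.
LAYER_PREFIXES = {"raw": "_raw_", "tool": "_tool_", "system": "_devlake_"}


def table_name(layer: str, entity: str, plugin: Optional[str] = None) -> str:
    """Build a table name for a layer (hypothetical helper, not a DevLake API)."""
    prefix = LAYER_PREFIXES[layer]
    if layer == "system":
        # Assumption: system tables exist once and are not scoped to a plugin.
        if plugin is not None:
            raise ValueError("system tables take no plugin")
        return f"{prefix}{entity}"
    if plugin is None:
        raise ValueError(f"{layer} tables need a plugin name")
    return f"{prefix}{plugin}_{entity}"


print(table_name("raw", "issues", plugin="jira"))         # _raw_jira_issues
print(table_name("tool", "board_issues", plugin="jira"))  # _tool_jira_board_issues
print(table_name("system", "pipelines"))                  # _devlake_pipelines
```

Raw and tool tables carry the plugin name because each plugin owns its own set of tables, while the system tables are shared across the whole DevLake instance.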
