incubator-griffin-site git commit: Updated asf-site site from master (c1aa249063cc79a06f9eeb15c6435e90b65538c8)
Repository: incubator-griffin-site
Updated Branches:
  refs/heads/asf-site 5558bdcb3 -> 8f47d3635
Updated asf-site site from master (c1aa249063cc79a06f9eeb15c6435e90b65538c8)
Project: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/commit/8f47d363
Tree: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/tree/8f47d363
Diff: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/diff/8f47d363
Branch: refs/heads/asf-site
Commit: 8f47d363523456a8e23697aabe5a7d5ce2a7954c
Parents: 5558bdc
Author: William Guo
Authored: Wed Sep 19 08:52:38 2018 +0800
Committer: William Guo
Committed: Wed Sep 19 08:52:38 2018 +0800
--
 index.html | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/blob/8f47d363/index.html
--
diff --git a/index.html b/index.html
index e9c976c..7da39f7 100755
--- a/index.html
+++ b/index.html
@@ -231,7 +231,7 @@
 COMMUNITY
-Contribution
+Contribution
 Get help using Griffin or contribute to the project
@@ -252,7 +252,7 @@
-Events
+Events
 Learn more about Griffin from Conferences
@@ -262,7 +262,7 @@
-Apache Software Foundation
+Apache Software Foundation
incubator-griffin-site git commit: fix css style
Repository: incubator-griffin-site
Updated Branches:
  refs/heads/master 077729686 -> c1aa24906
fix css style
Project: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/commit/c1aa2490
Tree: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/tree/c1aa2490
Diff: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/diff/c1aa2490
Branch: refs/heads/master
Commit: c1aa249063cc79a06f9eeb15c6435e90b65538c8
Parents: 0777296
Author: William Guo
Authored: Wed Sep 19 08:51:32 2018 +0800
Committer: William Guo
Committed: Wed Sep 19 08:51:32 2018 +0800
--
 index.html | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/blob/c1aa2490/index.html
--
diff --git a/index.html b/index.html
index e9c976c..7da39f7 100755
--- a/index.html
+++ b/index.html
@@ -231,7 +231,7 @@
 COMMUNITY
-Contribution
+Contribution
 Get help using Griffin or contribute to the project
@@ -252,7 +252,7 @@
-Events
+Events
 Learn more about Griffin from Conferences
@@ -262,7 +262,7 @@
-Apache Software Foundation
+Apache Software Foundation
incubator-griffin-site git commit: Updated asf-site site from master (0777296868773f3456019df24829827a90b46fde)
Repository: incubator-griffin-site
Updated Branches:
  refs/heads/asf-site 6eaa9c6f0 -> 5558bdcb3
Updated asf-site site from master (0777296868773f3456019df24829827a90b46fde)
Project: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/commit/5558bdcb
Tree: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/tree/5558bdcb
Diff: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/diff/5558bdcb
Branch: refs/heads/asf-site
Commit: 5558bdcb38389a82d8adc73f8f7d0a55a82e3e48
Parents: 6eaa9c6
Author: William Guo
Authored: Tue Sep 18 15:56:32 2018 +0800
Committer: William Guo
Committed: Tue Sep 18 15:56:32 2018 +0800
--
 docs/profiling.html | 162 ++-
 1 file changed, 161 insertions(+), 1 deletion(-)
--
http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/blob/5558bdcb/docs/profiling.html
--
diff --git a/docs/profiling.html b/docs/profiling.html
index 273452f..1826c21 100644
--- a/docs/profiling.html
+++ b/docs/profiling.html
@@ -130,7 +130,167 @@ under the License.
 Profiling Use Case
-
+User Story
+Say we have one data set(demo_src), partitioned by hour, we want to know what is the data like for each hour.
+
+For simplicity, suppose both two data set have the same schema as this:
+id bigint
+age int
+desc string
+dt string
+hour string
+
+both dt and hour are partitions,
+
+as every day we have one daily partition dt(like 20180912),
+
+for every day we have 24 hourly partitions(like 00, 01, 02, …, 23).
+
+Environment Preparation
+You need to prepare the environment for Apache Griffin measure module, including the following software:
+
+  JDK (1.8+)
+  Hadoop (2.6.0+)
+  Spark (2.2.1+)
+  Hive (2.2.0)
+
+
+Build Griffin Measure Module
+
+  Download Griffin source package here (https://www.apache.org/dist/incubator/griffin/0.3.0-incubating).
+  Unzip the source package.
+unzip griffin-0.3.0-incubating-source-release.zip
+cd griffin-0.3.0-incubating-source-release
+
+
+  Build Griffin jars.
+mvn clean install
+
+
+Move the built griffin measure jar to your work path.
+
+mv measure/target/measure-0.3.0-incubating.jar <work path>/griffin-measure.jar
+
+
+
+
+Data Preparation
+
+For our quick start, We will generate a hive table demo_src.
+--create hive tables here. hql script
+--Note: replace hdfs location with your own path
+CREATE EXTERNAL TABLE `demo_src`(
+  `id` bigint,
+  `age` int,
+  `desc` string)
+PARTITIONED BY (
+  `dt` string,
+  `hour` string)
+ROW FORMAT DELIMITED
+  FIELDS TERMINATED BY '|'
+LOCATION
+  'hdfs:///griffin/data/batch/demo_src';
+
+The data could be generated this:
+1|18|student
+2|23|engineer
+3|42|cook
+...
+
+You can download demo data and execute ./gen_demo_data.sh to get the data source file.
+Then we will load data into hive table for every hour.
+LOAD DATA LOCAL INPATH 'demo_src' INTO TABLE demo_src PARTITION (dt='20180912',hour='09');
+
+Or you can just execute ./gen-hive-data.sh in the downloaded directory above, to generate and load data into the tables hourly.
+
+Define data quality measure
+
+Griffin env configuration
+The environment config file: env.json
+{
+  "spark": {
+    "log.level": "WARN"
+  },
+  "sinks": [
+    {
+      "type": "console"
+    },
+    {
+      "type": "hdfs",
+      "config": {
+        "path": "hdfs:///griffin/persist"
+      }
+    },
+    {
+      "type": "elasticsearch",
+      "config": {
+        "method": "post",
+        "api": "http://es:9200/griffin/accuracy"
+      }
+    }
+  ]
+}
+
+
+Define griffin data quality
+The DQ config file: dq.json
+
+{
+  "name": "batch_prof",
+  "process.type": "batch",
+  "data.sources": [
+    {
+      "name": "src",
+      "baseline": true,
+      "connectors": [
+        {
+          "type": "hive",
+          "version": "1.2",
+          "config": {
+            "database": "default",
+            "table.name": "demo_tgt"
+          }
+        }
+      ]
+    }
+  ],
+  "evaluate.rule": {
+    "rules": [
+      {
+        "dsl.type": "griffin-dsl",
+        "dq.type": "profiling",
+        "out.dataframe.name": "prof",
+        "rule": "src.id.count() AS id_count, src.age.max() AS age_max, src.desc.length().max() AS desc_length_max",
+        "out": [
+          {
+            "type": "metric",
+            "name": "prof"
+          }
+        ]
+      }
+    ]
+  },
+  "sinks": ["CONSOLE", "HDFS"]
+}
+
+
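For reference, the data-preparation step in the profiling.html diff above loads a single hourly partition (dt='20180912', hour='09'), while the user story describes 24 hourly partitions per day. A minimal sketch of generating the same LOAD DATA statement for every hour of one day follows. It assumes the local file name demo_src and the table demo_src from the diff; gen_load_statements is a hypothetical helper written for illustration, not a script shipped with Griffin.

```shell
#!/bin/sh
# Hypothetical helper (not part of Griffin): print one Hive LOAD DATA
# statement per hourly partition (00..23) of the given daily partition.
# $1 = daily partition value, e.g. 20180912
# $2 = local data file to load (assumed default: demo_src, as in the diff)
gen_load_statements() {
  dt="$1"
  src="${2:-demo_src}"
  h=0
  while [ "$h" -lt 24 ]; do
    # %02d zero-pads the hour so it matches the 00, 01, ..., 23 partition names
    printf "LOAD DATA LOCAL INPATH '%s' INTO TABLE demo_src PARTITION (dt='%s',hour='%02d');\n" \
      "$src" "$dt" "$h"
    h=$((h + 1))
  done
}

# Print the statements for the example day used in the diff.
gen_load_statements 20180912
```

The output can be redirected to a file and run with hive -f, which keeps statement generation separate from execution and lets the statements be reviewed first.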
[2/2] incubator-griffin-site git commit: Merge branch 'profiling' of https://github.com/bhlx3lyx7/incubator-griffin-site
Merge branch 'profiling' of https://github.com/bhlx3lyx7/incubator-griffin-site
Project: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/commit/07772968
Tree: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/tree/07772968
Diff: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/diff/07772968
Branch: refs/heads/master
Commit: 0777296868773f3456019df24829827a90b46fde
Parents: 3891ad0 ce45b1d
Author: William Guo
Authored: Tue Sep 18 15:54:30 2018 +0800
Committer: William Guo
Committed: Tue Sep 18 15:54:30 2018 +0800
--
 profiling.md | 165 ++
 1 file changed, 165 insertions(+)
--
incubator-griffin-site git commit: add navigation
Repository: incubator-griffin-site
Updated Branches:
  refs/heads/master 78070b850 -> 3891ad018
add navigation
Project: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/commit/3891ad01
Tree: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/tree/3891ad01
Diff: http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/diff/3891ad01
Branch: refs/heads/master
Commit: 3891ad018ba16a3cfd221bcc76ea3e0035c2faa6
Parents: 78070b8
Author: William Guo
Authored: Tue Sep 18 15:44:28 2018 +0800
Committer: William Guo
Committed: Tue Sep 18 15:44:28 2018 +0800
--
 _config.yml | 2 ++
 1 file changed, 2 insertions(+)
--
http://git-wip-us.apache.org/repos/asf/incubator-griffin-site/blob/3891ad01/_config.yml
--
diff --git a/_config.yml b/_config.yml
index f525d51..0fb297f 100644
--- a/_config.yml
+++ b/_config.yml
@@ -45,6 +45,8 @@ documentations:
       url: /docs/conf.html
   - category: Development
     links:
+      - title: Contribution
+        url: /docs/contribute.html
       - title: Latest version (v0.3.0)
         url: /docs/latest.html
   - category: Download