[
https://issues.apache.org/jira/browse/GRIFFIN-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637381#comment-16637381
]
Cory Woytasik commented on GRIFFIN-190:
---
OK, here is what I have done, based on some info that I found at:
https://griffin.incubator.apache.org/docs/profiling.html
1. In /home/server/griffin-0.2.0-incubating/measure/target/classes/env.json
- I set the file to:
{
  "spark": {
    "log.level": "WARN"
  },
  "sinks": [
    { "type": "console" },
    { "type": "hdfs", "config": { "path": "hdfs:///griffin/persist" } },
    {
      "type": "elasticsearch",
      "config": {
        "method": "post",
        "api": "http://es:9200/griffin/accuracy"
      }
    }
  ]
}
2. I created /home/server/griffin-0.2.0-incubating/measure/target/classes/dq.json
- I set the file to the following, based on my table (lineageload in Hive):
{
  "name": "batch_prof",
  "process.type": "batch",
  "data.sources": [
    {
      "name": "src",
      "baseline": true,
      "connectors": [
        {
          "type": "hive",
          "version": "3.1",
          "config": {
            "database": "default",
            "table.name": "lineageload"
          }
        }
      ]
    }
  ],
  "evaluate.rule": {
    "rules": [
      {
        "dsl.type": "griffin-dsl",
        "dq.type": "profiling",
        "out.dataframe.name": "prof",
        "rule": "src.asset.count() AS asset_count, src.asset.length().max() AS asset_length_max",
        "out": [
          {
            "type": "metric",
            "name": "prof"
          }
        ]
      }
    ]
  },
  "sinks": ["CONSOLE", "HDFS"]
}
3. I then ran the following command:
./spark-submit --class org.apache.griffin.measure.Application --master yarn \
  --deploy-mode client --queue default \
  --driver-memory 1g --executor-memory 1g --num-executors 2 \
  /home/server/griffin-0.2.0-incubating/measure/target/griffin-measure.jar \
  /home/server/griffin-0.2.0-incubating/measure/target/classes/env.json \
  /home/server/griffin-0.2.0-incubating/measure/target/classes/dq.json
4. I then looked at unit-tests.log in
/home/server/spark-2.3.1-bin-hadoop2.7/bin/target and noticed the following
message:
18/10/03 13:38:44.721 main WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/03 13:38:44.815 main INFO Application$: [Ljava.lang.String;@7c214cc0
18/10/03 13:38:44.815 main INFO Application$: /home/server/griffin-0.2.0-incubating/measure/target/classes/env.json
18/10/03 13:38:44.815 main INFO Application$: /home/server/griffin-0.2.0-incubating/measure/target/classes/dq.json
18/10/03 13:38:45.577 main INFO Application$: params validation pass
18/10/03 13:38:45.599 main ERROR Application$: process init error: null
18/10/03 13:38:45.610 Thread-1 INFO ShutdownHookManager: Shutdown hook called
18/10/03 13:38:45.610 Thread-1 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea6a6ad3-e533-4a4d-a33e-55b0c35a8352
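For a longer unit-tests.log, the WARN and ERROR entries can be pulled out with a small filter. This is only a sketch: the pattern is inferred from the lines quoted above, the names LOG_LINE and filter_log are made up for illustration, and lines that do not start with a timestamp (wrapped continuations, stack traces) are simply skipped.

```python
import re

# Pattern inferred from the excerpt above, e.g.
# "18/10/03 13:38:45.599 main ERROR Application$: process init error: null"
LOG_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<thread>\S+) (?P<level>[A-Z]+) (?P<logger>[^:]+): (?P<message>.*)$"
)


def filter_log(lines, levels=("WARN", "ERROR")):
    """Return (level, logger, message) tuples for entries at the given levels."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        # Lines that do not fit the pattern are ignored rather than guessed at.
        if match and match.group("level") in levels:
            hits.append(
                (match.group("level"), match.group("logger"), match.group("message"))
            )
    return hits


sample = [
    "18/10/03 13:38:45.577 main INFO Application$: params validation pass",
    "18/10/03 13:38:45.599 main ERROR Application$: process init error: null",
]
for level, logger, message in filter_log(sample):
    print(level, logger, message)
```

Passing an open file object instead of the sample list works the same way, since the function only iterates over lines.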
What am I doing wrong, or what are we missing? Thanks, Lionel.
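One check that can be done independently of Griffin is whether both files parse as JSON at all, and which sinks each one names. The sketch below assumes nothing about how Griffin itself reads or matches these settings; load_config and sink_summary are hypothetical helper names, and the output is only meant for comparing the two files by eye.

```python
import json


def load_config(path):
    """Parse a JSON file; return (data, None) on success or (None, error text)."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle), None
    except (OSError, json.JSONDecodeError) as exc:
        # Covers a missing or unreadable file as well as a syntax error in the JSON.
        return None, f"{path}: {exc}"


def sink_summary(env, dq):
    """List the sink types declared in env.json and the sink names used in dq.json."""
    declared = [sink.get("type") for sink in env.get("sinks", [])]
    referenced = list(dq.get("sinks", []))
    return declared, referenced
```

Calling load_config on the env.json and dq.json paths from steps 1 and 2, then passing the two parsed results to sink_summary, shows side by side what each file declares.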
> Blank Health and DQ Metrics Screen
> --
>
> Key: GRIFFIN-190
> URL: https://issues.apache.org/jira/browse/GRIFFIN-190
> Project: Griffin (Incubating)
> Issue Type: Bug
> Affects Versions: 0.2.0-incubating
> Reporter: Cory Woytasik
> Priority: Major
> Attachments: PLDataLineageLoad061818.csv
>
>
> Griffin is up and running. We have both an accuracy measure and a profiling
> measure that are set to run every minute via jobs. When we click the chart
> icon next to the job we receive a "no content" message. When we click on the
> Health link or DQ Metrics link, the page loads for a second and then displays
> a blank screen. We think this might be ES related, but aren't completely
> sure, and we need some help. We assume it's a path or property setup
> issue. Here are the versions we are running:
> Hive - 3.1.0
> Elasticsearch - 5.3.1
> griffin - 0.2.0
> hadoop - 3.1.1
> livy - 0.3.0
> spark - 2.3.1
> Using postgres too