Hi Eugene,

Thanks for your response. I guess I did not make myself clear. I am able to create and run jobs; the issue is not there. I have the past 5 months of data in a Hive table (partitioned by day). By default, Griffin aggregates the entire data and runs the job on the aggregated dataset. I want to process aggregated weekly data starting from week 1, then week 2, week 3, and so on for 5 months, but I am unable to find a way to do this. Does Griffin have this capability at present? Or do I need to create custom rules using Spark-SQL mentioning the date range and submit them directly via the API/Livy?

Any help is much appreciated.
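For illustration, if the custom Spark-SQL route is taken, the weekly windows could be generated in a driver script and templated into one query per week. This is only a minimal sketch under stated assumptions: the table name `my_db.my_table`, the partition column `dt` with `yyyyMMdd` string values, and the profiled column `id` are all hypothetical, not from this thread.

```python
from datetime import date, timedelta

def weekly_ranges(start, end):
    """Yield (week_start, week_end) pairs covering [start, end), one per week."""
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=7), end)
        yield cur, nxt
        cur = nxt

def weekly_query(table, week_start, week_end, part_col="dt"):
    """Build a Spark-SQL profiling query restricted to one week of day partitions.

    Assumes partitions are strings in yyyyMMdd form; the metrics selected here
    are placeholders for whatever profiling rules are actually needed.
    """
    return (
        f"SELECT COUNT(*) AS total, COUNT(DISTINCT id) AS distinct_ids "
        f"FROM {table} "
        f"WHERE {part_col} >= '{week_start:%Y%m%d}' "
        f"AND {part_col} < '{week_end:%Y%m%d}'"
    )

# Example: roughly 5 months of daily partitions, one query per week.
for ws, we in weekly_ranges(date(2018, 8, 1), date(2019, 1, 1)):
    sql = weekly_query("my_db.my_table", ws, we)
    # Each query could then be wrapped in a custom-rule measure and
    # submitted via the Griffin API, or run directly through Livy.
```

Filtering on the partition column keeps each run pruned to one week of partitions, so every execution processes only its incremental slice rather than the full 5 months.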
Regards,
Vikram

From: Eugene Liu <[email protected]>
Sent: Friday, January 18, 2019 7:10 AM
To: Vikram Jain <[email protected]>; [email protected]
Subject: Re: Analyzing past batch data periodically

Hi Vikram,

If you hope to create a profiling job, there are user/API guides which can help you:

https://github.com/apache/griffin/blob/master/griffin-doc/service/api-guide.md
https://github.com/apache/griffin/blob/master/griffin-doc/ui/user-guide.md

thx,
Eugene

________________________________
From: Vikram Jain <[email protected]>
Sent: Friday, January 18, 2019 3:41 AM
To: [email protected]
Subject: Analyzing past batch data periodically

Hi,

I have 5 months of data in my Hive table, which is partitioned day-wise. I want to run a profiling job on this data and analyze the results and trend on weekly data, i.e. I want to create a job that, on each execution, processes one week of data incrementally and stores the metrics week-wise. I could not find a way to do this in Griffin. Can someone please help me with a solution if it exists?

Regards,
Vikram
