[jira] [Work started] (GRIFFIN-71) Failure to submit multiple timing tasks with Livy

2017-11-30 Thread William Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/GRIFFIN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on GRIFFIN-71 started by William Guo.
--
> Failure to submit multiple timing tasks with Livy
> -
>
> Key: GRIFFIN-71
> URL: https://issues.apache.org/jira/browse/GRIFFIN-71
> Project: Griffin (Incubating)
>  Issue Type: Bug
>Reporter: Yang
>Assignee: William Guo
> Attachments: hs_err_pid56468.log, spark10err.txt
>
>
> On the client server, we used a timer to submit 100 monitoring rules through 
> Livy and found that Livy submitted only part of the monitoring tasks to the 
> Spark cluster; the other tasks failed. The Livy log shows that the JVM could 
> not start, possibly because of insufficient memory. The client has 32G of 
> memory. Please see the attachments.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (GRIFFIN-71) Failure to submit multiple timing tasks with Livy

2017-11-30 Thread William Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/GRIFFIN-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273853#comment-16273853
 ] 

William Guo commented on GRIFFIN-71:


Fixed?



Re: Re: Re: Re: Re: Griffin Accuracy Doubts

2017-11-30 Thread Ananthanarayanan Ms
Hi Lionel,
Thank you. The griffin-dsl works for this use case. We are waiting for the
next release to cover our other use case: a source built from more than one
table joined by a sql, checked for accuracy against a target that is likewise
built from multiple sliced tables joined by a sql.


Regards,
Ananthanarayanan.M.S

Re:looking for help

2017-11-30 Thread Lionel Liu
Hi 记史,

In this version of our solution, Griffin persists the metrics in 
Elasticsearch, which means you need a separate ES server. You can use an 
official ES Docker image to try it.

In ES_SERVER = "http://:9200", the host part before ":9200" is the IP address 
of the ES server. You also need to configure the ES server address in 
env.json, one of the input config files of the measure module, so that the 
metrics calculated by Griffin are persisted there.

 

Thanks

Lionel, Liu





--

Regards,
Lionel, Liu



At 2017-11-29 16:14:14, "记 史"  wrote:
>I am trying to deploy Griffin on CentOS 7 in a VM.
>Before doing this, CDH 5.7.6 was already installed on CentOS 7.
>I followed the README.md, but I am confused by this setting:
>ui/js/services/services.js
>ES_SERVER = "http://:9200"
>
>Could you give me some advice?
>
>Sent from Mail for Windows 10
>


Re:Re: Re: Re: Re: Griffin Accuracy Doubts

2017-11-30 Thread Lionel Liu
Hi Ananthanarayanan.M.S,
  Thanks for your reply. I think you're right: in this version, the 
griffin-dsl can deal with accuracy and profiling cases, but most users prefer 
sql. 
  Actually, in this version, to support streaming mode, we added some special 
processing for spark sql. It works for some profiling cases but fails in 
others. We'll fix this bug and try to pass the sql directly to the measure 
engine, so you'll be able to use sql in the next version.




--

Regards,
Lionel, Liu

At 2017-11-29 12:07:39, "Ananthanarayanan Ms" 
 wrote:

Hi Lionel,
 Sure, I will try griffin-dsl and update here. I had wanted to keep a distance 
from the griffin-dsl feature because it has a learning curve for its 
expressions and attributes. We are planning for end users who would not have 
this background, so we would need an intermediate step to translate their 
needs into this dsl, unlike sql, in which they are already proficient.


Thank you.




Regards,
Ananthanarayanan.M.S


On Tue, Nov 28, 2017 at 8:38 AM, Lionel Liu  wrote:

Hi Ananthanarayanan.M.S,
Sorry for the late reply. I think you are trying the accuracy measure by 
configuring the field "dsl.type" as "spark-sql", not as "griffin-dsl", right?
If you just want to use accuracy, you'd better try "griffin-dsl", and configure 
"dq.type" as "accuracy", with a "rule" like this: "source.name = target.name 
and source.age = target.age".
For the "spark-sql" type, we add a column "__tmst" into the sql and group by 
it, to fit some streaming cases, but this does need some background knowledge. 
For example, in the accuracy case, when there are multiple data sources, the 
sql engine cannot tell whether the column "__tmst" belongs to source or 
target, so it is ambiguous. We'll fix this bug in a later version.
In short, for accuracy cases, we recommend a rule configuration like this:
{
  "dsl.type": "griffin-dsl",
  "dq.type": "accuracy",
  "rule": "source.name = target.name and source.age = target.age"
}
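
To make the recommended rule concrete, here is a minimal Python sketch of what an accuracy check over "source.name = target.name and source.age = target.age" computes, assuming accuracy means the share of source records that have a matching target record. The function name, the sample records, and the total/matched/miss output keys are illustrative assumptions, not Griffin's implementation.

```python
from typing import Dict, List, Sequence

Record = Dict[str, object]


def accuracy(source: List[Record], target: List[Record], keys: Sequence[str]) -> Dict[str, int]:
    """Count source records whose key fields all equal those of some target record."""
    # Index the target by the tuple of compared fields for O(1) lookups.
    target_keys = {tuple(row.get(k) for k in keys) for row in target}
    total = len(source)
    matched = sum(1 for row in source if tuple(row.get(k) for k in keys) in target_keys)
    return {"total": total, "matched": matched, "miss": total - matched}


# Hypothetical sample data: only "alice" matches on both name and age.
source = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 41},
    {"name": "carol", "age": 25},
]
target = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 40},
]

result = accuracy(source, target, ["name", "age"])
print(result)
```

The ratio matched / total would then be the accuracy figure for this pair of data sets.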




Hope this can help you, thanks a lot.




--

Regards,
Lionel, Liu

At 2017-11-21 20:42:45, "Ananthanarayanan Ms" 
 wrote:

Hi Lionel,
 Even after the change, the issue remains. It looks like both src and tgt are 
already populated with the column __tmst (via withColumn) before the join, 
hence the error that '__tmst' is ambiguous. Could you please help check and 
clarify?


17/11/21 08:38:46 ERROR engine.SparkSqlEngine: run spark sql [ SELECT `__tmst` 
AS `__tmst`, source.product_id, source.market_id, 
source.period_id,source.s5_volume_rom_distrib FROM source LEFT JOIN target ON 
coalesce(source.period_id, 'null') = coalesce(target.period_id, 'null') and 
coalesce(source.market_id, 'null') = coalesce(target.market_id, 'null') and 
coalesce(source.product_id, 'null') = coalesce(target.product_id, 'null') WHERE 
(NOT (source.product_id IS NULL AND source.market_id IS NULL AND source. 
period_id IS NULL)) AND target.product_id IS NULL AND target.market_id IS NULL 
AND target.period_id IS NULL GROUP BY `__tmst` ] error: Reference '__tmst' is 
ambiguous, could be: __tmst#145L, __tmst#146L.; line 1 pos 566
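
The ambiguity can be reproduced outside Spark. The sketch below uses Python's built-in sqlite3 module as a stand-in for the Spark SQL engine (an assumption for illustration only; the error text differs from Spark's), with two hypothetical tables that both carry a __tmst column. An unqualified __tmst in the join fails, while qualifying it as source.__tmst resolves the reference.

```python
import sqlite3

# In-memory stand-ins for the source and target tables; both have __tmst.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source (product_id TEXT, __tmst INTEGER);
    CREATE TABLE target (product_id TEXT, __tmst INTEGER);
    INSERT INTO source VALUES ('p1', 100), ('p2', 100);
    INSERT INTO target VALUES ('p1', 100);
    """
)

join = "FROM source LEFT JOIN target ON source.product_id = target.product_id"

# Unqualified __tmst: the engine cannot tell which table it belongs to.
try:
    conn.execute(f"SELECT __tmst {join} GROUP BY __tmst")
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

# Qualified with the table name: the reference is unambiguous.
rows = conn.execute(
    f"SELECT source.__tmst, COUNT(*) {join} GROUP BY source.__tmst"
).fetchall()

print(ambiguous_error)
print(rows)
```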
 




Regards,
Ananthanarayanan.M.S


On Tue, Nov 21, 2017 at 8:05 AM, Lionel Liu  wrote:

Hi Ananthanarayanan.M.S,


Actually, we add a new column `__tmst` to each table, to persist the timestamp 
of each data row.
But the sql you listed does not look like our exact logic, so I would like to 
see your accuracy rule statement to check.


If your accuracy rule statement is like:
"source.product_id = target.product_id, source.market_id = target.market_id, 
source.period_id = target.period_id",
the sql for that step should be:
"SELECT source.* FROM source LEFT JOIN target ON coalesce(source.product_id, 
'null') = coalesce(target.product_id, 'null') and coalesce(source.market_id, 
'null') = coalesce(target.market_id, 'null') and coalesce(source.period_id, 
'null') = coalesce(target.period_id, 'null') WHERE (NOT (source.product_id IS 
NULL AND source.market_id IS NULL AND source.period_id IS NULL)) AND 
target.product_id IS NULL AND target.market_id IS NULL AND target.period_id IS 
NULL".
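
As a rough illustration of how this kind of query finds missed records, the sketch below runs a left join with coalesce-based matching in Python's built-in sqlite3 module, used here only as a stand-in for Spark SQL. The sample rows are hypothetical, and the final conditions are assumed to test the target columns, as in the query logged by SparkSqlEngine: a source row is reported when the join finds no target row for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source (product_id TEXT, market_id TEXT, period_id TEXT);
    CREATE TABLE target (product_id TEXT, market_id TEXT, period_id TEXT);
    INSERT INTO source VALUES ('p1', 'm1', 't1'), ('p2', 'm1', 't1'), ('p3', NULL, 't2');
    INSERT INTO target VALUES ('p1', 'm1', 't1'), ('p3', NULL, 't2');
    """
)

# coalesce(..., 'null') lets NULL keys on both sides compare as equal;
# unmatched source rows come back with all target columns NULL.
MISS_SQL = """
SELECT source.* FROM source LEFT JOIN target
  ON coalesce(source.product_id, 'null') = coalesce(target.product_id, 'null')
 AND coalesce(source.market_id, 'null') = coalesce(target.market_id, 'null')
 AND coalesce(source.period_id, 'null') = coalesce(target.period_id, 'null')
WHERE (NOT (source.product_id IS NULL AND source.market_id IS NULL
            AND source.period_id IS NULL))
  AND target.product_id IS NULL AND target.market_id IS NULL
  AND target.period_id IS NULL
"""

missed = conn.execute(MISS_SQL).fetchall()
print(missed)
```

Here only the 'p2' row is missed; the 'p3' row still matches because its NULL market_id is coalesced to the same placeholder on both sides.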


Hope this can help you, thanks.



--

Regards,
Lionel, Liu



At 2017-11-21 06:50:09, "William Guo"  wrote:
>hello Ananthanarayanan.M.S,
>
>We support some partitioned tables. Could you show us your partitioned table 
>description so we can check? Some sample data would also be helpful.
>
>Thanks,
>William
>
>From: Ananthanarayanan Ms 
>Sent: Tuesday, November 21, 2017 3:53 AM
>To: William Guo
>Cc: dev@griffin.incubator.apache.org; Abishek Kunduru
>Subject: Re: Re: Griffin Accuracy Doubts
>
>Hello William / Lionel,
>Could you please let us know if we can use 1.6? Could you also let us know 
>whether partitioned tables alone can be used as src/tgt inputs? We get the 
>error below when we build and run the master code.
>
>17/11/20 14:51:13 ERROR engine.SparkSqlEngine: run spark sql [ SELECT `__tmst` 
>AS `__tmst`, source.product_id, source.market_id, 
>source.period_id,s5_vo

[ANNOUNCE] Apache Griffin-0.1.6-incubating released

2017-11-30 Thread William Guo
Hi all,


The Apache Griffin (incubating) team is pleased to announce the
release of Griffin 0.1.6-incubating.


Apache Griffin is a data quality solution for modern data systems.
It defines a standard process to define and measure data quality for
well-known dimensions.


The release is available at:
https://www.apache.org/dyn/closer.cgi/incubator/griffin


Thanks,


The Apache Griffin (incubating) team




=
*DISCLAIMER*
Apache Griffin is an effort undergoing incubation at the Apache
Software Foundation (ASF), sponsored by the Apache Incubator.


Incubation is required of all newly accepted projects until a further
review indicates that the infrastructure, communications, and decision
making process have stabilized in a manner consistent with other
successful ASF projects.


While incubation status is not necessarily a reflection of the
completeness or stability of the code, it does indicate that the
project has yet to be fully endorsed by the ASF.


[jira] [Updated] (GRIFFIN-73) compile is blocked

2017-11-30 Thread wanyin (JIRA)

 [ 
https://issues.apache.org/jira/browse/GRIFFIN-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wanyin updated GRIFFIN-73:
--
Description: 
Hi guys,
I now hit the issue below in the compile step. Could you help take a look?

[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ griffin ---
[INFO] Installing /Users/yiwan/work/incubator-griffin/pom.xml to 
/Users/yiwan/.m2/raptor2/org/apache/griffin/griffin/0.1.7-incubating-SNAPSHOT/griffin-0.1.7-incubating-SNAPSHOT.pom
[INFO]
[INFO] 
[INFO] Building Apache Griffin :: UI :: Default UI 0.1.7-incubating-SNAPSHOT
[INFO] 
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ ui ---
[INFO] Deleting /Users/yiwan/work/incubator-griffin/ui/target
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ ui ---
[INFO]
[INFO] --- frontend-maven-plugin:1.6:install-node-and-npm (install node and 
npm) @ ui ---
[INFO] Node v6.11.3 is already installed.
[INFO] Installing npm version 3.10.10
[INFO] Downloading https://registry.npmjs.org/npm/-/npm-3.10.10.tgz to 
/Users/yiwan/.m2/raptor2/com/github/eirslett/npm/3.10.10/npm-3.10.10.tar.gz
[INFO] No proxies configured
[INFO] No proxy was configured, downloading directly

thanks,
evan



  was:
Hi guys,
I cloned the griffin code but hit the issue below in the compile step. Could 
you help take a look?

[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ griffin ---
[INFO] Installing /Users/yiwan/work/incubator-griffin/pom.xml to 
/Users/yiwan/.m2/raptor2/org/apache/griffin/griffin/0.1.7-incubating-SNAPSHOT/griffin-0.1.7-incubating-SNAPSHOT.pom
[INFO]
[INFO] 
[INFO] Building Apache Griffin :: UI :: Default UI 0.1.7-incubating-SNAPSHOT
[INFO] 
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ ui ---
[INFO] Deleting /Users/yiwan/work/incubator-griffin/ui/target
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ ui ---
[INFO]
[INFO] --- frontend-maven-plugin:1.6:install-node-and-npm (install node and 
npm) @ ui ---
[INFO] Node v6.11.3 is already installed.
[INFO] Installing npm version 3.10.10
[INFO] Downloading https://registry.npmjs.org/npm/-/npm-3.10.10.tgz to 
/Users/yiwan/.m2/raptor2/com/github/eirslett/npm/3.10.10/npm-3.10.10.tar.gz
[INFO] No proxies configured
[INFO] No proxy was configured, downloading directly

thanks,
evan




> compile is blocked 
> ---
>
> Key: GRIFFIN-73
> URL: https://issues.apache.org/jira/browse/GRIFFIN-73
> Project: Griffin (Incubating)
>  Issue Type: Improvement
>Reporter: wanyin
>


[jira] [Created] (GRIFFIN-73) compile is blocked

2017-11-30 Thread wanyin (JIRA)
wanyin created GRIFFIN-73:
-

 Summary: compile is blocked 
 Key: GRIFFIN-73
 URL: https://issues.apache.org/jira/browse/GRIFFIN-73
 Project: Griffin (Incubating)
  Issue Type: Improvement
Reporter: wanyin


Hi guys,
I cloned the griffin code but hit the issue below in the compile step. Could 
you help take a look?

[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ griffin ---
[INFO] Installing /Users/yiwan/work/incubator-griffin/pom.xml to 
/Users/yiwan/.m2/raptor2/org/apache/griffin/griffin/0.1.7-incubating-SNAPSHOT/griffin-0.1.7-incubating-SNAPSHOT.pom
[INFO]
[INFO] 
[INFO] Building Apache Griffin :: UI :: Default UI 0.1.7-incubating-SNAPSHOT
[INFO] 
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ ui ---
[INFO] Deleting /Users/yiwan/work/incubator-griffin/ui/target
[INFO]
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ ui ---
[INFO]
[INFO] --- frontend-maven-plugin:1.6:install-node-and-npm (install node and 
npm) @ ui ---
[INFO] Node v6.11.3 is already installed.
[INFO] Installing npm version 3.10.10
[INFO] Downloading https://registry.npmjs.org/npm/-/npm-3.10.10.tgz to 
/Users/yiwan/.m2/raptor2/com/github/eirslett/npm/3.10.10/npm-3.10.10.tar.gz
[INFO] No proxies configured
[INFO] No proxy was configured, downloading directly

thanks,
evan




