Hi Enrico,

https://issues.apache.org/jira/browse/GRIFFIN-165

I have created a ticket to track this issue.

Thanks,
William

On Tue, May 15, 2018 at 6:58 AM, William Guo <gu...@apache.org> wrote:

> Yes,
>
> The Griffin team will write a doc about the contribution points (interfaces)
> for measures.
>
> We will let you know when it is ready.
>
>
>
> Thanks,
> William
>
> On Mon, May 14, 2018 at 5:13 PM, Enrico D'Urso <a-edu...@hotels.com>
> wrote:
>
>> Hi,
>>
>> Yes, it sounds like a very good idea. I am very interested in the topic.
>> Is there an ongoing discussion that I can start to look at?
>>
>> Thanks,
>>
>> Enrico
>>
>> On 5/13/18, 2:58 AM, "William Guo" <gu...@apache.org> wrote:
>>
>>     Hi Enrico,
>>
>>     Yes. Since we released 0.2.0 recently, our next plan includes enhancing
>>     measures, including support for anomaly detection.
>>
>>
>>     Would you like to work on this feature together with us?
>>
>>
>>     Thanks,
>>     William
>>
>>     On Sat, May 12, 2018 at 12:22 AM, Enrico D'Urso (JIRA) <j...@apache.org>
>>     wrote:
>>
>>     >
>>     >     [ https://issues.apache.org/jira/browse/GRIFFIN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472199#comment-16472199 ]
>>     >
>>     > Enrico D'Urso commented on GRIFFIN-160:
>>     > ---------------------------------------
>>     >
>>     > Hi,
>>     >
>>     > There are several ways to implement anomaly detection.
>>     >
>>     > The key requirement is numerical data. To apply anomaly detection (AD)
>>     > to non-numerical data, you have to map strings to numbers somehow.
>>     >
>>     > However, as Griffin uses Spark as the engine, I think K-Means can be
>>     > an option.
>>     >
>>     > Basically, you take your data, normalise it, decide the number of
>>     > clusters, apply K-means, and finally check each sample's distance from
>>     > the final centroids to search for anomalies. MLlib fully supports it.
>>     >
>>     > Otherwise, just compute the mean and standard deviation and search for
>>     > samples that are 3 or more standard deviations from the mean.
>>     >
>>     > More sophisticated approaches can use a covariance matrix and a
>>     > Gaussian distribution (more info here:
>>     > [https://www.coursera.org/learn/machine-learning/lecture/C8IJp/helpUrl]),
>>     > but I am not sure whether this is doable in a distributed environment.
>>     >
>>     >
>>     >
>>     > Thanks,
>>     >
>>     > Enrico
>>     >
>>     >
>>     >
>>     > > Anomaly detection for thousands of tables
>>     > > -----------------------------------------
>>     > >
>>     > >                 Key: GRIFFIN-160
>>     > >                 URL: https://issues.apache.org/jira/browse/GRIFFIN-160
>>     > >             Project: Griffin (Incubating)
>>     > >          Issue Type: New Feature
>>     > >            Reporter: William Guo
>>     > >            Assignee: William Guo
>>     > >            Priority: Major
>>     > >
>>     > > Hi team,
>>     > >
>>     > > I am trying to find the Griffin roadmap, and here it is
>>     > > [https://cwiki.apache.org/confluence/display/GRIFFIN/0.+Roadmap].
>>     > > Is this the latest version?
>>     > >
>>     > > We have thousands of tables that need data quality validation. Is
>>     > > there any simple machine learning algorithm that can be applied to
>>     > > detect data quality issues instead of building a lot of measures?
>>     > > If possible, will this be added to the Griffin roadmap?
>>     > >
>>     > > Thanks, Randy
>>     > >
>>     >
>>     >
>>     >
>>     > --
>>     > This message was sent by Atlassian JIRA
>>     > (v7.6.3#76005)
>>     >
>>
>>
>>
>
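
For reference, the K-means approach described in the quoted GRIFFIN-160 comment (normalise, pick the number of clusters, run K-means, then check distances to the final centroids) could look roughly like the sketch below. It is plain Python rather than Spark MLlib so that it runs anywhere, and it is not Griffin code: the function names, the naive "first k rows" seeding, and the distance threshold are all assumptions made for illustration.

```python
import math
import statistics


def normalise(rows):
    """Z-score each column so that no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # A constant column has a standard deviation of 0; use 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm. Assumption: the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[i] = [statistics.fmean(dim) for dim in zip(*members)]
    return centroids


def find_anomalies(rows, k, threshold):
    """Return indices of rows farther than `threshold` from their nearest centroid."""
    points = normalise(rows)
    centroids = kmeans(points, k)
    distances = [min(math.dist(p, c) for c in centroids) for p in points]
    return [i for i, d in enumerate(distances) if d > threshold]


# Two tight groups plus one row (index 8) that sits between them.
rows = [
    [0, 0], [10, 10], [0, 1], [1, 0], [1, 1],
    [10, 11], [11, 10], [11, 11], [4, 4],
]
print(find_anomalies(rows, k=2, threshold=0.5))  # prints [8]
```

The threshold is in normalised units and would need tuning per dataset. Note also that an extreme outlier can end up with a cluster of its own, in which case its distance to the nearest centroid is zero and this check misses it.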

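The simpler alternative from the same quoted comment (flag samples that are 3 or more standard deviations from the mean) can be sketched as follows. Again this is only an illustration in plain Python: the function name and the `n_sigmas` parameter are made up here, and the population standard deviation is an assumed choice.

```python
import statistics


def three_sigma_outliers(values, n_sigmas=3.0):
    """Return (index, value) pairs at least n_sigmas standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (an assumption)
    if std == 0:
        return []  # all values are identical, so nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) >= n_sigmas * std]


# Twenty ordinary readings and one suspicious one.
values = [10.0] * 20 + [100.0]
print(three_sigma_outliers(values))  # prints [(20, 100.0)]
```

One caveat: with the population standard deviation, the largest possible z-score among n samples is (n - 1) / sqrt(n), so a column with 10 or fewer values can never contain a point 3 standard deviations from the mean, however extreme it looks.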