[ML] Tickets for beginners

2020-02-13 Thread Лев Киселев
Hello everyone,
I'd like to take the following task:
https://issues.apache.org/jira/browse/IGNITE-12384?jql=project%20%3D%20IGNITE%20AND%20status%20in%20(Open%2C%20Reopened)%20AND%20component%20%3D%20ML%20AND%20labels%20%3D%20newbie

But I do not fully understand what exactly needs to be done.
As far as I understand, encoders are needed to convert categorical variables
to numerical ones. So, to demonstrate how they work, I need a dataset with
several categorical features. Should I search for one or generate it myself?
Also, do I need to solve some regression/classification problem in order
to demonstrate the "power" of using categorical variables to fit prediction or
classification models?
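To make "convert categorical variables to numerical" concrete, here is a minimal plain-Java sketch of label (index) encoding. It is not the Apache Ignite ML encoder API: the class `LabelEncodingSketch`, its methods, and the toy `color` column are hypothetical, and the indexing rule (order of first appearance) is just one possible choice.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of label encoding; NOT the Apache Ignite ML API.
 * Each distinct category gets an integer index in order of first appearance.
 */
class LabelEncodingSketch {

    /** Learns the category -> index mapping from a single column. */
    static Map<String, Integer> fit(List<String> column) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String value : column)
            index.putIfAbsent(value, index.size()); // next free index for unseen values
        return index;
    }

    /** Replaces every category with its learned index; unseen values are rejected. */
    static double[] transform(List<String> column, Map<String, Integer> index) {
        double[] encoded = new double[column.size()];
        for (int i = 0; i < column.size(); i++) {
            Integer code = index.get(column.get(i));
            if (code == null)
                throw new IllegalArgumentException("Unknown category: " + column.get(i));
            encoded[i] = code;
        }
        return encoded;
    }

    public static void main(String[] args) {
        // Toy categorical column, made up for illustration.
        List<String> color = List.of("red", "green", "red", "blue");
        Map<String, Integer> index = fit(color);
        System.out.println(index);                                   // {red=0, green=1, blue=2}
        System.out.println(Arrays.toString(transform(color, index))); // [0.0, 1.0, 0.0, 2.0]
    }
}
```

Note that label encoding imposes an artificial order on the categories (blue > green > red), which is why one-hot encoding is often preferred for nominal features.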


Re: [ML] Tickets for beginners

2020-02-13 Thread Alexey Zinoviev
Hi, Lev!
I'll come back with an explanation next week; maybe I need to add some
details to this ticket.

P.S. About the dataset: you could search among the datasets presented in the
MLSandbox class or add your own (I'd prefer small datasets from
http://archive.ics.uci.edu/ml/index.php).


In reply to Лев Киселев (Fri, 14 Feb 2020, 10:02)



Re: [ML] Tickets for beginners

2020-02-16 Thread Alexey Zinoviev
I thought about this task. We already have enough examples where encoders are
used as part of preprocessing and end with a regression or classification
algorithm.
In this ticket, I suggest preparing an example that does not end with a
trainer; please have a look at the examples/ml/dataset folder.
Maybe you could create something meaningful like that.

Your choice!
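An example that "does not end with a trainer" could simply encode the categorical columns and print the resulting numeric vectors. The plain-Java sketch below does that with one-hot encoding; it does not use Ignite's preprocessing or dataset APIs, and the class `OneHotEncodingSketch`, its methods, and the toy rows (`age`, `city`) are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of one-hot encoding with no trainer at the end;
 * NOT the Apache Ignite ML API.
 */
class OneHotEncodingSketch {

    /** Maps each distinct category to a column position, in order of first appearance. */
    static Map<String, Integer> vocabulary(List<String> column) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String value : column)
            vocab.putIfAbsent(value, vocab.size());
        return vocab;
    }

    /** Builds a row: the numeric feature followed by a 0/1 block for the category. */
    static double[] encodeRow(double numeric, String category, Map<String, Integer> vocab) {
        Integer pos = vocab.get(category);
        if (pos == null)
            throw new IllegalArgumentException("Unknown category: " + category);
        double[] row = new double[1 + vocab.size()]; // all zeros by default
        row[0] = numeric;
        row[1 + pos] = 1.0;                          // single 1 marks the category
        return row;
    }

    public static void main(String[] args) {
        // Toy data, made up for illustration: one numeric and one categorical feature.
        double[] age = {25, 40, 31};
        List<String> city = List.of("Moscow", "London", "Moscow");

        Map<String, Integer> vocab = vocabulary(city);
        for (int i = 0; i < age.length; i++)
            System.out.println(Arrays.toString(encodeRow(age[i], city.get(i), vocab)));
        // [25.0, 1.0, 0.0]
        // [40.0, 0.0, 1.0]
        // [31.0, 1.0, 0.0]
    }
}
```

The output is the preprocessed dataset itself, which is the kind of result such an example would stop at.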

In reply to Alexey Zinoviev (Fri, 14 Feb 2020, 10:17)
