Re: data science tutorials
introduction to data science
data science, as its name says, is the science of retrieving, preprocessing and using data in different contexts, such as machine learning
for example, we can use the data to predict house prices over time
in theory, machine learning falls into 2 categories:
supervised learning
unsupervised learning
there is also another type of machine learning called reinforcement learning, but I might not cover it.
supervised learning
let me start with an example
suppose you have a dataset of emails, and each one is labelled as spam or not spam (we call the latter ham)
now, when we receive a new email, we want to classify it: is it spam, or ham?
so, we feed the model the data, which is called x, and we receive y, which is its class (first we need to train the model by showing it which emails are spam and which aren't, and the model learns from our data)
since classification between spam and ham has only 2 categories (0 and 1), it is also called binary classification.
an example of a classification algorithm is naive Bayes classification, which scikit-learn supports (we will cover that as well).
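to make the spam/ham idea a bit more concrete, here is a minimal sketch. the six emails and their labels are made up by me for illustration, and I am assuming scikit-learn's CountVectorizer and MultinomialNB as one reasonable way to do it, not the only one.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up training set: the text is x, the label is y (1 = spam, 0 = ham).
emails = [
    "win a free prize now",
    "free money click now",
    "claim your free prize",
    "meeting at noon tomorrow",
    "see you at lunch tomorrow",
    "notes from the project meeting",
]
labels = [1, 1, 1, 0, 0, 0]

# Turn each email into word counts, then train the classifier on them.
vectorizer = CountVectorizer()
x = vectorizer.fit_transform(emails)
model = MultinomialNB()
model.fit(x, labels)

# Classify two new emails the model has not seen before.
new_emails = ["free prize now", "project meeting tomorrow"]
predictions = model.predict(vectorizer.transform(new_emails))
print(predictions)
```

the first new email should come out as spam (1) and the second as ham (0), because their words only appear in the spam and ham training emails respectively.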
another example:
suppose that we want to predict the prices of houses in a city
so, this is not a classification task. we want to calculate a quantity which is the price
this is called regression
the simplest regression algorithm is called linear regression, which is available in scikit-learn
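here is a small sketch of what that could look like. the areas and prices are invented (I simply generate the price as 1000 * area + 50000), so treat it as an illustration of LinearRegression and not as real housing data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: house area in square metres (the feature) and price (the target).
# The prices follow price = 1000 * area + 50000 exactly, so a line fits perfectly.
area = np.array([[50.0], [80.0], [100.0], [120.0], [150.0]])
price = 1000.0 * area.ravel() + 50000.0

model = LinearRegression()
model.fit(area, price)

# Predict the price of a house the model has not seen (110 square metres).
predicted = model.predict(np.array([[110.0]]))[0]
print(round(predicted))
```

because the made-up prices lie exactly on a line, the model recovers the slope (1000) and the intercept (50000); real data would be noisy and the fit would only be approximate.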
simple example of linear regression
suppose this function:
def calc_bmi(weight, height):
    return weight/(height**2)
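this is a simple function of the kind a regression model tries to learn
given the features (weight, height), it calculates bmi, which is a continuous quantity
strictly speaking, bmi is not linear in height, so a linear regression can only approximate this function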
now, download this data and we will play with it a little bit
create a python file and put the following in it:
import numpy as np
import pandas as pd
import joblib
from sklearn.linear_model import LinearRegression
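def calc_bmi(weight, height):
    # this formula expects weight in kg and height in metres;
    # if the file stores height in centimetres, convert it first
    return weight/(height**2)

df = pd.read_csv("Davis.csv")
print(df.head())

# features (x) and the target (y) computed from them
x = df[["weight", "height"]]
y = []
for i, r in x.iterrows():
    y.append(calc_bmi(r["weight"], r["height"]))

# train the model and save it to disk
r = LinearRegression()
r.fit(x, y)
joblib.dump(r, "bmi.pkl")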
this is a simple linear regression fitted on a dataset. I know this is not how you would normally do it, but it is an example to show you what regression is.
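if you want to see how a saved model like this can be loaded and used, here is a rough sketch. I am not using Davis.csv here: the weights (kg) and heights (metres) are a made-up grid, and the temporary file path is only for the demo. it also shows that the linear model gets close to the real bmi but does not match it exactly.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression


def calc_bmi(weight, height):
    return weight / (height ** 2)


# Made-up grid of weights (kg) and heights (m); not taken from Davis.csv.
weights, heights = np.meshgrid(np.linspace(50, 100, 11), np.linspace(1.5, 2.0, 11))
x = np.column_stack([weights.ravel(), heights.ravel()])
y = calc_bmi(x[:, 0], x[:, 1])

model = LinearRegression()
model.fit(x, y)

# Save the model to a temporary file and load it back, as you would with bmi.pkl.
path = os.path.join(tempfile.mkdtemp(), "bmi.pkl")
joblib.dump(model, path)
loaded = joblib.load(path)

# Compare the model's prediction with the real formula for one person.
person = np.array([[70.0, 1.75]])
predicted = loaded.predict(person)[0]
actual = calc_bmi(70.0, 1.75)
score = loaded.score(x, y)  # R^2 on the training grid
print(predicted, actual, score)
```

the prediction lands near the true value, and the R^2 score is high but below 1, since the real formula divides by height squared and a straight-line model cannot capture that exactly.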
so, in classification we predict discrete values (1, 2, etc., which are our classes), while in regression we predict continuous values (0, 0.0000000001, 0.0000000002, etc.).
have a nice time
-- Audiogames-reflector mailing list Audiogames-reflector@sabahattin-gucukoglu.com https://sabahattin-gucukoglu.com/cgi-bin/mailman/listinfo/audiogames-reflector