> You need to understand what the different statements are doing; just
> as you need to understand what processing you apply on your data
> (whether it's preprocessing or learning) to properly use any machine
> learning tool.
I know, but the problem is that the iris CSV file doesn't carry that information, and, as I said, I think there are some additional steps that I don't know exactly. For example, if you look at ~/.local/lib/python3.6/site-packages/sklearn/datasets/data/iris.csv you will see:

    150,4,setosa,versicolor,virginica
    5.1,3.5,1.4,0.2,0
    4.9,3.0,1.4,0.2,0
    ...

So the first line means 150 instances (rows) with 4 feature columns and three iris types. However, when I run

    from sklearn.datasets import load_iris
    iris = load_iris()
    print(iris)

I see a lot of metadata, such as:

    {'data': array([[5.1, 3.5, 1.4, 0.2],
                    [4.9, 3. , 1.4, 0.2],
                    ...
                    [5.9, 3. , 5.1, 1.8]]),
     'target': array([0, 0, ...]),
     'frame': None,
     'target_names': array(['setosa', 'versicolor', 'virginica'], dtype='<U10'),
     'DESCR': '.. _iris_dataset:\n\nIris plants dataset\n...', ...}

where the DESCR text reads:

    Iris plants dataset
    --------------------

    **Data Set Characteristics:**

        :Number of Instances: 150 (50 in each of three classes)
        :Number of Attributes: 4 numeric, predictive attributes and the class
        :Attribute Information:
            - sepal length in cm
            - sepal width in cm
            - petal length in cm
            - petal width in cm
            - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica

The question is: how are these metadata created and stored in this package? I mean, what does "from sklearn.datasets import load_iris" do with the CSV file? If I knew, I would also be able to create a similar dataset.

Regards,
Mahmood
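P.S. To make my question concrete, here is a minimal sketch of what I imagine the loader does with that CSV layout. This is my guess using only the standard library, not sklearn's actual code: I believe the real loader builds numpy arrays, reads the DESCR text from a separate file, and returns an attribute-accessible dict (sklearn.utils.Bunch), so the names below are mine.

```python
import csv
import io

# Sample in the same layout as sklearn's iris.csv (only 3 data rows here;
# real file has 150). First field = row count, second = feature count,
# remaining fields = class names.
sample = """3,4,setosa,versicolor,virginica
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
5.9,3.0,5.1,1.8,2
"""

reader = csv.reader(io.StringIO(sample))
header = next(reader)          # first line is metadata, not data
n_samples = int(header[0])     # number of data rows that follow
n_features = int(header[1])    # number of feature columns per row
target_names = header[2:]      # remaining header fields are class names

data, target = [], []
for row in reader:
    data.append([float(v) for v in row[:n_features]])  # feature values
    target.append(int(row[-1]))                        # class index (0..2)

# Bundle everything into a plain dict mimicking what load_iris() returns
# (sklearn wraps this in a Bunch and adds 'DESCR', 'frame', etc.)
dataset = {
    "data": data,
    "target": target,
    "target_names": target_names,
    "DESCR": "my own dataset description",  # sklearn loads this from a text file
}
print(dataset["target_names"])   # ['setosa', 'versicolor', 'virginica']
print(len(dataset["data"]), "samples,", n_features, "features")
```

With a scheme like this, creating a similar dataset would just mean writing a CSV with the same kind of header line and numeric class labels in the last column.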
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn