I am trying to print a simple decision tree for my homework.
The output must be in this format:
Top 7,4,0.95
career gain = 100
    1.Management 2, 3, 0.9709505944546686
    2.Service 5, 1, 0.6500224216483541
location gain = 100
    1.Oregon 4, 1, 0.7219280948873623
    2.California 3, 3, 1.0
edu_level gain = 100
    1.High School 5, 1, 0.6500224216483541
    2.College 2, 3, 0.9709505944546686
years_exp gain = 100
    1.Less than 3 3, 1, 0.8112781244591328
    2.3 to 10 2, 1, 0.9182958340544896
    3.More than 10 2, 2, 1.0
Here is my code:
features = {
    'edu_level': ['High School', 'College'],
    'career': ['Management', 'Service'],
    'years_exp': ['Less than 3', '3 to 10', 'More than 10'],
    'location': ['Oregon', 'California'],
}

# table and entropy() are defined elsewhere in my program
print('Top 7,4,0.95')
for key in features:
    print('{} gain = {}'.format(key, 100))  # 100 is a placeholder
    attributes_list = features[key]
    kargs = {}
    for i in range(len(attributes_list)):
        kargs[key] = attributes_list[i]
        low = table.count('Low', **kargs)
        high = table.count('High', **kargs)
        print('\t{}.{} {}, {}, {}'.format(
            i + 1, attributes_list[i], low, high, entropy(low, high)))
For now I have set every gain to 100, but the gain actually has to be
calculated from the data printed below it.
For example, the career gain needs the data for 'Management' and 'Service'.
I don't know how to do this.
Can anyone suggest a better logic?
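
One possible approach, as a minimal sketch: information gain is the entropy of the parent node minus the weighted average entropy of its children, so the child counts have to be collected before the "gain" line can be printed. The entropy() below is an assumption (the post does not show its definition), and information_gain() is a hypothetical helper. The counts are copied from the expected output above.

```python
from math import log2


def entropy(*counts):
    """Shannon entropy (base 2) of a node, given its class counts.

    Assumed definition; the original entropy() is not shown in the post.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    # Zero counts contribute nothing, so skip them to avoid log2(0).
    return -sum(c / total * log2(c / total) for c in counts if c)


def information_gain(parent_counts, child_counts):
    """Parent entropy minus the size-weighted entropy of the children.

    Hypothetical helper: child_counts is a list of (low, high) pairs,
    one pair per attribute value.
    """
    total = sum(parent_counts)
    remainder = sum(
        sum(child) / total * entropy(*child) for child in child_counts
    )
    return entropy(*parent_counts) - remainder


# (low, high) counts taken from the expected output in the post
top = (7, 4)
career = [(2, 3), (5, 1)]  # Management, Service

career_gain = information_gain(top, career)
print('career gain = {}'.format(career_gain))
```

In the loop from the post, this would mean gathering the (low, high) pair for every attribute value into a list first, calling information_gain((7, 4), pairs), printing the gain line, and only then printing the per-value lines. With the counts above, the career gain comes out at roughly 0.15.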