Thanks very much for all of your tips. Take nouns as an example. First, I need
to find all the lemma_names in all the synsets whose pos is 'n'. Second, for
each lemma_name, I will count its senses.
1) We can get the number of synsets whose pos is noun with
>>> len([synset for synset in wn.all_synsets('n')])
82115
However, confusingly, I did not get a flat list of the lemma names of these
synsets with
>>> lemma_list = [synset.lemma_names for synset in wn.all_synsets('n')]
>>> lemma_list[:20]
[['entity'], ['physical_entity'], ['abstraction', 'abstract_entity'],
['thing'], ['object', 'physical_object'], ['whole', 'unit'], ['congener'],
['living_thing', 'animate_thing'], ['organism', 'being'], ['benthos'],
['dwarf'], ['heterotroph'], ['parent'], ['life'], ['biont'], ['cell'],
['causal_agent', 'cause', 'causal_agency'], ['person', 'individual', 'someone',
'somebody', 'mortal', 'soul'], ['animal', 'animate_being', 'beast', 'brute',
'creature', 'fauna'], ['plant', 'flora', 'plant_life']]
>>> type(lemma_list)
<type 'list'>
Although lemma_list is a list in the code above, it contains many nested [ and
] brackets. Why is that? What I wanted is a flat list without the inner
brackets, and I am curious to know why I did not get one.
2) So I have to use a loop with extend to get all the lemma_names from the
synsets:
>>> synset_list = list(wn.all_synsets('n'))
>>> lemma_list = []
>>> for synset in synset_list:
...     lemma_list.extend(synset.lemma_names)
>>> lemma_list[:20]
['entity', 'physical_entity', 'abstraction', 'abstract_entity', 'thing',
'object', 'physical_object', 'whole', 'unit', 'congener', 'living_thing',
'animate_thing', 'organism', 'being', 'benthos', 'dwarf', 'heterotroph',
'parent', 'life', 'biont']
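As far as I can tell, the inner brackets appear because lemma_names is itself a list for each synset, so the comprehension collects one list per synset. A minimal sketch of two other ways to flatten such a list of lists follows. The name per_synset is a made-up stand-in holding the first three entries of the output shown earlier, so that the snippet runs without WordNet:

```python
from itertools import chain

# Stand-in for [synset.lemma_names for synset in wn.all_synsets('n')]:
# one inner list per synset (first three entries of the output above).
per_synset = [['entity'], ['physical_entity'], ['abstraction', 'abstract_entity']]

# Nested comprehension: the outer loop walks the per-synset lists,
# the inner loop walks the names inside each of them.
flat = [name for names in per_synset for name in names]

# The same flattening with itertools.chain.from_iterable.
flat_chain = list(chain.from_iterable(per_synset))

print(flat)
```

Both forms should give the same result as the loop with extend.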
3) So I use a loop to collect all the lemma_names instead of
[synset.lemma_names for synset in wn.all_synsets('n')]. The following is a
working solution:
>>> def average_polysemy(pos):
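...     synset_list = list(wn.all_synsets(pos))
...     sense_number = 0
...     lemma_list = []
...     # Collect the lemma names of every synset into one flat list.
...     for synset in synset_list:
...         lemma_list.extend(synset.lemma_names)
...     # Add up the number of senses of each collected lemma name.
...     for lemma in lemma_list:
...         sense_number_new = len(wn.synsets(lemma, pos))
...         sense_number = sense_number + sense_number_new
...     # Integer division under Python 2, so the result is truncated.
...     return sense_number/len(synset_list)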
>>> average_polysemy('n')
3
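One caveat, as far as I can see: the function above counts a lemma name once for every synset it appears in and divides by the number of synsets. If average polysemy is instead meant as the mean number of senses per distinct lemma name, a rough sketch could look like the one below. The toy data and the toy_count helper are made up for illustration so that the snippet runs without WordNet, and the two-argument signature is my own assumption:

```python
def average_polysemy(lemma_names_per_synset, count_senses):
    """Mean number of senses per distinct lemma name."""
    # Use a set so each lemma name is counted only once.
    distinct = set(name for names in lemma_names_per_synset for name in names)
    if not distinct:
        return 0.0
    total = sum(count_senses(name) for name in distinct)
    # float() forces true division on Python 2 as well.
    return float(total) / len(distinct)

# Hypothetical toy data: three "synsets" given as lists of lemma names.
toy_synsets = [['dog', 'domestic_dog'], ['dog', 'frump'], ['cat']]

def toy_count(name):
    # Number of toy synsets that contain the name.
    return sum(1 for names in toy_synsets if name in names)

print(average_polysemy(toy_synsets, toy_count))
```

With WordNet, the first argument would presumably be (synset.lemma_names for synset in wn.all_synsets(pos)) and the second something like lambda name: len(wn.synsets(name, pos)). I have not run it that way, so I cannot say what number it gives for nouns.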
Thanks again.