Hi, Andreas!

Yesterday I uploaded the draft package to the GoldenDict forum, and a few users tested it. They were very pleased and send their greetings to you :)

They also contributed an abbreviations file, which I have included as well (it contains popup hints for the current WordNet incarnation). I have also renamed the new package from wordnet-goldendict to goldendict-wordnet, because that is the naming scheme Debian packages use :)

The full resulting patch is attached.

PS: By the way, if you want to test-build the package quickly, you can set the variable $short = true in debian/wn-for-goldendict.rb. The script will then generate a dictionary from a small part of WordNet and finish in a short time.

PPS: While I was writing this mail, I received your mail about the upload. Thanks for the permission; I'll upload the package today. However, I don't have commit permissions on svn.debian.org. I'll send a request for them, but that may take a long time, so I'm attaching the final version of the patch here. Please commit it to SVN.
-- 
... mpd playing: Manowar - 07 March For Revenge (By The Soldiers Of Death)

. ''`. Dmitry E. Oboukhov
: :' : email: un...@debian.org jabber://un...@uvw.ru
`. `~' GPGKey: 1024D / F8E26537 2006-11-21
`- 1B23 D4F8 8EC0 D902 0555 E438 AB8C 00CF F8E2 6537
diff -u wordnet-3.0/debian/changelog wordnet-3.0/debian/changelog
--- wordnet-3.0/debian/changelog
+++ wordnet-3.0/debian/changelog
@@ -1,3 +1,13 @@
+wordnet (1:3.0-19) unstable; urgency=low
+
+  * Added goldendict-wordnet package: it has been generated from
+    wordnet database by script which was written specially for
+    goldendict and other GUI dictionaries, closes: #555707.
+  * Added myself to uploaders list, thanks for permissions
+    to Andreas Tille.
+
+ -- Dmitry E. Oboukhov <un...@debian.org>  Thu, 12 Nov 2009 21:55:25 +0300
+
 wordnet (1:3.0-18) unstable; urgency=low
 
   * debian/patches/20_adj.all_fix.patch
diff -u wordnet-3.0/debian/control wordnet-3.0/debian/control
--- wordnet-3.0/debian/control
+++ wordnet-3.0/debian/control
@@ -2,11 +2,12 @@
 Section: text
 Build-Depends: cdbs (>= 0.4.23-1.1), autotools-dev, debhelper (>= 7), quilt,
  tk8.5-dev, tcl8.5-dev, libxaw7-dev, flex, dictzip, python, groff, gs-common,
- autoconf, automake, libtool, bison, man-db, libxss-dev, libxft-dev
+ autoconf, automake, libtool, bison, man-db, libxss-dev, libxft-dev, ruby
 Priority: optional
 Maintainer: Debian Science Team <debian-science-maintain...@lists.alioth.debian.org>
 DM-Upload-Allowed: yes
-Uploaders: Andreas Tille <ti...@debian.org>
+Uploaders: Andreas Tille <ti...@debian.org>,
+ Dmitry E. Oboukhov <un...@debian.org>
 Standards-Version: 3.8.3
 Vcs-Browser: http://svn.debian.org/wsvn/debian-science/packages/wordnet/trunk/?rev=0&sc=0
 Vcs-Svn: svn://svn.debian.org/svn/debian-science/packages/wordnet/trunk/
@@ -151,0 +153,19 @@
+
+Package: goldendict-wordnet
+Conflicts: wordnet-goldendict
+Architecture: all
+Depends: ${misc:Depends}
+Recommends: goldendict
+Description: electronic lexical database of English language for dict
+ WordNet(C) is an on-line lexical reference system whose design is
+ inspired by current psycholinguistic theories of human lexical
+ memory. English nouns, verbs, adjectives and adverbs are organized
+ into synonym sets, each representing one underlying lexical
+ concept. Different relations link the synonym sets.
+ .
+ WordNet was developed by the Cognitive Science Laboratory
+ (http://www.cogsci.princeton.edu/) at Princeton University under the
+ direction of Professor George A. Miller (Principal Investigator).
+ .
+ This package contains an adaptation wordnet database for such dictionaries
+ as goldendict.
diff -u wordnet-3.0/debian/rules wordnet-3.0/debian/rules
--- wordnet-3.0/debian/rules
+++ wordnet-3.0/debian/rules
@@ -31 +31,14 @@
+	rm -f goldendict-wordnet.dsl goldendict-wordnet.dsl.dz
+	rm -f goldendict-wordnet_abrv.dsl
+build/goldendict-wordnet:: goldendict-wordnet.dsl.dz goldendict-wordnet_abrv.dsl
+
+goldendict-wordnet_abrv.dsl: debian/goldendict-wordnet_abrv.dsl
+	echo -ne '\xff\xfe' > $@
+	iconv -t utf-16le $< >> $@
+
+goldendict-wordnet.dsl.dz: goldendict-wordnet.dsl
+	dictzip -k $<
+
+goldendict-wordnet.dsl:
+	ruby debian/wn-for-goldendict.rb > $@
only in patch2:
unchanged:
--- wordnet-3.0.orig/debian/goldendict-wordnet_abrv.dsl
+++ wordnet-3.0/debian/goldendict-wordnet_abrv.dsl
@@ -0,0 +1,64 @@
+#NAME "Abbreviations for WordNet 3.0 (En-En)"
+#INDEX_LANGUAGE "English"
+#CONTENTS_LANGUAGE "English"
+
+Freq.
+	Frequency count. The number of times each semantically tagged sense occurs in the Semantic Concordance files.
+Syn
+	Synonyms - words with the same meaning
+Ant
+	Antonyms - words with the opposite meaning
+Pertains to noun
+	Only for relational adjectives. For example, "medical" pertains to "medicine" and "musical" pertains to "music".
+Derived from adjective
+	Only for adverbs.
+Similar to
+	Similar to ...
+See Also
+	See Also ...
+Derivationally related forms
+	For example, a derivationally related form of "meter" is "metrical".
+Usage Domain
+	Usage Domains for this entry
+Topics
+	Topic Domains for this entry
+Regions
+	Region Domains for this entry
+Members of this Usage Domain
+	Members of this Usage Domain
+Members of this Topic
+	Members of this Topic
+Members of this Region
+	Members of this Region
+Hypernyms
+	The generic term used to designate a whole class of specific instances. Y is a hypernym of X if X is a (kind of) Y. E.g., "tree" is a hypernym of "oak".
+Instance Hypernyms
+	E.g., the instance hypernym of "Mississippi River" is "river".
+Hyponyms
+	The specific term used to designate a member of a class. X is a hyponym of Y if X is a (kind of) Y. E.g., "oak" is a hyponym of "tree".
+Instance Hyponyms
+	Instance hyponyms represent specific instances of something. E.g., "Amazon River" is an instance hyponym of "river".
+Member Holonyms
+	X is a member holonym of Y if Y is a member of X. E.g., "forest" is a member holonym of "tree".
+Substance Holonyms
+	X is a substance holonym of Y if Y is a substance of X. E.g., "air" is a substance holonym of "oxygen".
+Part Holonyms
+	X is a part holonym of Y if Y is a part of X. E.g., "bird" is a part holonym of "wing".
+Member Meronyms
+	X is a member meronym of Y if X is a member of Y. E.g., "tree" is a member meronym of "forest".
+Substance Meronyms
+	X is a substance meronym of Y if X is a substance of Y. E.g., "oxygen" is a substance meronym of "air".
+Part Meronyms
+	X is a part meronym of Y if X is a part of Y. E.g., "wing" is a part meronym of "bird".
+Attrubites
+	Attribute is a noun for which adjectives express values. The noun "weight" is an attribute, for which the adjectives "light" and "heavy" express values.
+Verb Group
+	Verb Group
+Entailment
+	A verb X entails Y if X cannot be done unless Y is, or has been, done. E.g., "snore" entails "sleep".
+Cause
+	A verb X causes Y if X denotes the causation of the state or activity referred to by Y. E.g., "scare" causes "fear".
+Participle of verb
+	Participle of verb
+Verb Frames
+	Generic sentence frames illustrating the types of simple sentences in which the verb can be used.
only in patch2:
unchanged:
--- wordnet-3.0.orig/debian/wn-for-goldendict.rb
+++ wordnet-3.0/debian/wn-for-goldendict.rb
@@ -0,0 +1,704 @@
+#!/usr/bin/env ruby
+
+# A script to convert WordNet 3.0 dictionary from original
+# format (http://wordnet.princeton.edu/wordnet/download/)
+# to DSL format, suitable for Lingvo and GoldenDict.
+#
+# This script is released into public domain with no
+# conditions. Use it as you see appropriate.
+
+# generates small part of dictionary, for testing purposes
+
+# This script was adapted to build a Debian package from the existing
+# Debian source package (some paths were changed)
+
+$short = false
+
+$CARDS = {}
+$CARDS_COUNT = 0
+
+
+# INPUT FILES
+$data_file_noun = 'dict/dbfiles/data.noun'
+$data_file_verb = 'dict/dbfiles/data.verb'
+$data_file_adj = 'dict/dbfiles/data.adj'
+$data_file_adv = 'dict/dbfiles/data.adv'
+$data_file_sentidx = 'dict/sentidx.vrb'
+$data_file_sent = 'dict/sents.vrb'
+$data_file_cntlist = 'dict/dbfiles/cntlist'
+$index_file_noun = 'dict/dbfiles/index.noun'
+$index_file_verb = 'dict/dbfiles/index.verb'
+$index_file_adj = 'dict/dbfiles/index.adj'
+$index_file_adv = 'dict/dbfiles/index.adv'
+
+# print UTF-8 BOM first
+print "\xEF\xBB\xBF"
+
+# Dictionary Header
+DIC_NAME = "WordNet 3.0. \(En-En\)"
+ABBR_DIC_NAME = "Abbreviations for #{DIC_NAME}"
+puts "\#NAME \"#{DIC_NAME}\""
+puts %q{#INDEX_LANGUAGE "English"
+#CONTENTS_LANGUAGE "English"}
+
+$noun_data = File.open($data_file_noun, 'rb')
+$verb_data = File.open($data_file_verb, 'rb')
+$adj_data = File.open($data_file_adj, 'rb')
+$adv_data = File.open($data_file_adv, 'rb')
+
+$LEMMA_IDX = {}
+
+$VERB_IDX = {}
+File.open($data_file_sentidx, 'rb') { |sentidx|
+  sentidx.each_line { |line|
+    d = line.split()
+    if (d.size != 2)
+      $stderr.puts "WARNING: sentidx.vrb format error: #{d.inspect}"
+    end
+    $VERB_IDX[d[0]] = d[1]
+  }
+}
+
+$VERB_PTRNS = {}
+File.open($data_file_sent, 'rb') { |f|
+  f.each_line { |line|
+    d = line.strip.split(/\s+/, 2)
+    if (d.size != 2)
+      $stderr.puts "WARNING: sents.vrb format error: #{d.inspect}"
+    end
+    $VERB_PTRNS[d[0]] = d[1]
+  }
+}
+
+$SENSE_COUNTS = {}
+File.open($data_file_cntlist, 'rb') { |f|
+  f.each_line { |line|
+    d = line.strip.split(/\s+/)
+    if (d.size != 3)
+      $stderr.puts "WARNING: cntlist format error: #{d.inspect}"
+    end
+    sense = d[1].gsub(/\((p|a|ip)\)/, '')
+    $SENSE_COUNTS[sense] = d[0].to_i
+  }
+}
+
+$POS = {'n'=> 'noun', 'v' => 'verb', 'a' => 'adjective', 's' => 'adjective', 'r' => 'adverb'}
+$POS_NUM = {'n'=> '1', 'v' => '2', 'a' => '3', 's' => '5', 'r' => '4'}
+$ROME = ['I', 'II', 'III', 'IV']
+
+$frames = [ nil,
+  "Something ----s",
+  "Somebody ----s",
+  "It is ----ing",
+  "Something is ----ing PP",
+  "Something ----s something Adjective/Noun",
+  "Something ----s Adjective/Noun",
+  "Somebody ----s Adjective",
+  "Somebody ----s something",
+  "Somebody ----s somebody",
+  "Something ----s somebody",
+  "Something ----s something",
+  "Something ----s to somebody",
+  "Somebody ----s on something",
+  "Somebody ----s somebody something",
+  "Somebody ----s something to somebody",
+  "Somebody ----s something from somebody",
+  "Somebody ----s somebody with something",
+  "Somebody ----s somebody of something",
+  "Somebody ----s something on somebody",
+  "Somebody ----s somebody PP",
+  "Somebody ----s something PP",
+  "Somebody ----s PP",
+  'Somebody\'s (body part) ----s',
+  "Somebody ----s somebody to INFINITIVE",
+  "Somebody ----s somebody INFINITIVE",
+  "Somebody ----s that CLAUSE",
+  "Somebody ----s to somebody",
+  "Somebody ----s to INFINITIVE",
+  "Somebody ----s whether INFINITIVE",
+  "Somebody ----s somebody into V-ing something",
+  "Somebody ----s something with something",
+  "Somebody ----s INFINITIVE",
+  "Somebody ----s VERB-ing",
+  "It ----s that CLAUSE",
+  "Something ----s INFINITIVE"
+]
+
+def progress(count)
+  if count == 'done'
+    $stderr.puts("\n")
+  elsif count =~ /\D/
+    $stderr.puts(" " + count)
+  elsif (count % 10000 == 0)
+    $stderr.print "."
+  end
+end
+
+def get_data(offset, pos)
+  data_file = nil
+  case pos
+  when :n, 'n'
+    data_file = $noun_data
+  when :v, 'v'
+    data_file = $verb_data
+  when :a, 'a'
+    data_file = $adj_data
+  when :r, 'r'
+    data_file = $adv_data
+  else
+    $stderr.puts "WARN #7: get_data for unknown pos: #{pos}"
+    exit
+  end
+  data_file.seek(offset.to_i)
+  DataEntry.new(data_file.gets)
+end
+
+class Card
+  attr_reader :headword, :senses
+  def initialize(headword)
+    @headword = headword
+    @all_senses = []
+    adjectives = []
+    @senses = {'n'=>[], 'v' =>[], 'a' => adjectives, 's' => adjectives, 'r' => []}
+  end
+  def << (sense)
+    unless @all_senses.include?(sense)
+      @all_senses << sense
+      @senses[sense.pos] << sense
+    end
+  end
+  def <=> (card)
+    @headword.downcase <=> card.headword.downcase
+  end
+  def print_out
+    puts @headword
+    poses = 0
+    ['n', 'v', 'a', 'r'].each { |pos|
+      poses += 1 unless @senses[pos].empty?
+    }
+    pos_count = 0
+    ['n', 'v', 'a', 'r'].each { |pos|
+      pos_senses = @senses[pos]
+      if (pos_senses.size > 0)
+        if (poses > 1)
+          puts "\t[m0][b]#{$ROME[pos_count]}[/b][/m]"
+          pos_count += 1
+        end
+        puts "\t[m1][p]#{$POS[pos]}[/p][/m]"
+        sense_count = 1
+        pos_senses_total = pos_senses.size
+        pos_senses.sort {|x, y|
+          next 0 if $short
+
+          val1 = x.sense_key(@headword)
+          val2 = y.sense_key(@headword)
+          count1 = $SENSE_COUNTS[val1] || 0
+          count2 = $SENSE_COUNTS[val2] || 0
+
+          if (count1 + count2 > 0)
+            comp = count2 <=> count1 # reverse comparison here!
+            if comp != 0
+              next comp
+            end
+          end
+
+          idxEntry = x.idx
+          if (idxEntry.nil?)
+            $stderr.puts "No idxEntry for headword: #{@headword}"
+            exit
+          end
+          val1 = idxEntry.offsets.index(x.offset)
+          val2 = idxEntry.offsets.index(y.offset)
+          if (val1.nil? || val2.nil?)
+            idxEntry = y.idx
+            if (idxEntry.nil?)
+              $stderr.puts "No idxEntry for headword: #{@headword}"
+              exit
+            end
+            val1 = idxEntry.offsets.index(x.offset)
+            val2 = idxEntry.offsets.index(y.offset)
+          end
+
+          if (val1.nil? || val2.nil?) # can't compare for some reasons...
+            0
+          else
+            idxEntry.offsets.index(x.offset) <=> idxEntry.offsets.index(y.offset)
+          end
+        }.each { |sense|
+          if (pos_senses_total > 1)
+            print "\t[m2][b]#{sense_count}.[/b] "
+            sense_count += 1
+          else
+            print "\t[m2] "
+          end
+          sense.print_out(@headword)
+        }
+      end
+    }
+  end
+end
+
+class IdxEntry
+  attr_accessor :offsets, :lemma, :senses
+  def initialize(str)
+    @senses = []
+    @str = str
+    data = str.split
+    @lemma = data[0]
+    @pos = data[1]
+    @synset_cnt = data[2].to_i
+    @p_cnt = data[3]
+    @pointers = ""
+    i = 3
+    Integer(@p_cnt).times {
+      i += 1
+      @pointers << data[i]
+    }
+    i += 1
+    @sense_cnt = data[i]
+    i += 1
+    @tagsense_cnt = data[i]
+    i += 1
+    @offsets = []
+    (i..data.size-1).each { |idx|
+      @offsets << data[idx].to_i
+    }
+    if (@offsets.size != @synset_cnt)
+      $stderr.puts "ERROR #1: size mismatch"
+      exit
+    end
+  end
+  def to_s
+    "#{@lemma}" # : POS: #{@pos}" #, Senses: #{@synset_cnt}"
+  end
+  def add_sense(sense)
+    sense.idx = self
+    @senses << sense
+    sense.each_headword { |hw|
+      ($CARDS[hw] ||= Card.new(hw)) << sense
+    }
+  end
+end
+
+class DataEntry
+  attr_accessor :words, :str, :pos, :idx, :offset, :lex_ids
+  def initialize(str)
+    @str = str
+    data = str.split
+    @offset = data[0].to_i
+    @lex_filenum = data[1]
+    @pos = data[2]
+    @w_cnt = [data[3]].pack('H2')[0]
+    @words = []
+    i = 4
+    @lex_ids = []
+    @w_cnt.times {
+      @words << data[i].gsub(/_/, ' ').gsub(/\s*\((p|a|ip)\)\s*$/, '')
+      i += 1
+      @lex_ids << [data[i]].pack('h')[0]
+      i += 1
+    }
+
+    @p_cnt = data[i].to_i
+    i += 1
+    @pointers = []
+    @p_cnt.times {
+      pointer = []
+      pointer << data[i]
+      pointer << data[i + 1]
+      pointer << data[i + 2]
+      pointer << data[i + 3]
+      i += 4
+      @pointers << pointer
+    }
+
+    @frames = []
+    # everything from this point up to the "|" is verb frames data
+    if data[i] != "|" # we found a verb frame
+      f_cnt = data[i].to_i
+      i += 1
+      if (f_cnt == 0)
+        $stderr.puts "ERROR: 0 number of verb frames specified"
+        exit
+      end
+
+      f_cnt.times {
+        if (data[i] != "+")
+          $stderr.puts "ERROR: wrong verb frame format!"
+          exit
+        end
+        i += 1
+        @frames << [data[i], data[i + 1]]
+        i += 2
+      }
+    end
+
+    if data[i] != "|"
+      $stderr.puts "ERROR: expected '|' separator, but got: #{data[i]}"
+      exit
+    end
+    i += 1
+
+    @gloss = data[i, data.size - i].join(" ").gsub(/\[/, '\[').gsub(/\]/, '\]')
+    @gloss_str = ""
+  end
+  def == (other)
+    @str == other.str
+  end
+  def each_headword
+    @words.each { |w|
+      yield w
+    }
+  end
+  def to_s
+    "Set: #{@words.inspect}, P_CNT: #{@p_cnt}, Pointers: #{@pointers.inspect}, Gloss: #{@gloss}"
+  end
+  def get_pointer_data(headword, other, src_target)
+    if (src_target == "0000")
+      return other.words
+    else
+      src = [src_target[0, 2]].pack('H2')[0]
+      target = [src_target[2, 2]].pack('H2')[0]
+      h_src = words[src - 1]
+      if (h_src == headword)
+        return [other.words[target - 1]]
+      else
+        return ["#{make_link(other.words[target - 1])} [c darkgray](for: #{make_link(words[src - 1])})[/c]"]
+      end
+    end
+  end
+  def get_frame_data(headword, frame)
+    f_num = frame[0].to_i
+    w_num = [frame[1]].pack('H2')[0]
+    if (w_num == 0)
+      return [$frames[f_num]]
+    else
+      if (w_num < 1)
+        $stderr.puts "ERROR: w_num is invalid!"
+        exit
+      end
+      h_src = words[w_num - 1]
+      if (h_src == headword)
+        return [$frames[f_num]]
+      else
+        return ["[*][ex]#{$frames[f_num]}[/ex][/*] [c darkgray](for: #{make_link(h_src)})[/c]"]
+      end
+    end
+  end
+  def sense_key(headword)
+    i = @words.index(headword)
+    if (i.nil?)
+      $stderr.puts "ERROR: can't find index for the headword: #{headword}"
+      exit
+    end
+    res = "#{headword.downcase.gsub(/\s+/, '_')}%#{$POS_NUM[@pos]}:#{@lex_filenum}:#{sprintf('%02d', @lex_ids[i])}"
+    if (@pos != 's')
+      res << "::"
+    else
+      @pointers.each {|ptr|
+        if (ptr[0] == "&") # similar to
+          similars = get_data(ptr[1], ptr[2])
+          res << ":#{similars.words[0]}:#{sprintf('%02d',similars.lex_ids[0])}"
+        end
+      }
+    end
+    res
+  end
+  def freq_count(headword)
+    $SENSE_COUNTS[sense_key(headword)] || 0
+  end
+  def print_out(headword)
+    $headword = headword
+
+    str1 = ""
+    exa = false
+    extra = ""
+    freq = if (freq_count(headword) > 0)
+      " [com][c darkgray]([p]Freq.[/p] #{freq_count(headword)})[/c][/com]"
+    else
+      ""
+    end
+
+    @gloss.split(';').each { |s|
+      s = "#{extra}; #{s}" unless extra.empty?
+      extra = ""
+
+      # detect broken quotations
+      if s.gsub(/[^"]/, '').size % 2 != 0
+        extra = s
+        next
+      end
+
+      if s =~ /^\s*(".*)$/ # example
+        unless freq.empty?
+          str1 << freq
+          freq = ""
+        end
+        example = $1.gsub(/^"(.*)"$/, '\1')
+        str1 << "[/m]\n\t[m3]- [*][ex]#{example}[/ex][/*]"
+        exa = true
+      else
+        if (exa)
+          str1 << "[/m]\n\t[m3]"
+        end
+        s = "[trn]#{s.strip.gsub(/(\(.*?\))/, '[i]\1[/i]')}[/trn]"
+        if (str1.empty?)
+          str1 << s
+        else
+          if (exa)
+            str1 << s
+          else
+            str1 << "; #{s}"
+          end
+        end
+        exa = false
+      end
+    }
+
+    puts "#{str1}#{freq}[/m]"
+
+    print_array(@words, 'Syn', "[c blue]•[/c]")
+
+    antonyms = []
+    pertainyms = []
+    derivs = []
+    deriv_rels = []
+    topics = []
+    regions = []
+    usages = []
+    m_topics = []
+    m_regions = []
+    m_usages = []
+    hypers = []
+    inst_hypers = []
+    hypos = []
+    inst_hypos = []
+    m_holos = []
+    s_holos = []
+    p_holos = []
+    m_meros = []
+    s_meros = []
+    p_meros = []
+    attribs = []
+    verb_group = []
+    ents = []
+    alsos = []
+    causes = []
+    similars = []
+    part_verbs = []
+    @pointers.each {|ptr|
+      if (ptr[0] == '!') # antonym
+        antonyms += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "\\") # pertainym or deriv. from adjective
+        if (@pos == 'r') # adverb
+          derivs += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+        elsif (@pos == 'a' || @pos == 's') # adjective
+          pertainyms += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+        else
+          $stderr.puts "ERROR: unexpected POS for slash: #{@pos}"
+          exit
+        end
+      elsif (ptr[0] == "=") # attributes
+        attribs += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == ";c") # topics domain
+        topics += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == ";r") # regions domain
+        regions += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == ";u") # usage domain
+        usages += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "-c") # members of this topic domain
+        m_topics += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "-r") # members of this region domain
+        m_regions += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "-u") # members of this usage domain
+        m_usages += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == '$') # verb group
+        verb_group += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == '*') # entailment
+        ents += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == '^') # see also
+        alsos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == '>') # cause
+        causes += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == '+') # deriv related form
+        deriv_rels += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "@") # hypernyms
+        hypers += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "@i") # instance hypernyms
+        inst_hypers += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "~") # hyponyms
+        hypos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "~i") # instance hyponyms
+        inst_hypos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "#m") # m holonyms
+        m_holos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "#s") # s holonyms
+        s_holos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "#p") # p holonyms
+        p_holos += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "%m") # m meronyms
+        m_meros += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "%s") # s meronyms
+        s_meros += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "%p") # p meronyms
+        p_meros += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "&") # similar to
+        similars += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      elsif (ptr[0] == "<") # participle of verb
+        part_verbs += get_pointer_data(headword, get_data(ptr[1], ptr[2]), ptr[3])
+      else
+        $stderr.puts "WARN #8: Unknown pointer type #{ptr[0]}"
+      end
+    }
+
+    print_array(antonyms, 'Ant', "[c red]•[/c]")
+    print_array(derivs, 'Derived from adjective', "[c deepskyblue]•[/c]")
+    print_array(pertainyms, 'Pertains to noun', "[c deepskyblue]•[/c]")
+    print_array(similars, 'Similar to', "[c darkturquoise]•[/c]")
+    print_array(alsos, 'See Also', "[c darkturquoise]•[/c]")
+    print_array(deriv_rels, 'Derivationally related forms', "[c dodgerblue]•[/c]")
+
+    print_array(usages, 'Usage Domain', "[c darkorchid]•[/c]")
+    print_array(topics, 'Topics', "[c darkorchid]•[/c]")
+    print_array(regions, 'Regions', "[c darkorchid]•[/c]")
+    print_array(m_usages, 'Members of this Usage Domain')
+    print_array(m_topics, 'Members of this Topic')
+    print_array(m_regions, 'Members of this Region')
+
+    print_array(hypers, 'Hypernyms')
+    print_array(inst_hypers, 'Instance Hypernyms')
+
+    print_array(hypos, 'Hyponyms')
+    print_array(inst_hypos, 'Instance Hyponyms')
+
+    print_array(m_holos, 'Member Holonyms')
+    print_array(s_holos, 'Substance Holonyms')
+    print_array(p_holos, 'Part Holonyms')
+
+    print_array(m_meros, 'Member Meronyms')
+    print_array(s_meros, 'Substance Meronyms')
+    print_array(p_meros, 'Part Meronyms')
+
+    print_array(attribs, 'Attrubites', "[c yellow]•[/c]")
+
+    print_array(verb_group, 'Verb Group', "[c maroon]•[/c]")
+    print_array(ents, 'Entailment')
+    print_array(causes, 'Cause')
+
+    print_array(part_verbs, "Participle of verb")
+
+    verb_sentences = []
+    unless (@frames.empty?)
+      puts "\t[m3][com][c maroon]•[/c] [p]Verb Frames[/p]:[/com][/m]"
+      @frames.each {|frame|
+        verb_sentences += get_frame_data(headword, frame)
+      }
+    end
+
+    if @pos == 'v' # only for verbs
+      key = sense_key(headword)
+      values = $VERB_IDX[key]
+      if (values)
+        values.split(/,/).each { |value|
+          verb_sentences << $VERB_PTRNS[value].gsub(/%s/, headword)
+        }
+      end
+    end
+
+    verb_sentences.each { |sentence|
+      if sentence =~ /\[ex\]/
+        puts "\t[m4]- #{sentence}[/m]"
+      else
+        puts "\t[m4]- [*][ex]#{sentence}[/ex][/*][/m]"
+      end
+    }
+  end
+  def print_array(a, label, prefix = "[c darkgray]•[/c]")
+    a -= [$headword]
+    a.uniq!
+    separator = if (a.size > 6)
+      "[/m]\n\t[m4]"
+    else
+      ""
+    end
+    puts "\t[m3][com]#{prefix} [p]#{label}[/p]:#{separator} #{a.collect{|x| make_link(x)}.join(', ')}[/com][/m]" unless a.empty?
+  end
+  def make_link(target)
+    target = target.strip
+    if (target =~ /<<.+>>/)
+      target
+    else
+      # no need to validate links, the format is good, no broken links
+      "<<#{target}>>"
+    end
+  end
+end
+
+count = 0
+
+File.foreach($index_file_noun) { |idx_line|
+  next if idx_line =~ /^\s\s/
+  entry = IdxEntry.new(idx_line)
+  entry.offsets.each { |offset|
+    d_entry = get_data(offset, :n)
+    entry.add_sense(d_entry)
+  }
+  count += 1
+  break if count == 600 && $short
+  progress(count)
+}
+progress($index_file_noun + " was processed");
+
+File.foreach($index_file_verb) { |idx_line|
+  next if idx_line =~ /^\s\s/
+  entry = IdxEntry.new(idx_line)
+  entry.offsets.each { |offset|
+    d_entry = get_data(offset, :v)
+    entry.add_sense(d_entry)
+  }
+  count += 1
+  break if count == 1200 && $short
+  progress(count)
+}
+progress($index_file_verb + " was processed");
+
+File.foreach($index_file_adj) { |idx_line|
+  next if idx_line =~ /^\s\s/
+  entry = IdxEntry.new(idx_line)
+  entry.offsets.each { |offset|
+    d_entry = get_data(offset, :a)
+    entry.add_sense(d_entry)
+  }
+  count += 1
+  break if count == 1800 && $short
+  progress(count)
+}
+progress($index_file_adj + " was processed");
+
+File.foreach($index_file_adv) { |idx_line|
+  next if idx_line =~ /^\s\s/
+  entry = IdxEntry.new(idx_line)
+  entry.offsets.each { |offset|
+    d_entry = get_data(offset, :r)
+    entry.add_sense(d_entry)
+  }
+  count += 1
+  break if count == 2400 && $short
+  progress(count)
+}
+progress($index_file_adv + " was processed");
+
+card_count = 0
+$CARDS.values.sort.each { |card|
+  card.print_out
+  card_count += 1
+  progress(card_count)
+}
+progress("CARDS were processed");
+
+$noun_data.close
+$verb_data.close
+$adj_data.close
+$adv_data.close
+
+$stderr.puts "TOTAL CARDS: #{$CARDS.size}"
only in patch2:
unchanged:
--- wordnet-3.0.orig/debian/goldendict-wordnet.install
+++ wordnet-3.0/debian/goldendict-wordnet.install
@@ -0,0 +1,2 @@
+goldendict-wordnet.dsl.dz /usr/share/goldendict-wordnet/
+goldendict-wordnet_abrv.dsl /usr/share/goldendict-wordnet/
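
A note on the sense keys the script uses to look up frequency counts in cntlist and sentence frames in sentidx.vrb: sense_key builds them as lemma%ss_type:lex_filenum:lex_id, followed by "::" for ordinary synsets or by ":head_word:head_id" for adjective satellites. The sketch below only illustrates that layout. The helper name build_sense_key and its argument list are hypothetical and are not part of the patch, and the satellite head is passed in by the caller instead of being read from the data files.

```ruby
# Hypothetical standalone helper (not in the patch); it mirrors the string
# layout produced by DataEntry#sense_key in debian/wn-for-goldendict.rb.
POS_NUM = { 'n' => '1', 'v' => '2', 'a' => '3', 's' => '5', 'r' => '4' }.freeze

# headword:    lemma as shown on the card, e.g. "hot dog"
# pos:         synset type letter from the data file ('n', 'v', 'a', 's', 'r')
# lex_filenum: two-digit lexicographer file number, kept as a string
# lex_id:      integer lex_id of the word within that file
# head:        [head_word, head_lex_id] for satellites, nil otherwise (assumption:
#              the caller has already resolved the "similar to" pointer)
def build_sense_key(headword, pos, lex_filenum, lex_id, head = nil)
  lemma = headword.downcase.gsub(/\s+/, '_')
  key = format('%s%%%s:%s:%02d', lemma, POS_NUM.fetch(pos), lex_filenum, lex_id)
  if head
    head_word, head_id = head
    key + format(':%s:%02d', head_word, head_id)
  else
    key + '::'
  end
end

puts build_sense_key('dog', 'n', '05', 0)   # prints dog%1:05:00::
```

Using POS_NUM.fetch rather than [] makes an unknown part of speech fail loudly instead of silently producing a key that never matches anything in cntlist.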
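
For anyone who wants to reproduce the abbreviations-file step from debian/rules without a shell: the rule writes the two bytes FF FE and then appends the output of iconv -t utf-16le. A minimal Ruby sketch of the same conversion follows. The helper name utf16le_with_bom is hypothetical and not part of the patch, and it assumes the input text is valid UTF-8.

```ruby
# Hypothetical helper (not in the patch): returns the bytes that the
# goldendict-wordnet_abrv.dsl rule in debian/rules would write.
def utf16le_with_bom(utf8_text)
  bom = "\xFF\xFE".b                      # the bytes written by echo -ne '\xff\xfe'
  bom + utf8_text.encode('UTF-16LE').b    # same conversion as iconv -t utf-16le
end

bytes = utf16le_with_bom("Syn\n")
puts bytes.bytesize   # 2 BOM bytes + 4 characters * 2 bytes each
```

The result could then be written with File.binwrite so that no further newline or encoding translation is applied.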