Hello my dear ones,

A couple of days ago I was on IRC in #dev.openoffice.org chatting with JZA.

I came up with the idea of creating a GUI to edit the thesaurus of AOO.

JZA told me the files were in TXT format and gave me a URL with a lot of information, but I had a quick look and didn't find anything about the data dictionary of the thesaurus.

The tool will be called "Proofing Tool GUI" and will be coded in PureBasic. Is this a good name? PureBasic can compile for Windows/Linux/Mac/Amiga.

The reason I want to code it is that, months ago, I contacted my friends at Minho University in Portugal, who are in charge of PT-pt. I wanted to send them words to be used as synonyms, but they didn't know how to add them.

This made me think that there isn't a tool for doing that, so my idea is good because it can be used by the whole community of developers.

I unzipped the Portuguese .OXT and grabbed the files:
- th_pt_PT.idx
- th_pt_PT.dat

I opened them with Microsoft Expression Web 4 to keep the UTF-8 format, but I didn't completely understand how they work.

For example, in the .idx one I had:
UTF-8
12940
1|6
a cerca de|16097
a começar de|19986
a favor|32934
a partir de|67469
a respeito de|77248
   ... etc...


In the .dat one I had:
UTF-8
1|3
-|anuviado
-|aperitivo
-|sigla
ababelado|1
-|atrapalhado|baralhado|atarantado|desnorteado
ababelar|1
-|baralhar|atrapalhar
abaçanado|1
   ... etc...

It seems there are at least three levels of synonyms in the .dat one, but I don't know how to interpret them if I create a GUI.
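
To make the question concrete, here is my best guess at the .dat layout, written as a small Python sketch (not PureBasic, just to illustrate). It is only an assumption read off the sample above, not something I found documented: the first line is the encoding, each "word|N" line is followed by N lines, and each of those lines is a first field (always "-" in my sample) followed by the synonyms separated by "|". The name parse_dat is made up for this example.

```python
# Sketch only: the layout below is a guess from the sample, not a documented format.
def parse_dat(text):
    lines = text.splitlines()
    encoding = lines[0].strip()          # first line, e.g. "UTF-8"
    entries = {}
    i = 1
    while i < len(lines):
        if not lines[i].strip():         # skip blank lines
            i += 1
            continue
        # Assumed header line: "word|N", N = number of lines that follow
        word, _, count = lines[i].rpartition("|")
        n = int(count)
        meanings = []
        for raw in lines[i + 1 : i + 1 + n]:
            # Assumed meaning line: "first-field|syn1|syn2|..."
            first, *synonyms = raw.split("|")
            meanings.append((first, synonyms))
        entries[word] = meanings
        i += 1 + n
    return encoding, entries


sample = (
    "UTF-8\n"
    "1|3\n"
    "-|anuviado\n"
    "-|aperitivo\n"
    "-|sigla\n"
    "ababelado|1\n"
    "-|atrapalhado|baralhado|atarantado|desnorteado\n"
    "ababelar|1\n"
    "-|baralhar|atrapalhar\n"
)
encoding, entries = parse_dat(sample)
for word, meanings in entries.items():
    print(word, meanings)
```

If this reading is right, the "three levels" I saw would simply be the entry "1" having three lines, with one synonym each.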

Also, in the .idx one there are numbers whose meaning I don't understand.
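
My guess here, again unconfirmed, is that the second line (12940) is the number of entries and that the number after each "|" is the byte offset of that entry inside the .dat file. The first entry fits: "1|6" points at byte 6, and the header line "UTF-8" plus a line feed is exactly 6 bytes. A small Python sketch under that assumption (parse_idx is a made-up name, and I assume LF line endings):

```python
# Sketch only: "entry count" and "byte offset" are guesses from the sample.
def parse_idx(text):
    lines = [l for l in text.splitlines() if l.strip()]
    encoding = lines[0].strip()      # e.g. "UTF-8"
    declared = int(lines[1])         # assumed: number of entries
    offsets = {}
    for line in lines[2:]:
        # Assumed line layout: "word|byte offset into the .dat file"
        word, _, offset = line.rpartition("|")
        offsets[word] = int(offset)
    return encoding, declared, offsets


idx_sample = "UTF-8\n12940\n1|6\na cerca de|16097\na favor|32934\n"
dat_sample = "UTF-8\n1|3\n-|anuviado\n-|aperitivo\n-|sigla\n".encode("utf-8")

encoding, declared, offsets = parse_idx(idx_sample)

# Jump to the assumed offset in the .dat bytes and read one line.
start = offsets["1"]
first_line = dat_sample[start:].split(b"\n", 1)[0].decode("utf-8")
print(first_line)
```

If the offsets really are byte positions, a GUI that edits the .dat file would also have to regenerate the whole .idx file after every change.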

Is there a URL which explains every detail of those files?

Thanks!

Kind regards from,
         >Marco A.G.Pinto
           -----------------------


