Re: [Apertium-stuff] How do I get a list of lemmas for nouns
Hi,
Thank you Kevin! Works like a charm.
BTW, I've already changed `uniq` to `sort -u`.

Yours,
Per

On Thu, Apr 23, 2020, at 10:42, Kevin Brubeck Unhammer wrote:
> before the `| uniq`, stick in
>
> | sed 's/[¹²³]//g'
>
> (You may have to change `uniq` to `sort -u` in case things are not ordered
> already)

___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
Re: [Apertium-stuff] How do I get a list of lemmas for nouns
Hi Tanmai,
unfortunately, your suggestion produced an error, so I've used Kevin's solution instead.

sed: -e expression #1, char 11: Invalid content of \{\}

Yours,
Per Tunedal

On Thu, Apr 23, 2020, at 10:43, Tanmai Khanna wrote:
> Hi,
> How about you try this:
>
> lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+<n>" | sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' | sed -E 's/\p{No}//g' | uniq
>
> Just a small addition to Daniel's earlier command, to delete all superscripts
> before removing duplicates. Hopefully you don't need superscripts in your
> lemmas elsewhere. If you do then we can do other things here.
>
> *Note that I'm not able to run this on my machine.* But I'm not able to
> run Daniel's command either, so that might just be something to do with
> my machine. I'm guessing it should work. Check it out and let me know.
>
> Tanmai
Re: [Apertium-stuff] How do I get a list of lemmas for nouns
Hi,
How about you try this:

lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+<n>" | sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' | sed -E 's/\p{No}//g' | uniq

Just a small addition to Daniel's earlier command, to delete all superscripts before removing duplicates. Hopefully you don't need superscripts in your lemmas elsewhere. If you do then we can do other things here.

*Note that I'm not able to run this on my machine.* But I'm not able to run Daniel's command either, so that might just be something to do with my machine. I'm guessing it should work. Check it out and let me know.

Tanmai

On Thu, Apr 23, 2020 at 1:51 PM Per Tunedal wrote:
> Hi Kevin,
> thanks for the explanation. Thus they are homonyms. How do I get rid of
> the duplicates?
> I just want:
>
> tur
>
> Yours,
> Per Tunedal

--
*Khanna, Tanmai*
Re: [Apertium-stuff] How do I get a list of lemmas for nouns
"Per Tunedal" čálii: > Hi Kevin, > thanks for the explanation. Thus they are homonyms. How do I get rid of the > duplicates? > I just want: > > tur before the `| uniq`, stick in | sed 's/[¹²³]//g' (You may have to change `uniq` to `sort -u` in case things are not ordered already) signature.asc Description: PGP signature ___ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff
Re: [Apertium-stuff] How do I get a list of lemmas for nouns
Hi Kevin,
thanks for the explanation. Thus they are homonyms. How do I get rid of the duplicates?
I just want:

tur

Yours,
Per Tunedal

On Thu, Apr 23, 2020, at 10:00, Kevin Brubeck Unhammer wrote:
> slump ("luck") vs färd ("trip"); they have different paradigms:
>
> tur → tur¹ (par n="mjölk__n_ut")
> tur → tur² (par n="film__n_ut")
Re: [Apertium-stuff] How do I get a list of lemmas for nouns
"Per Tunedal" čálii: > Hi Daniel, > Thank you! Works like a charm with a small exception. > > I get some strange duplicates like e.g. tur: > > tur¹ > tur² slump vs färd, they have different paradigms: turtur¹ turtur² signature.asc Description: PGP signature ___ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff
Re: [Apertium-stuff] How do I get a list of lemmas for nouns
Hi Daniel,
Thank you! Works like a charm, with one small exception.

I get some strange duplicates, e.g. for tur:

tur¹
tur²

Yours,
Per Tunedal

On Wed, Apr 22, 2020, at 16:28, Daniel Swanson wrote:
> Hi Per,
>
> If I understand correctly, this might give what you want:
>
> lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+<n>" | sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' | uniq
>
> lt-expand lists all the forms, grep finds all the ones where the first tag is
> <n>, sed gets rid of everything but the lemma, and uniq removes duplicates.
>
> Daniel
Re: [Apertium-stuff] How do I get a list of lemmas for nouns
Hi Per,

If I understand correctly, this might give what you want:

lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+<n>" | sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' | uniq

lt-expand lists all the forms, grep finds all the ones where the first tag is <n>, sed gets rid of everything but the lemma, and uniq removes duplicates.

Daniel

On Wed, Apr 22, 2020 at 7:54 AM Per Tunedal wrote:
> Hi,
> I need an ordinary dictionary of Swedish lemmas (just the lemmas, nothing
> else). How do I accomplish this?
>
> I read the Wiki:
> http://wiki.apertium.org/wiki/Dixtools:_Grep
>
> Thus I tried:
> apertium-dixtools grep --par '.*__n' apertium-swe.swe.dix
>
> but nothing was filtered. I got the whole file.
>
> I have a bit of trouble using grep, as I find regular expressions a bit hard
> to grasp. Unfortunately, I often get them wrong and get unexpected results.
>
> Now, I would like a list of nouns (just the lemmas) for a start. Then I
> need lists of the other parts of speech, verbs for instance.
>
> The command below, from http://wiki.apertium.org/wiki/Dictionary_reader:
> apertium-dixtools dic-reader list-lemmas apertium-swe.swe.dix
> gives me ALL lemmas. But I would like to choose the part of speech.
>
> I'm running Ubuntu as an app on Windows 10.
>
> Please give me a hand!
>
> Yours,
> Per Tunedal
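The grep and sed steps can be tried without a compiled dictionary by feeding them a few lines in the `surface:lemma<tags>` shape that lt-expand prints. The following is only a sketch under assumptions: the grep pattern is taken to end in the tag being selected (`<n>` for nouns, as the explanation says), the sample lines are made up, the function names `lemmas_for_tag` and `sample` are hypothetical, and `vblex` is assumed to be the verb tag in this dictionary. It uses `sort -u` rather than `uniq` so that input order does not matter.

```shell
# Hypothetical helper: read lt-expand-style lines on stdin and print the
# lemmas whose first tag is the one given as $1 (e.g. n for nouns).
lemmas_for_tag() {
    grep -E "[^<:>]+:[^<:>]+<$1>" |
        sed -E 's/[^<:>]+:([^<:>]+).*/\1/' |
        sort -u
}

# Made-up sample lines in the surface:lemma<tags> shape.
sample() {
    printf '%s\n' \
        'bil:bil<n><ut><sg><ind><nom>' \
        'bilar:bil<n><ut><pl><ind><nom>' \
        'springa:springa<vblex><inf>' \
        'hus:hus<n><nt><sg><ind><nom>'
}

sample | lemmas_for_tag n       # prints bil, hus
sample | lemmas_for_tag vblex   # prints springa
```

With a real dictionary the call would look like `lt-expand apertium-swe.swe.dix | lemmas_for_tag n`.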
[Apertium-stuff] How do I get a list of lemmas for nouns
Hi,
I need an ordinary dictionary of Swedish lemmas (just the lemmas, nothing else). How do I accomplish this?

I read the Wiki:
http://wiki.apertium.org/wiki/Dixtools:_Grep

Thus I tried:
apertium-dixtools grep --par '.*__n' apertium-swe.swe.dix

but nothing was filtered. I got the whole file.

I have a bit of trouble using grep, as I find regular expressions a bit hard to grasp. Unfortunately, I often get them wrong and get unexpected results.

Now, I would like a list of nouns (just the lemmas) for a start. Then I need lists of the other parts of speech, verbs for instance.

The command below, from http://wiki.apertium.org/wiki/Dictionary_reader:
apertium-dixtools dic-reader list-lemmas apertium-swe.swe.dix
gives me ALL lemmas. But I would like to choose the part of speech.

I'm running Ubuntu as an app on Windows 10.

Please give me a hand!

Yours,
Per Tunedal