Re: [Apertium-stuff] How do I get a list of lemmas for nouns

2020-04-23 Per Tunedal
Hi,
Thank you Kevin! Works like a charm.
BTW, I had already changed 'uniq' to 'sort -u'.
Yours,
Per
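Taken together, the thread arrives at Daniel's lt-expand/grep/sed pipeline plus Kevin's superscript-stripping sed, with sort -u in place of uniq. The function below is a sketch of that combination, assuming lttoolbox's lt-expand is on the PATH; the name list_lemmas and its tag argument are hypothetical conveniences, not part of Apertium. Like Kevin's sed it only strips ¹ ² ³, and like Daniel's grep it skips lt-expand lines that carry the :>: or :<: direction markers, so a lemma that only appears in such lines would be missed. For verbs, Apertium dictionaries usually tag lexical verbs as vblex, but it is worth checking the dictionary's <sdefs> section.

```shell
# Hypothetical helper (not an Apertium command); assumes lt-expand is installed.
# Prints the distinct lemmas in a .dix file whose first tag is <$2>,
# with the homonym markers ¹ ² ³ removed.
list_lemmas() {
  dix=$1
  tag=$2
  lt-expand "$dix" |
    grep -E "[^<:>]+:[^<:>]+<${tag}>" |   # first tag after the lemma is <tag>
    sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' |  # keep only the lemma
    sed -E 's/(¹|²|³)//g' |                # drop homonym superscripts
    sort -u                                # sort, then drop duplicates
}

# Usage:
#   list_lemmas apertium-swe.swe.dix n      > nouns.txt
#   list_lemmas apertium-swe.swe.dix vblex  > verbs.txt
```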

On Thu, Apr 23, 2020, at 10:42, Kevin Brubeck Unhammer wrote:
> "Per Tunedal" wrote:
> 
> > Hi Kevin,
> > thanks for the explanation. Thus they are homonyms. How do I get rid of the 
> > duplicates?
> > I just want:
> >
> > tur
> 
> before the `| uniq`, stick in
> 
>  | sed 's/[¹²³]//g'
> 
> 
> (You may have to change `uniq` to `sort -u` in case things are not ordered 
> already)
> 
> 
> ___
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] How do I get a list of lemmas for nouns

2020-04-23 Per Tunedal
Hi Tanmai,
unfortunately, your suggestion produced an error, so I've used Kevin's
solution instead.

sed: -e expression #1, char 11: Invalid content of \{\}

Yours,
Per Tunedal

On Thu, Apr 23, 2020, at 10:43, Tanmai Khanna wrote:
> Hi,
> How about you try this:
> 
> lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+<n>" | sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' | sed -E 's/\p{No}//g' | uniq
> 
> Just a small addition to Daniel's earlier command, to delete all superscripts 
> before removing duplicates. Hopefully you don't need superscripts in your 
> lemmas elsewhere. If you do then we can do other things here.
> 
> *Note that I'm not able to reproduce this on my machine.* But I'm not able to 
> reproduce Daniel's command either so that might just be something to do with 
> my machine. I'm guessing it should work. Check it out and let me know.
> 
> Tanmai
> 
> 
> 
> -- 
> *Khanna, Tanmai*


Re: [Apertium-stuff] How do I get a list of lemmas for nouns

2020-04-23 Tanmai Khanna
Hi,
How about you try this:

lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+<n>" | sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' | sed -E 's/\p{No}//g' | uniq

Just a small addition to Daniel's earlier command, to delete all
superscripts before removing duplicates. Hopefully you don't need
superscripts in your lemmas elsewhere. If you do then we can do other
things here.

*Note that I'm not able to reproduce this on my machine.* But I'm not able
to reproduce Daniel's command either so that might just be something to do
with my machine. I'm guessing it should work. Check it out and let me know.

Tanmai
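Per's reply in this thread reports "sed: -e expression #1, char 11: Invalid content of \{\}" for this command. GNU sed's regular expressions do not support Unicode property classes such as \p{No}; as far as I can tell, \p is taken as a plain p and {No} is then parsed as a malformed {m,n} interval, which is what triggers that error. One workaround, sketched here under the assumption that the homonym markers are always superscript digits, is to spell those digits out. This is narrower than \p{No}, which also covers characters such as vulgar fractions; regex engines that do understand \p{No}, such as Perl's, are another route. The helper name strip_superscripts is hypothetical.

```shell
# Hypothetical helper: stand-in for the unsupported sed -E 's/\p{No}//g'.
# Deletes the superscript digits ⁰-⁹ listed one by one. An alternation is
# used instead of a [...] bracket expression so each multi-byte character
# is matched as a whole even if sed runs in a non-UTF-8 (C) locale.
strip_superscripts() {
  sed -E 's/(⁰|¹|²|³|⁴|⁵|⁶|⁷|⁸|⁹)//g'
}

printf '%s\n' 'tur¹' 'tur²' 'film' | strip_superscripts   # tur, tur, film
```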

On Thu, Apr 23, 2020 at 1:51 PM Per Tunedal wrote:

> Hi Kevin,
> thanks for the explanation. Thus they are homonyms. How do I get rid of
> the duplicates?
> I just want:
>
> tur
>
> Yours,
> Per Tunedal
>
> On Thu, Apr 23, 2020, at 10:00, Kevin Brubeck Unhammer wrote:
>
> "Per Tunedal" wrote:
>
> > Hi Daniel,
> > Thank you! Works like a charm with a small exception.
> >
> > I get some strange duplicates like e.g. tur:
> >
> > tur¹
> > tur²
>
> slump vs färd, they have different paradigms:
>
>   turtur¹ <par n="mjölk__n_ut"/>
>   turtur² <par n="film__n_ut"/>


-- 
*Khanna, Tanmai*


Re: [Apertium-stuff] How do I get a list of lemmas for nouns

2020-04-23 Kevin Brubeck Unhammer
"Per Tunedal" wrote:

> Hi Kevin,
> thanks for the explanation. Thus they are homonyms. How do I get rid of the 
> duplicates?
> I just want:
>
> tur

before the `| uniq`, stick in

  | sed 's/[¹²³]//g'


(You may have to change `uniq` to `sort -u` in case things are not ordered 
already)
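The "not ordered" caveat comes from how uniq works: it only drops a line that is identical to the line directly before it. A minimal illustration with invented input, where the two tur lemmas are not adjacent (sked is just a placeholder word between them, and stripped_lemmas is a hypothetical stand-in for the pipeline's output):

```shell
# Made-up lemma column after the superscripts are gone; the two "tur"
# lines are not next to each other, as can happen when the homonyms
# sit in different parts of the dictionary.
stripped_lemmas() {
  printf '%s\n' tur sked tur
}

stripped_lemmas | uniq     # tur, sked, tur: only adjacent repeats collapse
stripped_lemmas | sort -u  # sked, tur: sorted first, so every repeat goes
```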




Re: [Apertium-stuff] How do I get a list of lemmas for nouns

2020-04-23 Per Tunedal
Hi Kevin,
thanks for the explanation. Thus they are homonyms. How do I get rid of the 
duplicates?
I just want:

tur

Yours,
Per Tunedal

On Thu, Apr 23, 2020, at 10:00, Kevin Brubeck Unhammer wrote:
> "Per Tunedal" wrote:
> 
> > Hi Daniel,
> > Thank you! Works like a charm with a small exception.
> >
> > I get some strange duplicates like e.g. tur:
> >
> > tur¹
> > tur²
> 
> slump vs färd, they have different paradigms:
> 
>  turtur¹
>  turtur²
> 


Re: [Apertium-stuff] How do I get a list of lemmas for nouns

2020-04-23 Kevin Brubeck Unhammer
"Per Tunedal" wrote:

> Hi Daniel,
> Thank you! Works like a charm with a small exception.
>
> I get some strange duplicates like e.g. tur:
>
> tur¹
> tur²

slump vs färd, they have different paradigms:

  turtur¹
  turtur²




Re: [Apertium-stuff] How do I get a list of lemmas for nouns

2020-04-23 Per Tunedal
Hi Daniel,
Thank you! Works like a charm with a small exception.

I get some strange duplicates like e.g. tur:

tur¹
tur²

Yours,
Per Tunedal

On Wed, Apr 22, 2020, at 16:28, Daniel Swanson wrote:
> Hi Per,
> 
> If I understand correctly, this might give what you want:
> 
> lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+<n>" | sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' | uniq
> 
> lt-expand lists all the forms, grep finds all the ones where the first tag is
> <n>, sed gets rid of everything but the lemma, and uniq removes duplicates.
> 
> Daniel
> 


Re: [Apertium-stuff] How do I get a list of lemmas for nouns

2020-04-22 Daniel Swanson
Hi Per,

If I understand correctly, this might give what you want:

lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+<n>" | sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' | uniq

lt-expand lists all the forms, grep finds all the ones where the first tag
is <n>, sed gets rid of everything but the lemma, and uniq removes
duplicates.

Daniel
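To see what each stage of this command does, here it is split into two shell functions and run on a few made-up lines in lt-expand's surface:lemma<tags> output shape. The word forms and tags are illustrative (loosely following the tur¹/tur² homonyms discussed in this thread), not copied from apertium-swe, and the helper names are hypothetical. The last line also shows why tur¹ and tur² both survive uniq: they are different strings.

```shell
# Made-up lines in lt-expand's "surface:lemma<tags>" shape (illustrative only).
sample_expansion() {
  printf '%s\n' \
    'tur:tur¹<n><ut><sg><ind><nom>' \
    'turen:tur¹<n><ut><sg><def><nom>' \
    'turer:tur²<n><ut><pl><ind><nom>' \
    'springa:springa<vblex><inf><actv>'
}

# Hypothetical helper, step 1 (grep): keep lines where the lemma is followed
# directly by <n>. [^<:>]+ cannot cross a tag or a colon, so in lt-expand
# output <n> has to be the first tag.
nouns_only() {
  grep -E "[^<:>]+:[^<:>]+<n>"
}

# Hypothetical helper, step 2 (sed): replace the whole line with the lemma.
lemma_only() {
  sed -E 's/[^<:>]+:([^<:>]+).*/\1/g'
}

sample_expansion | nouns_only                      # the three tur lines
sample_expansion | nouns_only | lemma_only         # tur¹, tur¹, tur²
sample_expansion | nouns_only | lemma_only | uniq  # tur¹, tur²
```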

On Wed, Apr 22, 2020 at 7:54 AM Per Tunedal wrote:

> Hi,
> I need an ordinary dictionary of Swedish lemmas (just the lemmas, nothing
> else). How do I accomplish this?
>
> I read the Wiki:
> http://wiki.apertium.org/wiki/Dixtools:_Grep
>
> Thus I tried:
> apertium-dixtools grep --par '.*__n' apertium-swe.swe.dix
>
> but nothing was filtered. I got the whole file.
>
> I have a bit of trouble using grep, as I find regular expressions a bit hard
> to grasp. Unfortunately, I often get it wrong and get unexpected results.
>
> Now, I would like a list of nouns (just the lemmas) for a start. Then I
> need lists of the other parts of speech, verbs for instance.
>
> The expression below from http://wiki.apertium.org/wiki/Dictionary_reader:
> apertium-dixtools dic-reader list-lemmas apertium-swe.swe.dix
> gives me ALL lemmas. But I would like to choose the part of speech.
>
> I'm running Ubuntu as an app on Windows 10.
>
> Please give me a hand!
>
> Yours,
> Per Tunedal


[Apertium-stuff] How do I get a list of lemmas for nouns

2020-04-22 Per Tunedal
Hi,
I need an ordinary dictionary of Swedish lemmas (just the lemmas, nothing 
else). How do I accomplish this?

I read the Wiki:
http://wiki.apertium.org/wiki/Dixtools:_Grep

Thus I tried:
apertium-dixtools grep --par '.*__n' apertium-swe.swe.dix

but nothing was filtered. I got the whole file.

I have a bit of trouble using grep, as I find regular expressions a bit hard to
grasp. Unfortunately, I often get it wrong and get unexpected results.

Now, I would like a list of nouns (just the lemmas) for a start. Then I need 
lists of the other parts of speech, verbs for instance.

The expression below from http://wiki.apertium.org/wiki/Dictionary_reader:
apertium-dixtools dic-reader list-lemmas apertium-swe.swe.dix
gives me ALL lemmas. But I would like to choose the part of speech.

I'm running Ubuntu as an app on Windows 10.

Please give me a hand!

Yours,
Per Tunedal





