Re: [Apertium-stuff] Secondary Tag Prefixes

Samuel Sloniker Thu, 14 May 2020 08:01:09 -0700

+1

On Sun, May 10, 2020 at 10:14 AM Xavi Ivars <xavi.iv...@gmail.com> wrote:


> First of all, just to mention that I don't consider myself a language
> developer (but someone who messes around with everything).
>
> - I think I would leave this to the "secondary tag" developer, similar
> to what we already do with the "primary tag" ones. For example, nothing
> currently forbids a primary tag containing any symbol, as long as it's not
> a stream-related one (<, >, ^, $, +).
> - Like Jonathan, I think we don't need to have things like
> <surfaceform:xxxx>. It's too long, and would probably clutter the stream
> too much. (Let's remember that, even if the stream is not meant to be
> "human read", it is somewhat "human readable", and keeping it as concise
> as possible helps.)
> - That said, I would *strongly encourage* the secondary tag developer to
> use meaningful secondary tag prefixes, the same way we have meaningful
> primary tags. While we don't have <name> or <preposition>, we also don't
> have <€> and <£>, but <n> and <pr>. Having meaningful tags is an awesome
> feature of the stream, one that makes it relatively simple to manually
> create input for any part of the pipeline (whether to test a specific
> command, to write tests, ...).
>
> So I would *recommend* having short lowercase prefixes that make it easy
> to understand (or at least remember, once seen) what the secondary
> tag is about.
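
A minimal sketch of how the recommendations above could be checked mechanically. The function name `check_prefix` and the four-character limit are assumptions made for illustration; the thread only asks for "short lowercase" prefixes and names <, >, ^, $ and + as stream-related characters.

```python
# Stream-related characters named in the message above.
RESERVED = set("<>^$+")


def check_prefix(prefix, max_len=4):
    """Return a list of complaints; an empty list means the prefix looks fine.

    max_len is a hypothetical cut-off for "short", not a rule from the thread.
    """
    problems = []
    if not prefix:
        problems.append("empty prefix")
    if any(ch in RESERVED for ch in prefix):
        problems.append("contains a stream-related character")
    if not (prefix.isascii() and prefix.isalpha() and prefix.islower()):
        problems.append("not a lowercase alphabetic string")
    if len(prefix) > max_len:
        problems.append("longer than %d characters" % max_len)
    return problems
```

Under these assumptions `check_prefix("sf")` returns an empty list, while `surfaceform`, `SF` and `%` each produce one complaint.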
>
>
> Message from Francis Tyers <fty...@prompsit.com> on Sun, 10 May 2020
> at 16:07:
>
>> On 2020-05-10 14:51, Samuel Sloniker wrote:
>> > Would it be worth designing a parsing library?
>> >
>> > On Sun, May 10, 2020 at 3:15 AM Flammie A Pirinen <flam...@iki.fi>
>> > wrote:
>> >
>> >> On Fri, May 08, 2020 at 04:50:45PM +0200, Tino Didriksen wrote:
>> >>> For khannatanmai's GSoC project, secondary tags will be implemented
>> >>> in a backwards-compatible manner. That in itself is indisputable.
>> >>> But there is a question of how the initial batch of secondary tags
>> >>> should look.
>> >>>
>> >>> I feel they should be in the form of <sf:cdefg>, as in a very short
>> >>> textual lower-case prefix, followed by :, followed by whatever value
>> >>> there is. Or even an upper-case prefix, as in <S:cdefg> or <SF:cdefg>.
>> >>>
>> >>> spectie wants symbol prefixes in the form of <%:cdefg>.
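
To make the three proposed shapes concrete, here is a rough sketch that separates primary tags from `prefix:value` secondary tags in a single analysis string. It assumes a prefix is either a run of ASCII letters (`sf`, `S`, `SF`) or a single symbol such as `%`, and it ignores backslash escapes; the name `split_tags` is hypothetical and this is not the parser used in Apertium itself.

```python
import re

# Anything between one pair of angle brackets.
TAG = re.compile(r"<([^<>]+)>")

# Assumed shape of a secondary tag body: "prefix:value", where the prefix is
# letters only or exactly one non-alphanumeric, non-reserved symbol.
SECONDARY = re.compile(r"^([A-Za-z]+|[^A-Za-z0-9<>^$+:\s]):(.*)$")


def split_tags(reading):
    """Return (lemma, primary_tags, secondary_tags) for one analysis string."""
    lemma = reading.split("<", 1)[0]
    primary, secondary = [], {}
    for body in TAG.findall(reading):
        match = SECONDARY.match(body)
        if match:
            secondary[match.group(1)] = match.group(2)
        else:
            primary.append(body)
    return lemma, primary, secondary
```

For example, `split_tags("dog<n><pl><sf:dogs>")` yields the lemma `dog`, the primary tags `n` and `pl`, and a dictionary mapping `sf` to `dogs`. All three prefix styles go through the same expression, so the choice between them matters mostly for readability.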
>> >>
>> >> I feel like this is just a bikeshed[0] issue, but since I want this
>> >> project to succeed I'll give my 2 cents / rants:
>> >>
>> >> I don't personally find the apertium stream format readable; if I need
>> >> to make sense of it I have to preprocess a lot anyway, enough that I'd
>> >> say the apertium stream format needs visualisation scripts to be
>> >> readable. It's not very hard to have dev scripts for this. That being
>> >> said, I don't find the apertium stream format very machine-readable
>> >> either: with regexes you need tons of escapes and double escapes, and
>> >> with programming languages... well, you have to use regexes, because
>> >> it's not a standard format with a readily available parsing library, or
>> >> a format neatly designed for Python's split() or C's strtok(), or so...
>> >> I'm fine with either special symbols or strings for whatever. As a
>> >> purely personal preference I've been pro feature=value since before UD
>> >> times, but that's not important; as long as stuff can be handled with
>> >> grep and sed without convoluted expressions it's all good, no? To that
>> >> end, on the question of having a known set of prefixes: I have always
>> >> been of the opinion that any mature, release-quality apertium stuff
>> >> would follow the tags documentation on the wiki[1], and I would expect
>> >> the same to be true for prefixes as well.
>> >>
>> >> One side note: I think there is a level of abstraction we often
>> >> overlook in these developments. Part of the language data developer
>> >> base will probably interact with these secondary things through the XML
>> >> formats, if I understand correctly? Surely one thing that can be done,
>> >> regardless of what kind of stream format representation the secondary
>> >> stuff has, is to make the XML format more self-documenting and the
>> >> stream format more readable? And eventually one could imagine tooling
>> >> and visualisations or whatnot to support whatever readable and parsable
>> >> formats, if enough stuff is in the XML sources.
>> >>
>> >> So, tl;dr: just pick whatever greppable stuff for the apertium stream
>> >> format.
>> >>
>> >> [0] <http://black.bikeshed.com/>
>> >> [1] <https://wiki.apertium.org/wiki/List_of_symbols>
>> >>
>> >> --
>> >> Regards, Flammie <https://flammie.github.io>
>> >> (Please note, that I will often include my replies inline instead of
>> >> top or bottom of the mail)
>> >> _______________________________________________
>> >> Apertium-stuff mailing list
>> >> Apertium-stuff@lists.sourceforge.net
>> >> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>
>> There is already
>> https://github.com/apertium/streamparser
>>
>> for Python...
>>
>> Fran
>>
>>
>>
>
>
> --
> < Xavi Ivars >
> < http://xavi.ivars.me >
>

