+1

On Sun, May 10, 2020 at 10:14 AM Xavi Ivars <xavi.iv...@gmail.com> wrote:
> First of all, just to mention I don't consider myself a language developer
> (but someone who messes around with everything).
>
> - I think I would leave this for the "secondary tag" developer, similar
>   to what we already do for the "primary tags" one. For example, no-one
>   currently forbids having a primary tag with any symbol, as long as it's
>   not a stream-related one (<,>,^,$,+).
> - Like Jonathan, I think we don't need to have things like
>   <surfaceform:xxxx>. It's too long, and would probably clutter the stream
>   too much. (Let's remember that, even if the stream is not meant to be
>   "human read", it is somewhat "human readable", and it being as concise as
>   possible helps.)
> - That said, I would *strongly encourage* the secondary tag developer to
>   have meaningful secondary tag prefixes, the same way we have meaningful
>   primary tags. While we don't have <name> or <preposition>, we also don't
>   have <€> and <£>, but <n> and <pr>. Having meaningful tags is an awesome
>   feature of the stream, one that makes it relatively simple to manually
>   create input for any part of the pipeline (either to test a specific
>   command, to write tests,...)
>
> So I would *recommend* having short lowercase prefixes that make it easy
> to understand (or, at least, remember once seen) what the secondary
> tag is about.
>
>
> On Sun, 10 May 2020 at 16:07, Francis Tyers <fty...@prompsit.com> wrote:
>
>> On 2020-05-10 14:51, Samuel Sloniker wrote:
>> > Would it be worth designing a parsing library?
>> >
>> > On Sun, May 10, 2020 at 3:15 AM Flammie A Pirinen <flam...@iki.fi>
>> > wrote:
>> >
>> >> On Fri, May 08, 2020 at 04:50:45PM +0200, Tino Didriksen wrote:
>> >>> For khannatanmai's GSoC project, secondary tags will be implemented
>> >>> in a backwards compatible manner. That is in itself indisputable.
>> >>> But, there is a question of how the initial batch of secondary tags
>> >>> should look.
>> >>>
>> >>> I feel they should be in the form of <sf:cdefg>, as in a very short
>> >>> textual lower-case prefix, followed by :, followed by whatever value
>> >>> there is. Or even an upper-case prefix, as in <S:cdefg> or <SF:cdefg>.
>> >>>
>> >>> spectie wants symbol prefixes in the form of <%:cdefg>.
>> >>
>> >> I feel like this is just a bikeshed[0] issue, but since I want this
>> >> project to succeed I'll give my 2 cents / rants:
>> >>
>> >> I don't personally find apertium stream format readable; if I need to
>> >> make sense of it I will anyways have to preprocess a lot, enough that
>> >> I'd say apertium stream format needs visualisation scripts to be
>> >> readable. It's not very hard to have dev scripts for this. That being
>> >> said, I don't find apertium stream format very machine readable
>> >> either; with regexes you need tons of escapes and double escapes, with
>> >> programming languages... well, you have to use regexes because it's
>> >> not a standard format with a readily available parsing library or a
>> >> format neatly designed for Python split() or C strtok(), or so... I'm
>> >> fine with either special symbols or strings for whatever; as a purely
>> >> personal preference I've been pro feature=value even before UD times
>> >> but that's not important. As long as stuff is handleable with grep and
>> >> sed without convoluted expressions it's all good, no? To that goal, on
>> >> the question of having a known set of prefixes, I have always been of
>> >> the opinion that any mature release-quality apertium stuff would
>> >> follow the tags docu on the wiki[1]; I would expect similar to be true
>> >> for prefixes as well.
>> >>
>> >> One side note: I think there is a level of abstraction we often
>> >> overlook in these developments; a part of the language data developer
>> >> base will probably interact with these secondary things through the
>> >> XML formats, if I understand correctly? Surely one of the things that
>> >> can be done, regardless of what kind of stream format representation
>> >> the secondary stuff has, is to have the XML format part more
>> >> self-documenting and the stream format more readable? And like,
>> >> eventually one could think there were tooling and visualisations or
>> >> whatnot to support whatever readable and parsable formats, if enough
>> >> stuff is in the XML sources.
>> >>
>> >> so tldr; just pick whatever greppable stuff for apertium stream
>> >> format.
>> >>
>> >> [0] <http://black.bikeshed.com/>
>> >> [1] <https://wiki.apertium.org/wiki/List_of_symbols>
>> >>
>> >> --
>> >> Regards, Flammie <https://flammie.github.io>
>> >> (Please note, that I will often include my replies inline instead of
>> >> top or bottom of the mail)
>> >> _______________________________________________
>> >> Apertium-stuff mailing list
>> >> Apertium-stuff@lists.sourceforge.net
>> >> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>
>> There is already
>> https://github.com/apertium/streamparser
>>
>> for Python...
>>
>> Fran
>
> --
> < Xavi Ivars >
> < http://xavi.ivars.me >
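
To make the "short lowercase prefix" proposal quoted above a bit more concrete, here is a minimal sketch of how such tags could be pulled apart with one simple regex. It assumes the <prefix:value> shape from the thread (e.g. <sf:cdefg>), which is only a proposal at this point; the function name split_tags and the example reading are made up for illustration. It does not use apertium/streamparser and does not handle escaped characters in the stream.

```python
import re

# Any <...> tag in a reading. Escaped \< or \> are NOT handled in this sketch.
TAG = re.compile(r"<([^<>]+)>")

# Assumed secondary-tag shape from the thread: short lowercase prefix,
# a colon, then whatever value there is. Primary tags (<n>, <pr>) have no colon.
SECONDARY = re.compile(r"^([a-z]+):(.*)$")


def split_tags(reading):
    """Split one reading, e.g. 'dog<n><pl><sf:dogs>', into
    (lemma, primary_tags, secondary_tags)."""
    lemma = reading.split("<", 1)[0]
    primary = []
    secondary = {}
    for body in TAG.findall(reading):
        match = SECONDARY.match(body)
        if match:
            secondary[match.group(1)] = match.group(2)
        else:
            primary.append(body)
    return lemma, primary, secondary


print(split_tags("dog<n><pl><sf:dogs>"))
# ('dog', ['n', 'pl'], {'sf': 'dogs'})
```

With a symbol prefix such as <%:cdefg>, the second pattern above would need to change (here such a tag simply falls through to the primary list), which is roughly the greppability trade-off discussed in the thread.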