Re: pluralization idea that keeps bugging me
On Sun, Feb 10, 2008 at 12:56:14PM +0300, Richard Hainsworth wrote:
> How - in sketch form - would I go about creating a module to do what I suggest? I am not suggesting someone writes a module I have suggested, but the barebones steps to creating a new metacharacter.
>
> I have written infix multidispatch functions in pugs with Unicode characters to investigate that part of the language. But I don't quite see how to go about creating a new interpolatable metacharacter.

The category is called qq_backslash, so you'd define a new qq_backslash:s or whatever. I don't think pugs allows new ones to be defined, though.

-ryan
Re: pluralization idea that keeps bugging me
Brandon S. Allbery KF8NH wrote:
> On Feb 9, 2008, at 11:43 , Richard Hainsworth wrote:
>> I posted an idea about how pluralisation could be handled in a way that would not be English-centric (Subject: interpolation contextualisation). There were no responses to the idea. Was it so bad? Did no one see it? Was it too un-perlish? Was the title too horrible?
>
> I saw one response, noting that you can define in a module your own quote characters with their own interpolation rules; no core changes needed.

Why then did Larry ask the original question? Why also did others with far better knowledge than I indicate that hooks should be present in interpolation to make language-dependent modules possible, thus indicating the hooks might not be there?

How - in sketch form - would I go about creating a module to do what I suggest? I am not suggesting someone writes a module I have suggested, but the barebones steps to creating a new metacharacter.

I have written infix multidispatch functions in pugs with Unicode characters to investigate that part of the language. But I don't quite see how to go about creating a new interpolatable metacharacter.
Re: pluralization idea that keeps bugging me
On Feb 9, 2008, at 11:43 , Richard Hainsworth wrote:
> I posted an idea about how pluralisation could be handled in a way that would not be English-centric (Subject: interpolation contextualisation). There were no responses to the idea. Was it so bad? Did no one see it? Was it too un-perlish? Was the title too horrible?

I saw one response, noting that you can define in a module your own quote characters with their own interpolation rules; no core changes needed.

-- 
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] [EMAIL PROTECTED]
system administrator [openafs,heimdal,too many hats] [EMAIL PROTECTED]
electrical and computer engineering, carnegie mellon university    KF8NH
Re: pluralization idea that keeps bugging me
Hi,

> Warnocked!

Indeed :)

> I posted an idea about how pluralisation could be handled in a way that would not be English-centric (Subject: interpolation contextualisation). There were no responses to the idea. Was it so bad? Did no one see it? Was it too un-perlish? Was the title too horrible?
>
> The basic idea would be to add hooks into interpolation to allow for context suppliers and context sensors. The context sensors change words depending on data supplied through context suppliers. Note that even in English, if you change a noun from singular to plural, you need to change the verb from singular form to plural form.

First of all, I think a module like this should be either perfect or not exist at all: you won't use it after it makes the first mistake, or when you cannot use it everywhere.

Now, to have a perfect module you need some pretty smart people to create the base lib (dealing with natural languages is not a piece of cake). Then you need a bunch of other people who understand what's going on to create and test the different language versions.

I fear that in the end you end up with a huge codebase, created by various people, parts of which get out of sync or become unmaintained, and which generally consumes a lot of memory when used (think about e.g. dictionaries for irregular words - take a look at Lingua::EN::Inflect, for example), and also slows down execution. All this to save one not-very-often-used "if... else" block. If we really want to help people type less, why not just rename "else" to "e"? :)

It also seems to me that I will need a module like this when my computer does not only *ask* where I want to go today, but also *cares*. ;)

So IMHO while it's a nice idea, it's just overkill. (And it's definitely not about Perl6 as a language.)

- Fagzal
Re: pluralization idea that keeps bugging me
Warnocked!

I posted an idea about how pluralisation could be handled in a way that would not be English-centric (Subject: interpolation contextualisation). There were no responses to the idea. Was it so bad? Did no one see it? Was it too un-perlish? Was the title too horrible?

The basic idea would be to add hooks into interpolation to allow for context suppliers and context sensors. The context sensors change words depending on data supplied through context suppliers. Note that even in English, if you change a noun from singular to plural, you need to change the verb from singular form to plural form.

Larry Wall wrote:
> Last night I got a message entitled: "yum: 1 Updates Available". Of course, that's probably just a Python programmer giving up on doing the right thing, but we see this sort of bletcherousness all the time.
>
> After a recent exchange on PerlMonks about join, I've been thinking about the problem of pluralization in interpolated strings, where we get things like:
>
>     say "Received $m message{ 1==$m ?? '' !! 's' }."
>
> My first thought is that this is such a common idiom that we ought to have some syntactic sugar for it:
>
>     say "Received $m message\s."
>
> which reads nicely enough since the usual case is plural. Basically, \s would be smart enough to magically know somehow whether the last interpolation was 1 or not. It would be particularly nice when the interpolation is a closure:
>
>     say "Received {calculate_number_of_messages()} message\s."
>
> That would cover most of the cases for English speakers using regular nouns, but I wonder whether there's some kind of generalization that would help for cases like:
>
>     say "There was/were $o ox/oxen"
>
> But that doesn't work since / isn't a metacharacter. Using an adverb seems like overkill, if we can piggyback on an existing metachar. Maybe something like
>
>     say "There was\swere $o ox\soxen"
>
> where if anything alphabetic follows the \s it is the alternative plural.
> But note that the first \s there would have to be looking forward rather than backward to do the verb, which constrains the possible mechanisms, and makes it problematic to use \s multiple times:
>
>     say "There was\swere $o ox\soxen and $g goat\s."
>
> though that could be made clearer with explicit concatenation:
>
>     say "There was\swere $o ox\soxen " ~ "and $g goat\s."
>     say "There was\swere $o ox\soxen ", "and $g goat\s."
>
> Or maybe instead of using \ we should use a sigil:
>
>     say "There $<was were> $o $<ox oxen>"
>
> except, of course, that $<> is already taken. Seems tacky to use up a real variable name like:
>
>     say "There $X<was were> $o $X<ox oxen>"
>
> I suppose one could make a case for Num vars having a .<> method though:
>
>     say "There $o<was were> $o $o<ox oxen>"
>
> That nicely resolves the ambiguity of
>
>     say "There $o<was were> $o $o<ox oxen> and $g goat$g<s>"
>
> but doesn't really help when you really need it, which is when you interpolate something hairy:
>
>     say "There $j.k.l.m.o<was were> $j.k.l.m.o $j.k.l.m.o<ox oxen> and $j.k.l.m.g goat$j.k.l.m.g<s>"
>
> It's even less helpful when you interpolate a closure since there's no variable name to refer to (unless you assign one, but then we're losing much of our syntactic sugary wonderfulness).
>
> So maybe we should just make \s dwim and leave it at that. Two dwimminesses, really. The first dwim finds the associated interpolation, either the first interpolation of a variable or closure before the \s, or if there is none, the first one after. Call that interpolated value $X for the moment. (It doesn't really have to have a real variable name, but the important thing is not to evaluate the expression multiple times since it might have side effects (including the side effect of being inefficient to compute).) The second dwim looks at the alphabeticality of the next character (defined Unicodically, of course) to decide if there is one argument or two:
>
>     foo\s       means  $X == 1 ?? 'foo' !! 'foos'
>     foo\sbar    means  $X == 1 ?? 'foo' !! 'bar'
>
> Internally, you end up multiply dispatching to something like pluralize($X,'foo') or pluralize($X,'foo','bar'). (Arguably we could make pluralize interpolate the $X as well, but that only works for noun agreement, not verb agreement.)
>
> I think that probably handles most of the Indo-European cases, and anything more complicated can revert to explicit code. (Or go through a localization dictionary...)
>
> Any other cute ideas?
>
> Larry
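The two-form dispatch described in the quoted message, pluralize($X,'foo') versus pluralize($X,'foo','bar'), can be sketched outside Perl 6. The following is a minimal Python illustration, not the proposed Perl 6 mechanism: the function names `pluralize` and `report` are borrowed or invented for the example, and the rule is the English-only "1 is singular, everything else is plural" assumption from the message.

```python
def pluralize(count, singular, plural=None):
    # Two-argument form: the plural is singular + "s" (the foo\s case).
    # Three-argument form: the caller supplies the plural (the foo\sbar case).
    if plural is None:
        plural = singular + "s"
    return singular if count == 1 else plural


def report(oxen, goats):
    # Each count is computed once and reused for both verb and noun
    # agreement, which is the role $X plays in the message.
    return "There {} {} {} and {} {}.".format(
        pluralize(oxen, "was", "were"),
        oxen,
        pluralize(oxen, "ox", "oxen"),
        goats,
        pluralize(goats, "goat"),
    )


print(report(1, 3))
print(report(2, 1))
```

Passing the count explicitly sidesteps the "look backward or forward for the nearest interpolation" question, at the cost of the brevity that \s was meant to buy.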
Re: pluralization idea that keeps bugging me
On 2008-Jan-31, at 2:38 am, Mark Overmeer wrote:
> * David Green ([EMAIL PROTECTED]) [080131 08:48]:
>> I've always wanted a magic-S (and I don't think the anglocentrism matters
>
> In "the good old days" all computer OSes were anglo-centric. They are not like that anymore. But Perl still is.

Well, they provide ways to localise text, which is good; and a lot of applications take advantage of it, which is better. Most programming languages themselves are still English though, or at least their vocabulary is based on English and English-like words. (Except for the ones that aren't.)

Fortunately, since Perl6 is ultimately mutable, it should be reasonably straightforward to translate it all so that you could start programs with "use Dutch" or "use Japanese".

    use Lingua::FR;
    mes @valeurs=(1,2,3);
    dis $_ pour @valeurs;   # pardon my French

Of course, while some languages might need only to translate all the function and variable names, others would arguably want to rearrange the grammar and syntax too, so "reasonably straightforward" is a relative term. The brute force way would be to redefine every function with a new name for the new language; but perhaps what we really want is a more elegant way to do that:

    sub foo :trans(fr=>"fue", de=>"fu") {...}

> I do not think that your use Locale::Lingua::Romana::Perligata; is usable, because the translation (in general) adapts to the language of the user of the module, not the likings of the author.

That depends on the circumstances; the author(s) still have to provide translations in the first place. I wasn't thinking of ways to handle multiple languages, just ways to let the author more easily use his own language (which is what the majority of perl programs do, since most of them are written for private use).

> As I suggested in a previous mail, we can do it by making say/print a bit smarter. Instead of interpolating before they are called, we let them interpolate themselves, and optionally translate.

I really like the idea of having text lazily interpolate/translate. And in P6, it will be possible to override quoting and concatenating so that the code doesn't even have to look any different. Being able to refer to the interpolated values is a bit of a different matter, I think; although I guess the main reason for wanting to do so is translating (including singular-to-plural "translations").

Flexible interpolation is good because it makes text look more natural, as opposed to a printf-like separation of parts. On the other hand, the more natural the text looks in one language, the more work it can be to translate it automatically. The magic-S is more of a shortcut when you're not doing "real" translations at all.

-David
Re: pluralization idea that keeps bugging me
* David Green ([EMAIL PROTECTED]) [080131 08:48]:
> I've always wanted a magic-S (and I don't think the anglocentrism matters, because Perl is already pretty anglocentric -- more so than plural S's, which apply to some other languages anyway).

In "the good old days" all computer OSes were anglo-centric. They are not like that anymore. But Perl still is.

>     use Locale::Lingua::EN;
>     say "There was\s {3} ox\s";   # There were 3 oxen
>
>     use Locale::Lingua::Romana::Perligata;
>     say "{3} bos\s erat\s";   # 3 boves erant
>
> Although calling it "\s" loses its impact in other languages. But I think the underlying idea to seize on is a way to grab interpolated values so that there's a nice way to do tricks like that. Preferably in a way that doesn't look symmetrical so you can "point" it before or behind.

As I suggested in a previous mail, we can do it by making say/print a bit smarter. Instead of interpolating before they are called, we let them interpolate themselves, and optionally translate.

>     say "I've got $bid dollar\s, do I hear {$< + 1}?"

Pre-parse a standard call to (s)print(f)/say from

    say "I've got $bid dollar, do I hear ", $bid+1, "?"

into

    print "I've got \Ibid\E dollar, do I hear \I__ANON1__\E?\n",
        bid => $bid, __ANON1__ => $bid+1, __LINE => __LINE__;

(introducing \I \E as interpolation indicators)

Isn't the (usual, existing) translation syntax a lot simpler than you suggest?

(The rewrite will take place as the first step within the print/say. Translations must be implemented in the output layers, because only there do we know enough about the character set and the end user. Within the program, you do not want to be bothered with translated strings.)

The default interpolation implementation for print() can be very simple. However, now we can also make translation modules which use external tables or databases to do the optional intelligent work.

I do not think that your use Locale::Lingua::Romana::Perligata; is usable, because the translation (in general) adapts to the language of the user of the module, not the likings of the author. A more general use is:

    setlocale('lat')
    open OUT, ">:language('lat'):encoding('latin1')", $f

-- 
MarkOv

Mark Overmeer MSc                    MARKOV Solutions
[EMAIL PROTECTED]                    [EMAIL PROTECTED]
http://Mark.Overmeer.net             http://solutions.overmeer.net
Re: pluralization idea that keeps bugging me
On 2008-Jan-26, at 9:58 am, Larry Wall wrote:
> My first thought is that this is such a common idiom that we ought to have some syntactic sugar for it:
>
>     say "Received $m message\s."

I've always wanted a magic-S (and I don't think the anglocentrism matters, because Perl is already pretty anglocentric -- more so than plural S's, which apply to some other languages anyway).

Rather than extra syntax to specify alternatives, I wonder about having \s work with arrays (which also provides a way to deal with duals, for example):

    say "Received $_ {}\s" for 1, 2, 77;
    "Received 1 ox"
    "Received 2 oxen"
    "Received 77 oxes"

It might even be sophisticated enough to guess whether it should add "es" or just "s", but anything beyond that probably belongs in a module.

    use Locale::Lingua::EN;
    say "There was\s {3} ox\s";   # There were 3 oxen

    use Locale::Lingua::Romana::Perligata;
    say "{3} bos\s erat\s";   # 3 boves erant

Although calling it "\s" loses its impact in other languages. But I think the underlying idea to seize on is a way to grab interpolated values so that there's a nice way to do tricks like that. Preferably in a way that doesn't look symmetrical so you can "point" it before or behind.

    say "I've got $bid dollar\s, do I hear {$< + 1}?"

Or using "$<" instead of "\s":

    say "I'm bid $d dollar$< for @this[]$> $o @ox[]$<"

...except that I'm not crazy about calling it "$>". (If that would even work.) But something like that.

Perhaps strings should build an array of their interpolations?

    say "$a $b $c, this string contains [EMAIL PROTECTED] interpolations"

(Then again, maybe there's a time to break down and use (s)printf.)

-David
Re: pluralization idea that keeps bugging me
On 26 Jan., 17:58, [EMAIL PROTECTED] (Larry Wall) wrote:
> Last night I got a message entitled: "yum: 1 Updates Available". Of course, that's probably just a Python programmer giving up on doing the right thing, but we see this sort of bletcherousness all the time.
>
> After a recent exchange on PerlMonks about join, I've been thinking about the problem of pluralization in interpolated strings, where we get things like:
>
>     say "Received $m message{ 1==$m ?? '' !! 's' }."
>
> Any other cute ideas?

When you are a MUD[*]-developer you have to deal with things like this all the time. Where I was, we did it like this (German sentence converted to perlish syntax):

    say "{der($player)} nimmt {den($item)} aus {dem($container)}.";

which means: $player takes $item out of $container, where $player, $item and $container are objects or hashes, and der(), den(), dem() are functions which convert the given object into the definite nominative, accusative and dative. There are more functions to implement indefinite forms and other grammatical things. Objects/hashes contain the number, adjectives etc.

To make that more English, it could look like:

    say "{nominative($player)} takes {accusative($item)} out of {dative($container)}.";

With

    $player={name=>"Paul", adjective=>"great", gender=>"male"},
    $item={name=>"ball", count=>3, gender=>"male"},
    $container={name=>"box", gender=>"male"},

it would interpolate into

    The great Paul takes the 3 balls out of the box.

Maybe... btw: in German the gender of the objects also changes things...

The original example

>     say "Received $m message{ 1==$m ?? '' !! 's' }."

could then look like:

    say "Received {nominative({name=>'message',count=>$m})}."

Maybe someone could find a more concise form if huffmanly desirable.

Regards, Ron
Re: pluralization idea that keeps bugging me
Perl - when I first met it - was great because it handled text easily and 'naturally'. I now use perl for everything, even when another language would probably be better.

Perl6 has gone a long way to making things more universal by using Unicode. (The difficulties of non-Latin fonts and coding are horrendous.)

Mark and chromatic are right that an ability to manipulate multiple languages "naturally" and "in core" would be something no other programming language does. Perl6 seems to handle most of the necessary things, but not all - I think. Hence Larry's original question.

There are - it seems to me - several different aspects to consider. My breakdown would be:

a) Having the language constructs that make text interpolation easy - that is, the *text* morphs itself to adjust to the context brought in by the interpolated data. What is necessary is not a plurals fix for English, but a mechanism for fixing that can be applied to other languages. (Here I think perl6 grammars will help, but I am not sure, and without proof of concept actually doubt the facility exists in perl.)

b) Translating the perl core itself - the use of other languages to write code in. Given perl6 grammar, and given that any programming language is a rigidly circumscribed subset of words, I think this is entirely possible in most natural languages. Clearly for the compiler to work, a non-English coding language must uniquely map to and from an equivalent English coding.

c) Having the mechanisms in perl6 core not just to interpolate text contextually, but also for different texts to be used with the same interpolations (when a text is translated, different sentence structures result). As Mark pointed out, this can be accomplished with Templates.

d) Ensuring that different information streams can each be directed through templates. As Mark pointed out, more is needed than standard input, output, and errors. Moreover, it would be fantastic if the output from the perl6 compiler could be constructed so that its information streams (warnings, errors, etc) could be attached to translation filters.

I think item (a) is not quite there in perl6. But I really want to use perl6 and I hope this line of development does not derail the fantastic amount of momentum we have seen in recent months.

Mark Overmeer wrote:
> * Larry Wall ([EMAIL PROTECTED]) [080126 16:58]:
>> Last night I got a message entitled: "yum: 1 Updates Available". After a recent exchange on PerlMonks about join, I've been thinking about the problem of pluralization in interpolated strings, where we get things like:
>>
>>     say "Received $m message{ 1==$m ?? '' !! 's' }."
>>
>> Any other cute ideas?
>
> I totally agree with many responses, that special support for the English language is not preferred, certainly when it bothers developers for other natural languages. Imagine that you wrote your code this way for [...]
>
> To syslog in English (what I understand), and to the website in Chinese (what I do not understand)
>
> Of course, there are quite some more features in the module.
>
> Concluding:
> - hopefully, there is a way to simplify the work for all of us who do need to support many languages within one application
> - create one standard, so all CPAN modules integrate in the same way
> - let's try to get Perl to handle languages!
Re: pluralization idea that keeps bugging me
* Larry Wall ([EMAIL PROTECTED]) [080126 16:58]:
> Last night I got a message entitled: "yum: 1 Updates Available". After a recent exchange on PerlMonks about join, I've been thinking about the problem of pluralization in interpolated strings, where we get things like:
>
>     say "Received $m message{ 1==$m ?? '' !! 's' }."
>
> Any other cute ideas?

I totally agree with many responses, that special support for the English language is not preferred, certainly when it bothers developers for other natural languages. Imagine that you wrote your code this way for a website, and then your boss (always blame your boss) decides that the site must be ported to Chinese for expansion...

It would be nice if Perl joined nearly all other Open Source applications in being multi-languaged. During the lightning talks of last YAPC::EU, I called for localization of error messages in Perl 5.12, but Perl6 improvements are welcomed as well.

My idea: Recently, I released Log::Report, which is a new translation framework. It combines exception-handling with report dispatch and translations.

What's new: some module produces a text, but that module was found on CPAN. Only the author of the main program knows how to handle the text. So, delay translations until an output layer is reached. Locale::TextDomain and gettext translate immediately, as does $!. They translate at the location where the report emerges. Log::Dispatch and Log::Log4perl cannot influence the text production process.

What my new Log::Report does is delay translations until the moment the message reaches the dispatcher.
Like this:

    package main;
    dispatcher SYSLOG => 'syslog', language => 'en-US', charset => 'ascii', facility => 'local4';
    dispatcher STDOUT => 'website', language => 'cn', charset => 'utf8';
    run_some_code();   # text both to syslog and stdout

    package Someone::Elses::Package;
    use Log::Report 'translation-table-namespace';
    sub run_some_code() {
        # Locale::TextDomain compatible syntax, info ~ print
        info __nx"Received {m} messages", $m, m => $m;
    }

To syslog in English (what I understand), and to the website in Chinese (what I do not understand).

Of course, there are quite some more features in the module. The translation tables can use gettext syntax, be database driven, or maybe be a module with Perl routines for complex languages. (Only the first is implemented at the moment, but the framework is present.)

The provided try() is also implemented as a dispatcher, which collects the messages from the block; these have not yet been translated:

    try { error __"help!" };
    if($@)   # a Log::Report::Dispatcher::Try object
    {   my $exception = $@->wasFatal;
        $exception->throw   # re-cast
            if $exception->message !~ m/help/;   # ignore call for help
    }

When someone starts coding, it is more and more uncertain in which languages the code will be used later. So, it would be nice to help people to avoid mistakes which may block an easy conversion.

For instance, it is best if texts are produced in blocks as large as possible, outside the program file. We know how to do that: a template system. Templates themselves are easily translatable. About a zillion or two CPAN modules implement a Locale::TextDomain-like HASH-based substitution system in templates.

Translations are impossible for syntaxes like this:

    print "Received $m messages"

because the $m is already filled in before print is called. For this reason, a lookup in the translation table is impossible. It would be nice to translate the above string not into

    print 'Received '.$m.' messages'

but into

    report info => 'Received {m} messages', m => $m, linenr => __LINE__, ..etc..

(of course, with some \Q\E-like meta-syntax, not {})

Print() works internally more like printf(). No problem. Without translation tables defined, it just takes what it got as first argument.

In the infrastructure, we need a reason for each message, like syslog levels. Print, warn, and die have implied reasons (respectively info, warn and error). Everyone is faking trace and verbose levels, so we need a few more useful levels.

Concluding:
- hopefully, there is a way to simplify the work for all of us who do need to support many languages within one application
- create one standard, so all CPAN modules integrate in the same way
- let's try to get Perl to handle languages!

-- 
Regards,
MarkOv

Mark Overmeer MSc                    MARKOV Solutions
[EMAIL PROTECTED]                    [EMAIL PROTECTED]
http://Mark.Overmeer.net             http://solutions.overmeer.net
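The core of the message above is that a report should travel as a template plus named parameters, and only be translated and filled in once it reaches an output layer that knows its language. The following is a minimal Python sketch of that idea only. It is not Log::Report's API: the class names `Message` and `Dispatcher`, the `report` helper and the Dutch translation table are all assumptions made up for the illustration.

```python
class Message:
    """A report that keeps its template and parameters apart until output."""

    def __init__(self, msgid, **params):
        self.msgid = msgid
        self.params = params


class Dispatcher:
    """An output layer that knows its own language (hypothetical design)."""

    def __init__(self, language, table):
        self.language = language
        self.table = table  # maps language -> {msgid: translated template}
        self.lines = []     # collected output, standing in for syslog/stdout

    def emit(self, message):
        templates = self.table.get(self.language, {})
        # Fall back to the untranslated msgid when no translation exists.
        template = templates.get(message.msgid, message.msgid)
        # Interpolation happens here, after the translation lookup.
        self.lines.append(template.format(**message.params))


def report(dispatchers, msgid, **params):
    message = Message(msgid, **params)
    for dispatcher in dispatchers:
        dispatcher.emit(message)


# Illustrative translation table; the Dutch entry is invented for this sketch.
TRANSLATIONS = {"nl": {"Received {m} messages": "{m} berichten ontvangen"}}

syslog = Dispatcher("en", TRANSLATIONS)
website = Dispatcher("nl", TRANSLATIONS)
report([syslog, website], "Received {m} messages", m=3)
print(syslog.lines)
print(website.lines)
```

Because the lookup key is the template rather than the finished string, the same call can feed any number of output layers, which is exactly what eager interpolation in `print "Received $m messages"` prevents.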
Re: pluralization idea that keeps bugging me
Larry Wall wrote:
> Last night I got a message entitled: "yum: 1 Updates Available". Of course, that's probably just a Python programmer giving up on doing the right thing, but we see this sort of bletcherousness all the time.
>
> After a recent exchange on PerlMonks about join, I've been thinking about the problem of pluralization in interpolated strings, where we get things like:
>
>     say "Received $m message{ 1==$m ?? '' !! 's' }."
>
> My first thought is that this is such a common idiom that we ought to have some syntactic sugar for it:
>
>     say "Received $m message\s."
>
> which reads nicely enough since the usual case is plural. Basically, \s would be smart enough to magically know somehow whether the last interpolation was 1 or not. It would be particular nice when the interpolation is a closure:
>
>     say "Received {calculate_number_of_messages()} message\s."

I think the most general solution is a nice quoting construct. So if you say

    say qq:l10n(en)"Received $m message\s";

the quote handler in l10n:en (or whatever) receives a list of pairs of strings and variables to interpolate, ['$m' => $m, '\s' => undef]. It can then decide what to do with it.

Wait, that smells like macros, which are already specced - so never mind ;-)

Moritz

-- 
Moritz Lenz
http://moritz.faui2k3.org/ | http://perl-6.de/
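The quote-handler idea above, where a language module receives the pieces of the string as pairs and decides what to emit, can be sketched as follows. This is a Python illustration under stated assumptions, not Perl 6 macro or quoting syntax: the handler name `l10n_en` and the convention that literal text and the `\s` marker both carry `None` as their value are invented for the example.

```python
def l10n_en(segments):
    # segments: list of (source_text, value) pairs in string order.
    # Literal text and the \s marker carry value None; interpolations
    # carry the value that was computed for them.
    out = []
    last_value = None
    for source, value in segments:
        if source == r"\s":
            # English rule only: add "s" unless the last interpolated
            # value was exactly 1. With no earlier interpolation the
            # plural is used.
            if last_value != 1:
                out.append("s")
        elif value is not None:
            last_value = value
            out.append(str(value))
        else:
            out.append(source)
    return "".join(out)


for m in (1, 5):
    print(l10n_en([("Received ", None), ("$m", m),
                   (" message", None), (r"\s", None)]))
```

A handler for another language would receive the same list and could reorder, inflect or ignore segments as its grammar requires, which is the point of handing over pairs rather than a finished string.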
Re: pluralization idea that keeps bugging me
On Sat, Jan 26, 2008 at 18:43:50 -0800, Jonathan Lang wrote:
> Right. One last question: is this (i.e., extending a string's grammar) a "keep simple things simple" thing, or a "keep difficult things doable" thing?

I'm going to guess somewhere in between. It should be about the same level of complexity as Filter::Simple, except with much finer control and more correctness. I'm not the best person to answer this though.

-- 
Yuval Kogman <[EMAIL PROTECTED]>
http://nothingmuch.woobling.org 0xEBD27418
Re: pluralization idea that keeps bugging me
Yuval Kogman wrote:
> You can subclass the grammar and change everything.
>
> Theoretically that's a "yes" =)

Right. One last question: is this (i.e., extending a string's grammar) a "keep simple things simple" thing, or a "keep difficult things doable" thing?

-- 
Jonathan "Dataweaver" Lang
Re: pluralization idea that keeps bugging me
On Sat, Jan 26, 2008 at 18:12:17 -0800, Jonathan Lang wrote:
> This _does_ appear to be something more suitable for a Locale:: module. I just wonder if there are enough hooks in the core to allow for an appropriately brief syntax to be introduced in a module: can one roll one's own string "interpolations" as things stand? E.g., is there a way to add meaning to backslashed characters in a string that would normally lack meaning?

You can subclass the grammar and change everything.

Theoretically that's a "yes" =)

-- 
Yuval Kogman <[EMAIL PROTECTED]>
http://nothingmuch.woobling.org 0xEBD27418
Re: pluralization idea that keeps bugging me
On Sat, Jan 26, 2008 at 08:58:43AM -0800, Larry Wall wrote:
> Last night I got a message entitled: "yum: 1 Updates Available". Of course, that's probably just a Python programmer giving up on doing the right thing, but we see this sort of bletcherousness all the time.
>
> Any other cute ideas?

It's worth reading the perldoc for Locale::Maketext and Locale::Maketext::TPJ13. Sean Burke did some truly excellent work explaining a lot of the pitfalls here.

Sean built us the only solution I've yet seen that gets pluralization reasonably ok in languages with non-English-like pluralization rules without making me want to just give up and write "Updates found: 1" ;)

-j
Re: pluralization idea that keeps bugging me
Gianni Ceccarelli wrote:
> Please don't put this in the language. The problem is harder than it seems (there are European languages that pluralize differently on $X % 10, IIRC; 0 is singular or plural depending on the language, etc etc).

-snip-

> I know Perl is not "minimal", but sometimes I feel that it will end up being "maximal"... and the more you put "in the core", the less flexibility you get in the long term.

This _does_ appear to be something more suitable for a Locale:: module. I just wonder if there are enough hooks in the core to allow for an appropriately brief syntax to be introduced in a module: can one roll one's own string "interpolations" as things stand? E.g., is there a way to add meaning to backslashed characters in a string that would normally lack meaning? Do we have the tools to build "$m tool\s"?

-- 
Jonathan "Dataweaver" Lang
Re: pluralization idea that keeps bugging me
On Sat, Jan 26, 2008 at 02:36:32PM -0800, chromatic wrote:
> Nearly pain-free l10n and i18n *is* kind of a killer feature though.

+1

> -- c
Re: pluralization idea that keeps bugging me
It's only English-centric if the idea is fixed to plurals, because it's only for plurals that English words are mutated by grammar rules. In other languages, words are mutated by other factors, such as the gender of the word, the case, and the number.

The problem can be quite difficult, say in Russian. Suppose you want to say something like "Respected <name>" and interpolate <name> from a database. In English, it's a doddle. But in Russian, all adjectives (e.g. 'respected') have both male and female forms, so the gender of the customer has to be determined in order to interpolate correctly.

And for plurals, some languages have different words for single, double and many forms. In Russian, the noun after the number has one form for 1 (nominative singular), another form (genitive singular) for numbers 2 to 4, and then a third form (genitive plural) for 5 and above. So, a simple plural hook is insufficient.

Then take Welsh: its words mutate with prefixes as well as suffixes, dependent on context.

Whilst it would be nice for there to be a neat syntax for such things (thus avoiding English-centricity), the complexities of all languages might be too burdensome for core perl6.

Amir E. Aharoni wrote:
> On 26/01/2008, Larry Wall <[EMAIL PROTECTED]> wrote:
>> After a recent exchange on PerlMonks about join, I've been thinking about the problem of pluralization in interpolated strings, where we get things like:
>>
>>     say "Received $m message{ 1==$m ?? '' !! 's' }."
>>
>> ...
>> Any other cute ideas?
>
> No matter what you do it will remain too English-centric. It might work for Catalan, too. But it will remain totally useless for Arabic or Chinese.
>
> In any case, I don't understand why this should be in the core language at all.
Re: pluralization idea that keeps bugging me
On 2008-01-26 Larry Wall <[EMAIL PROTECTED]> wrote:
> Last night I got a message entitled: "yum: 1 Updates Available".
> [snip a lot]
> I think that probably handles most of the Indo-European cases, and anything more complicated can revert to explicit code. (Or go through a localization dictionary...)

Please don't put this in the language. The problem is harder than it seems (there are European languages that pluralize differently on $X % 10, IIRC; 0 is singular or plural depending on the language, etc etc). Look at the documentation of GNU gettext, or the translation guidelines for KDE, to get the whole mess.

We already have Locale::Maketext. To get the whole "magical interpolation", we'd just have to define a suitable quoting construct, right?

I know Perl is not "minimal", but sometimes I feel that it will end up being "maximal"... and the more you put "in the core", the less flexibility you get in the long term.

-- 
Dakkar - GPG public key fingerprint = A071 E618 DD2C 5901 9574 6FE2 40EA 9883 7519 3F88 key id = 0x75193F88

printk("%s: Boo!\n", dev->name);
        linux-2.6.19/drivers/net/depca.c
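The point about `$X % 10` and about zero can be made concrete with the per-language plural rules that GNU gettext catalogs declare in their Plural-Forms header. The sketch below restates three of those well-known rules in Python. The function names, the `RULES` table and the word lists are assumptions for the illustration, not part of any module named in this thread.

```python
def plural_index_en(n):
    # English: only 1 is singular, so 0 takes the plural form.
    return 0 if n == 1 else 1


def plural_index_fr(n):
    # French: 0 and 1 both take the singular form.
    return 0 if n in (0, 1) else 1


def plural_index_ru(n):
    # Russian: three forms, chosen from the last one or two digits.
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1
    return 2


# Rule plus word forms per language; the word lists are illustrative only.
RULES = {
    "en": (plural_index_en, ("message", "messages")),
    "fr": (plural_index_fr, ("message", "messages")),
    "ru": (plural_index_ru, ("сообщение", "сообщения", "сообщений")),
}


def choose_form(language, n):
    rule, forms = RULES[language]
    return forms[rule(n)]


for n in (0, 1, 3, 11, 21):
    print(n, [choose_form(lang, n) for lang in ("en", "fr", "ru")])
```

A single `\s` switch keyed on `== 1` covers only the first of these rules, which is why the lookup is better left to a locale module that can carry one rule per language.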
Re: pluralization idea that keeps bugging me
On Sat, Jan 26, 2008 at 08:58:43AM -0800, Larry Wall wrote: > After a recent exchange on PerlMonks about join, I've been thinking > about the problem of pluralization in interpolated strings, where we > get things like: > > say "Received $m message{ 1==$m ?? '' !! 's' }." > > My first thought is that this is such a common idiom that we ought > to have some syntactic sugar for it: > > say "Received $m message\s." > > [...] > > Any other cute ideas? FWIW, this sounds to me a lot like a special quoting operator or adverbial form. say qq:pluralized "Received $m message\s". Pm
Re: pluralization idea that keeps bugging me
At 8:58 AM -0800 1/26/08, Larry Wall wrote: My first thought is that this is such a common idiom that we ought to have some syntactic sugar for it: say "Received $m message\s." I don't think that a feature like this should be in the core language; it is too complicated as well as an open-ended problem. A better use of this discussion is perhaps to determine whether any more basic core features would need updating in order to support a separate extension module to more easily provide the feature that was being discussed. -- Darren Duncan
Re: pluralization idea that keeps bugging me
On Saturday 26 January 2008 08:58:43 Larry Wall wrote: > That would cover most of the cases for English speakers using regular > nouns, but I wonder whether there's some kind of generalization that > would help for cases like: > > say "There was/were $o ox/oxen" That makes me wish for a subjunctive/optative mood marker. I'm not sure why. In-language localization and internationalization hooks do seem awfully useful, but English-only pluralization rules just might not cut it. Nearly pain-free l10n and i18n *is* kind of a killer feature though. -- c
Re: pluralization idea that keeps bugging me
To me this sounds like use Lingua::EN::Pluralize::DSL; which would overload your grammar locally to parse strings this way. However, due to i18n reasons this should not be in the core. It might make sense to ship a slightly modernized Locale::MakeText with Perl 6 so that it can be used in the compiler itself, but unless a fully open ended system like L::MT is included I think having anything at all might be damaging, because this will encourage people to use the partial solution that is already built in instead of the complete one on the CPAN (c.f. many core modules). -- Yuval Kogman <[EMAIL PROTECTED]> http://nothingmuch.woobling.org 0xEBD27418
Re: pluralization idea that keeps bugging me
Amir E. Aharoni wrote: On 26/01/2008, Larry Wall <[EMAIL PROTECTED]> wrote: After a recent exchange on PerlMonks about join, I've been thinking about the problem of pluralization in interpolated strings, where we get things like: say "Received $m message{ 1==$m ?? '' !! 's' }." ... Any other cute ideas? No matter what you do it will remain too English-centric. It might work for Catalan, too. But it will remain totally useless for Arabic or Chinese. In any case, i don't understand why should this be in the core language at all. I second that. A few more thoughts: 1. For example in Hungarian, you don't need this at all: the noun stays singular after the numeral. 2. AFAIK in some languages it's not "1 or more", but "1, 2 or more". 3. Often what you need is not "1 or more", but "none, 1 or more". "No new messages - You have 1 new message - You have 3 new messages." Or more likely "No new messages. - You have 1 new message. ..." etc. 4. I work a lot with multilingual websites. I have learned long ago that it's never "{{you_have}} [% messages %] {{messages}}." You have to be *very* lucky just to make this work in two languages. Instead, it's "{{number_of_new_messages}}: [% messages %]." That pretty much works everywhere. So not in the core, probably. There are too many exceptions. A module would be cool, though :) String::Plural::English, or whatnot. - Fagzal
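Points 3 and 4 above can be sketched roughly as follows, in Python purely for illustration. Everything here is hypothetical: the category names, the table, and the helper functions are invented for this example and are not any existing module's API. The idea is to store whole sentences per category (including "none") rather than gluing word fragments together, and to fall back to a "label: number" form that needs no grammar at all.

```python
def english_category(n: int) -> str:
    """Pick the sentence category for English: none, exactly one, or more."""
    if n == 0:
        return "zero"
    if n == 1:
        return "one"
    return "other"


# Whole sentences per category; a translator would supply a different table
# (and a different category function) for each language.
NEW_MESSAGES = {
    "zero": "No new messages.",
    "one": "You have 1 new message.",
    "other": "You have {n} new messages.",
}


def new_messages(n: int) -> str:
    """Full-sentence variant: one complete template per category."""
    return NEW_MESSAGES[english_category(n)].format(n=n)


def new_messages_label(n: int) -> str:
    """Label variant: the noun never has to agree with the number."""
    return f"New messages: {n}"


if __name__ == "__main__":
    for n in (0, 1, 3):
        print(new_messages(n), "|", new_messages_label(n))
```

The label variant is the one that tends to survive translation, since only the fixed label string changes per language.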
Re: pluralization idea that keeps bugging me
"Jonathan Lang" schreef: > I'm not fond of the 'ox\soxen' idea; but I could get behind something > like '\s' or 'ox\s'. "$n ox\s< en>" "$n\s cat\s< s > fight\s< s s>" ;) -- Affijn, Ruud "Gewoon is een tijger."
Re: pluralization idea that keeps bugging me
Jonathan makes an excellent point about s and S. In fact, there's probably a "little language" out there for this. I don't think it needs to be in the core, though. But you could put in some kind of "hook" mechanism, so that detecting the presence of \s or whatever caused the string to be treated specially. Perhaps it gets a different, possibly more sophisticated, type? A type that is only in-core in a limited (English-only?) implementation, but which admins can install at whim. =Austin Jonathan Lang wrote: Larry Wall wrote: Any other cute ideas? If you have '\s', you'll also want '\S': "$n cat\s fight\S" # 1 cat fights; 2 cats fight I'm not fond of the 'ox\soxen' idea; but I could get behind something like '\s<ox oxen>' or 'ox\s<en>'. '\s<a b>' would mean 'a is singular; b is plural'; '\s<a>' would be short for '\s< a>'; '\s' would be short for '\s< s>'; '\S' would reverse this. Sometimes, you won't want the pluralization variable in the string itself, or you won't know which one to use. You could use an adverb for this: :s<$n>"the cat\s \s<is are> fighting." and/or find a way to tag a variable in the string: "$owner's \s=$count cat\s" '\s=$count' means "set plurality based on $count, and display $count normally."
Re: pluralization idea that keeps bugging me
Larry Wall wrote: > Any other cute ideas? If you have '\s', you'll also want '\S': "$n cat\s fight\S" # 1 cat fights; 2 cats fight I'm not fond of the 'ox\soxen' idea; but I could get behind something like '\s<ox oxen>' or 'ox\s<en>'. '\s<a b>' would mean 'a is singular; b is plural'; '\s<a>' would be short for '\s< a>'; '\s' would be short for '\s< s>'; '\S' would reverse this. Sometimes, you won't want the pluralization variable in the string itself, or you won't know which one to use. You could use an adverb for this: :s<$n>"the cat\s \s<is are> fighting." and/or find a way to tag a variable in the string: "$owner's \s=$count cat\s" '\s=$count' means "set plurality based on $count, and display $count normally." -- Jonathan "Dataweaver" Lang
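For what it's worth, the notation above is small enough to prototype outside the language. Below is a rough Python sketch (not Perl 6, and not anyone's actual implementation) that assumes the angle-bracket reading: a two-word form means singular then plural, a one-word form means empty singular plus that plural, bare \s means plural 's', and \S swaps the two. The function name `pluralize` and the choice to pass the count explicitly (roughly what the adverb form would do) are assumptions made for the example, and "singular" is hard-wired to mean exactly 1, which is of course the English-only case.

```python
import re

# Matches \s or \S, optionally followed by <...> holding one or two words.
_MARKER = re.compile(r"\\([sS])(?:<([^>]*)>)?")


def pluralize(template: str, count: int) -> str:
    """Expand \\s / \\S markers in template according to count (English only)."""

    def expand(match: re.Match) -> str:
        # Bare \s (or an empty <>) defaults to the plural suffix "s".
        words = (match.group(2) or "s").split() or ["s"]
        if len(words) == 1:
            singular, plural = "", words[0]        # <a> is short for < a>
        else:
            singular, plural = words[0], words[1]  # <a b>
        if match.group(1) == "S":
            singular, plural = plural, singular    # \S reverses the roles
        return singular if count == 1 else plural

    return _MARKER.sub(expand, template)


if __name__ == "__main__":
    for n in (1, 2):
        print(pluralize(rf"{n} cat\s fight\S", n))
        print(pluralize(rf"{n} \s<ox oxen>", n))
        print(pluralize(r"the cat\s \s<is are> fighting.", n))
```

Swapping the `count == 1` test for a per-language category function is where the objections elsewhere in this thread come in: two slots per marker are not enough for languages with three or more forms.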
Re: pluralization idea that keeps bugging me
On 26/01/2008, Larry Wall <[EMAIL PROTECTED]> wrote: > After a recent exchange on PerlMonks about join, I've been thinking > about the problem of pluralization in interpolated strings, where we > get things like: > > say "Received $m message{ 1==$m ?? '' !! 's' }." > > ... > > Any other cute ideas? No matter what you do it will remain too English-centric. It might work for Catalan, too. But it will remain totally useless for Arabic or Chinese. In any case, i don't understand why should this be in the core language at all. -- Amir Elisha Aharoni English - http://aharoni.wordpress.com Hebrew - http://haharoni.wordpress.com "We're living in pieces, I want to live in peace." - T. Moore