Re: [tex-hyphen] Newest GitHub additions into CTAN?

2020-12-30 Thread Arthur Reutenauer
Hi Stojan, Thank you for your reply. It does sound like T2A is the best choice for Macedonian. Mojca will try to make an upload to CTAN by tomorrow (Thursday) evening; otherwise we’ll work on it some time next week. Best, Arthur

Re: [tex-hyphen] Finnish basic hyphenation rules

2020-07-19 Thread Arthur Reutenauer
On Thu, Jul 16, 2020 at 02:41:26PM +0300, Teemu Likonen wrote: > Is this hyph-utf8 and its new Finnish hyphenation patterns publicly > available somewhere? Sorry, Teemu, as I just wrote to Stojan I wanted to do it this week but didn’t have time. I’ll try to make the release next week but can’t

Re: [tex-hyphen] tex-hyphen Digest, Vol 104, Issue 2

2020-07-19 Thread Arthur Reutenauer
On Fri, Jul 17, 2020 at 05:09:02PM +0100, Stojan Trajanovski wrote: > Any update on this? Apologize for bothering. Sorry, I wanted to do it this week but needed to shift my priorities, and next week is going to be even worse. I’ll try and squeeze it in, though. Best,

Re: [tex-hyphen] Enabling Macedonian hyphen for 8-bit engines

2020-07-09 Thread Arthur Reutenauer
Hi Stojan, On Wed, Jul 08, 2020 at 07:32:27AM +0100, Stojan Trajanovski wrote: > I am fine with both ways. Any plans on when we can move this to CTAN? I’ll do it some time next week. Best, Arthur

Re: [tex-hyphen] Finnish basic hyphenation rules

2020-06-16 Thread Arthur Reutenauer
On Tue, Jun 16, 2020 at 09:32:42AM +0300, Teemu Likonen wrote: > I think “school” is good name for that. It clearly tells Finnish people > what to expect. One-consonant rule is one of the ideas behind it but a > name derived from that can be misleading because there are a lot of > hyphenation

Re: [tex-hyphen] Hyphenation in Albanian

2020-06-16 Thread Arthur Reutenauer
Dear Claudio, On Mon, Jun 15, 2020 at 11:57:33PM +0200, Claudio Beccari wrote: > I can certainly ask the student to allow distributing her thesis, but I > believe it will not be of great utility, because, as I said, the thesis is > in Italian, with very few stretches in Albanian, where

Re: [tex-hyphen] Hyphenation in Albanian

2020-06-16 Thread Arthur Reutenauer
Joan, I just created https://github.com/hyphenation/albanian as an empty repository and will add you as a contributor. Is your GitHub user name iGianni? Arthur

Re: [tex-hyphen] Hyphenation in Albanian

2020-06-15 Thread Arthur Reutenauer
Hi Claudio, On Sun, Jun 14, 2020 at 12:05:19AM +0200, Claudio Beccari wrote: > Recently I assisted an Albanian student getting her degree in Italy, who > wrote her thesis in Italian, but with many stretches of text in Albanian; > these parts were hyphenated by hand, because she could not

Re: [tex-hyphen] Fwd: Hyphenation in Albanian

2020-06-15 Thread Arthur Reutenauer
Dear Joan, You’ve come to the right place :-) I suggest you subscribe to this list (https://tug.org/mailman/listinfo/tex-hyphen), as you’ve already received two replies that were, however, not sent to your personal address. I can help you make hyphenation patterns for Albanian and

Re: [tex-hyphen] Finnish basic hyphenation rules

2020-06-15 Thread Arthur Reutenauer
On Wed, May 20, 2020 at 04:08:04PM +0300, Teemu Likonen wrote: > Arthur Reutenauer [2020-05-18T22:51:44+02] wrote: > >> I wanted to reply earlier but was busy with other things. Sorry >> about that, and also for announcing that there will be more delay >> because I’m s

Re: [tex-hyphen] Finnish basic hyphenation rules

2020-05-18 Thread Arthur Reutenauer
Hi Teemu, On Sat, Apr 18, 2020 at 10:37:55AM +0300, Teemu Likonen wrote: > I thought about that and I still like "basic" more. There are other > words with similar meaning but I would like you to consider the > hierarchy of these concepts. I will repeat some familiar things to help >

Re: [tex-hyphen] Finnish basic hyphenation rules

2020-04-16 Thread Arthur Reutenauer
Hi Teemu, Glad to know that you’re happy with the patterns. On Thu, Apr 16, 2020 at 06:51:57PM +0300, Teemu Likonen wrote: > I have been thinking and testing these new Finnish basic hyphenation > patterns and it looks to me that they are ready. They do what anybody > would expect. The

Re: [tex-hyphen] Finnish basic hyphenation rules

2020-04-05 Thread Arthur Reutenauer
On Sat, Apr 04, 2020 at 03:19:46PM +0200, Arthur Reutenauer wrote: > Thanks for checking! I made several changes since you tested, can you > please try again? The issue with words ending in pairs of vowels that > are not diphthongs should be fixed. Sorry, I just realised that it

Re: [tex-hyphen] Finnish basic hyphenation rules

2020-04-05 Thread Arthur Reutenauer
> You have done very good work in making this easy to test and develop. > Thank you very much. You’re very welcome. > There is “TODO: ‘c’” in makelist file so I did that and pushed to new > branch "experimental-c". Letter “c” is basically “k” or “s” consonant. I > will merge these changes to

Re: [tex-hyphen] Finnish basic hyphenation rules

2020-04-05 Thread Arthur Reutenauer
> I will certainly check but it seems that you have not pushed the new > changes to the public repository. Sorry, done now. Arthur

Re: [tex-hyphen] Finnish basic hyphenation rules

2020-04-04 Thread Arthur Reutenauer
> - K -> OBSTRUENT | LIQUID | SIBILANTS | NASALS | GLIDES | 'v' > + K -> OBSTRUENT | LIQUID | SIBILANTS | NASALS | GLIDES | 'v' | 'w' > > Then added two patterns to handle "sw" and "tw" in the beginning > ("swing", "tweed"): > > - DOTKK -> [...] > + DOTKK -> [...] | '.s2w' |

Re: [tex-hyphen] Finnish basic hyphenation rules

2020-04-04 Thread Arthur Reutenauer
Teemu, Thanks for checking! I made several changes since you tested, can you please try again? The issue with words ending in pairs of vowels that are not diphthongs should be fixed. > Thanks! The hyphenated.txt file looks like the basic Finnish hyphenation > is working as one would

Re: [tex-hyphen] Finnish basic hyphenation rules

2020-04-03 Thread Arthur Reutenauer
Hi Teemu, I should probably have expressed myself differently. You don’t need to know how to program in Lua to use LuaTeX, it’s just an extended version of pdfTeX. The reason why I mentioned it is that it’s much easier to test new patterns because you can add patterns on the fly,

Re: [tex-hyphen] Finnish basic hyphenation rules

2020-04-01 Thread Arthur Reutenauer
Hi all, I was the one who pointed Teemu to this list, since he asked for one (and obviously it’s a better forum than Stack Exchange for this kind of discussion). If you have no interest in Finnish hyphenation, please ignore this thread :-) Teemu, thanks for writing. As you

Re: [tex-hyphen] Problems with hyph-la-x=classic.tex

2020-02-13 Thread Arthur Reutenauer
Dear Claudio, On Thu, Feb 13, 2020 at 10:31:28AM +0100, Claudio Beccari wrote: > Dear friends, apparently the file hyph-la-x-classic.tex exists but some > engines do not find it. Thank you for the report. It is strange, and unfortunately I don’t have time to investigate it right

Re: [tex-hyphen] Bug in Classic Latin hyphenation patterns

2019-12-04 Thread Arthur Reutenauer
Hi Doug, On Tue, Dec 03, 2019 at 06:25:57PM -0700, Doug McKenna wrote: > .i2a .i2e .i2i .i2o .i2u .iuu2a .iuu2e .iuu2o% iuuamentum, iuuentus, iuuo > .iei2u .iai2u % iacere, ieiunium, iaiunitas, Ioannes, iudaismus, iuncte, > iunix, Yes, that definitely looks like a mistake. Clearly

Re: [tex-hyphen] Hyphenation patterns for classical Latin

2019-12-04 Thread Arthur Reutenauer
Hello, On Wed, Jul 03, 2019 at 04:50:35PM +0200, Keno Wehr wrote: > For these reasons, new patterns have been developed with the aid of patgen, > based on a list of about 7500 Latin words. I don’t think I’ve ever replied to this email. Sorry about that. Is

Re: [tex-hyphen] Newbie: Question about pattern structure

2019-08-22 Thread Arthur Reutenauer
Hello Nathalie, Interesting subject you chose! The reference and, in my opinion, best complete explanation of TeX’s hyphenation algorithm is appendix H of the TeXbook (https://www.worldcat.org/oclc/826569026). You’ll find there everything you need to get a basic understanding. For a

Re: [tex-hyphen] Patgen

2019-05-21 Thread Arthur Reutenauer
Hi Keno, >> > @!trie_size=110021182; >> > @!triec_size=5506; >> >> To avoid the warning, I had to reduce the latter value to >> @!triec_size=54677566; > > Thank you, I have had no problems with these values. These values were reported to cause linker problems, similar

Re: [tex-hyphen] Patgen

2019-05-15 Thread Arthur Reutenauer
Hi Keno, On Tue, May 14, 2019 at 10:55:32PM +0200, Keno Wehr wrote: > Is it possible to adapt patgen for such huge lists? If you’re able to compile patgen yourself, it should be enough to change trie_size and triec_size in patgen.ch, currently set to 10,000,000 and 5,000,000

Re: [tex-hyphen] fmtutil fails after updating hyph-utf8

2019-04-06 Thread Arthur Reutenauer
Hi Hironobu, > * Engine e(u)ptex is failing at > hyph-utf8/patterns/ptex/hyph-zh-latn-pinyin.ec.tex > ! Non letter. > l.201 ' 1a > ? > > * Engine pdftex is failing at > hyph-utf8/patterns/tex-8bit/pyhyph.tex > ! Non letter. > l.163 1n^^fc > ? Thanks for the report,

Re: [tex-hyphen] Licence for the Coptic patterns

2018-12-11 Thread Arthur Reutenauer
On Mon, Dec 10, 2018 at 11:43:02PM +0100, Claudio Beccari wrote: > Yes, I agree to put the Coptic patterns under the MIT licence. Thanks, Claudio, as usual. See https://github.com/hyphenation/tex-hyphen/commit/b3671a5 Best, Arthur

Re: [tex-hyphen] extending `hyph-zh-latn-pinyin.tex'

2018-11-24 Thread Arthur Reutenauer
On Sat, Nov 24, 2018 at 10:59:49AM +0000, Philip Taylor wrote: > i. e., break between any “ng” in the middle of a word. > >Not convinced, Arthur.  Some would write (e.g.,) 北京人 (Běijīngrén) as Ah, you’re right, I knew I should have looked more closely. >

Re: [tex-hyphen] extending `hyph-zh-latn-pinyin.tex'

2018-11-24 Thread Arthur Reutenauer
On Sat, Nov 24, 2018 at 11:02:48AM +0100, Werner LEMBERG wrote: >To be > serious: If patgen produces those patterns, I think they are *really* > necessary. Do you still have the source file somewhere? Best,

Re: [tex-hyphen] extending `hyph-zh-latn-pinyin.tex'

2018-11-24 Thread Arthur Reutenauer
On Fri, Nov 23, 2018 at 11:32:11PM +0100, Mojca Miklavec wrote: > I still find it useful to have a level of abstraction, *in particular* > when the rules are really simple. I agree, and I think they should be expressed in terms of context-free grammar, for example for Turkish > vowels = %w{a â

Re: [tex-hyphen] extending `hyph-zh-latn-pinyin.tex'

2018-11-24 Thread Arthur Reutenauer
On Fri, Nov 23, 2018 at 08:57:21PM +0100, Werner LEMBERG wrote: > Please elaborate. I think the patterns of the form 1nV and 1rV, for V any vowel, are useless because all the breaks they specify are already covered by the V1n and V1r patterns, and that all the 1gV could be replaced by just

Re: [tex-hyphen] extending `hyph-zh-latn-pinyin.tex'

2018-11-22 Thread Arthur Reutenauer
On Thu, Nov 22, 2018 at 09:43:57AM +0100, Werner LEMBERG wrote: > Not necessary – the stuff is so simple, and the number of syllables is > closed which means there won't be any changes except bug fixes. A > simple search and replace did the job in a few minutes; see attached > file. Thanks,

Re: [tex-hyphen] extending `hyph-zh-latn-pinyin.tex'

2018-11-21 Thread Arthur Reutenauer
>> (1) Another file. >> >> This solution I rather dislike. > > This is what I would go for. > > But I would create a simple script in any programming language (lua, > ruby, python, ...) and generate two pattern files out of it. As Mojca said, without the shadow of a doubt. Just use a

Re: [tex-hyphen] [texhax] help

2018-11-20 Thread Arthur Reutenauer
On Mon, Nov 19, 2018 at 08:50:49PM +0100, Werner LEMBERG wrote: > Another possibility for an experienced C++ user would be to convert > opatgen's (GPLed) source code to modern C++, then publishing it on > gitlab or something similar. > > >

Re: [tex-hyphen] [texhax] help

2018-11-19 Thread Arthur Reutenauer
Thanks, Philip, for forwarding this discussion to the TeX-hyphen list. On Mon, Nov 19, 2018 at 12:20:53PM +0000, Philip Taylor wrote: >In this matter I defer entirely to the TUG hyphenation team (Arthur, >Mojca, ...) who know infinitely more about such things than I ever did or >

Re: [tex-hyphen] hyphenation for Bulgarian language

2018-09-14 Thread Arthur Reutenauer
On Fri, Sep 14, 2018 at 04:27:35PM +0300, Стоян Димитров wrote: > Is it possible to change the author and the license on site? Sure, what should we put? If we could have the MIT licence, that would be very nice :-) Best, Arthur

Re: [tex-hyphen] hyphenation for Bulgarian language

2018-09-14 Thread Arthur Reutenauer
On Fri, Sep 14, 2018 at 03:15:33PM +0200, Arthur Reutenauer wrote: > I sent you an email > about that. I stand corrected, I did *not* send you an email, which I suppose is the reason for your question right now. Sorry

Re: [tex-hyphen] Hyphenation exception files

2018-09-01 Thread Arthur Reutenauer
On Sat, Sep 01, 2018 at 08:03:36PM +0200, Keno Wehr wrote: > What is the policy concerning global hyphenation exception files? When there is a default set of hyphenation exceptions, they’re loaded in the format together with the patterns. That’s currently the case for 21 languages and

Re: [tex-hyphen] hyphenating overview

2018-06-09 Thread Arthur Reutenauer
Barbara, Thanks for checking. > While the failure to update more recently is surely an oversight (and > I have on my to-do list the task of updating the cumulative TeX list, > getting it processed with hyphenex and posted to CTAN), the word > "over-view" has been on the list since 2005,

Re: [tex-hyphen] hyphenating overview

2018-06-09 Thread Arthur Reutenauer
> There may well be.  With British English patterns, "overview" (as the second > or subsequent word of a paragraph) is hyphenated as expected; with American > English patterns, it is not : Confirmed. The word “overview” is hyphenated as expected (over-view) with the British English patterns,

Re: [tex-hyphen] Evolving usage for UK hyphenation patterns

2018-03-31 Thread Arthur Reutenauer
>Well, although the patterns are now "in the cloud", they are accessible >only to Dominik & I, and neither of us will live forever, so it seemed >vital to both of us that they be made accessible to (at least some >members) of the wider (and younger) community. I couldn’t agree

Re: [tex-hyphen] Evolving usage for UK hyphenation patterns

2018-03-31 Thread Arthur Reutenauer
Dear Dominik, >The preface says, "Finally, the > word-division recommendations follow the tried-and-tested Oxford system." > It also says that it was, "prepared in consultation with the Society for > Editors and Proofreaders (SfEP)." Thanks for

Re: [tex-hyphen] hyph-utf8 documentation

2018-03-15 Thread Arthur Reutenauer
> It should be mentioned at the very end of the manual, together with > "naustrian", which is already present there. Oh right, that list. Added, thanks. It’s in the GitHub repository and will be part of the next upload to CTAN. Best, Arthur

Re: [tex-hyphen] Add \{left,right}hyphenmin info to pattern files

2018-03-06 Thread Arthur Reutenauer
> Thanks, but *which* one is the `perfect' showcase template that > contains all of the useful (and perhaps not so useful) entries? For > example, looking into `hyph-de-1996.tex', I don't see information on > \lefthyphenmin at all... Well, of course not, since you didn’t put it in the source

Re: [tex-hyphen] Procedure for adding alternative patterns

2017-09-29 Thread Arthur Reutenauer
Hi, > It would be a very bad idea to upset the author upfront with a request > to discard all the work he did, at the same time changing the exact > layout of all documents typeset in Bulgarian. Agreed. > If you are asking about the best way to get the message from the > author: my

Re: [tex-hyphen] Procedure for adding alternative patterns

2017-09-25 Thread Arthur Reutenauer
Hi Stojan, To stress the central part of Mojca’s longish message, it would really be best if you could get some people to work on the patterns and come up with a file that is definitely better than the current one, so that we can simply replace the current file. Contrary to her, I do

Re: [tex-hyphen] Polish hyphenation patterns and MIT licence

2017-09-25 Thread Arthur Reutenauer
Dear Hanna, Thank you very much, we’re going to make the change. Best, Arthur

Re: [tex-hyphen] US English patterns in hyph-en-us.pat.txt are buggy

2017-06-10 Thread Arthur Reutenauer
> Sure, but I also want to compare different hyphenation rules > and an invented word does the trick. I just wanted to know if > I may accept the new result so that the test file passes again. As Werner hinted, this is not a good way to test patterns that have been generated by patgen; it’s far

Re: [tex-hyphen] US English patterns in hyph-en-us.pat.txt are buggy

2017-06-09 Thread Arthur Reutenauer
Hi Karl Ove, Note that Roozbeh’s report is about the hyphenation patterns for American English, which has different practices than the ones followed by Oxford University Press. > But it also recommends > > dem|oc¦ra|tise > > where | indicates a primary hyphenation point and ¦ a

Re: [tex-hyphen] US English patterns in hyph-en-us.pat.txt are buggy

2017-06-09 Thread Arthur Reutenauer
Hi Roozbeh, > First post to the list, reporting a bug. Please point me to the bug tracker > if there is one. The hyphenation patterns are now hosted on GitHub, and you can open an issue there (https://github.com/hyphenation/tex-hyphen), but I’m happy to reply here: > Debugging an

Re: [tex-hyphen] [tex-live] hyph-ru.tex is faulty.

2017-01-26 Thread Arthur Reutenauer
> Which I knew at some point since Christmas as that was one argument > I used against adding normalisation, that it only avoided the problem > for some subset of languages.. > But missed that just now, sorry:-) That’s funny. > Yes NFD is in a way more consistent I'd agree. Anyway thanks for >

Re: [tex-hyphen] How to update patterns in hyphen-churchslavonic?

2016-10-16 Thread Arthur Reutenauer
Hi Mike, We’ll do it; I’m extremely busy right now and won’t have time before the end of the month, but maybe Mojca will beat me to it. Best, Arthur

Re: [tex-hyphen] Hyphenation patterns for Belarusian

2016-09-16 Thread Arthur Reutenauer
Hi again, > Also, is there any easy way to prohibit hyphenation of consonant-only > endings/beginnings of a word? > I can remember a word with 3 consonants at the end. > Is generation of .ccc8 8ccc. patterns the only way to go? (patterns for 2 > consonants are already in place) Yes,

Re: [tex-hyphen] Hyphenation patterns for Belarusian

2016-09-16 Thread Arthur Reutenauer
Hi Maksim, Sorry for the late answer, this may still be useful: > Do you know a way that doesn't involve system-wide installation of patterns > and generation of a full-blown *TeX document? > I had in mind something like this: > % -- BEGIN -- > \catcode`\{=1 > \catcode`\}=2 > \input

Re: [tex-hyphen] Hyphenation patterns for Belarusian

2016-08-31 Thread Arthur Reutenauer
> Here it is > http://extensions.services.openoffice.org/en/project/dict-be-official Thanks. > The file itself is in cp1251 and needs conversion to UTF-8 > iconv -f cp1251 -t UTF-8 < ./hyph_be_BY.dic > ./hyph_be_BY.txt > + some hand editing to put the content inside \patterns{} Thanks, I
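The conversion quoted above (cp1251 to UTF-8, then wrapping the content in \patterns{}) can be sketched in Python as well. This is only an illustration under stated assumptions: the function name dic_to_patterns and the sample bytes are invented here, and the sketch does not deal with whatever non-pattern lines the real hyph_be_BY.dic contains, which is what the "hand editing" mentioned in the message would cover.

```python
def dic_to_patterns(raw: bytes) -> str:
    """Decode cp1251 bytes and wrap the non-empty lines in a \\patterns{} block.

    Hypothetical helper: it assumes every non-empty line is already a pattern.
    """
    text = raw.decode("cp1251")
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return "\\patterns{\n" + "\n".join(lines) + "\n}\n"


# Invented sample input, not taken from the real Belarusian pattern file.
sample = "а1б\nь1в\n".encode("cp1251")
converted = dic_to_patterns(sample)
print(converted)
```

The returned string is ordinary Python text; writing it to a file opened with encoding="utf-8" would give the UTF-8 output that iconv produces in the quoted command.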

Re: [tex-hyphen] Polyglossia and Latin

2016-08-30 Thread Arthur Reutenauer
Claudio, I’ll reply to the rest of your message later, but on this point: > Arthur, if you are not interested in the documentation you can throw away > the pdf file; when things are done you might even chose to throw away also > the .dtx file. I am of course interested in

Re: [tex-hyphen] Hyphenation patterns for Belarusian

2016-08-30 Thread Arthur Reutenauer
> Thanks a lot. I hope I can do it by myself. My understanding of problems with > those patterns is that the author incorrectly specified groups of letters > (E.g.: made 'ь' a consonant which is incorrect) and this led to conflicts > since there are special rules for ь ' й ў. It’s a safe

Re: [tex-hyphen] Hyphenation patterns for Belarusian

2016-08-30 Thread Arthur Reutenauer
> I'll review patterns and return when xetex will not complain. Actually, making the patterns acceptable to TeX is easy, I can do that for you. I think it would be more interesting to analyse the logic behind them, and hopefully fix them, because there seems to be something seriously wrong.

Re: [tex-hyphen] Hyphenation patterns for Belarusian

2016-08-29 Thread Arthur Reutenauer
> But unfortunately the test script doesn't work for me. > I tried it with TeXLive 2014.20141024-2 without success (unicode-letters.def > is not shipped with it) Try unicode-letters.tex with TeX Live 2014. Arthur

Re: [tex-hyphen] Hyphenation patterns for Belarusian

2016-08-28 Thread Arthur Reutenauer
Hi Maksim, First of all thank you for your efforts, although I would say you’re trying to do a little too much at this stage, I’ll explain why at the end. > ! Conflicting pattern ignored. > l.6024 } > > ? > ! Emergency stop. > l.6024 } > > ! ==> Fatal error occurred, no

Re: [tex-hyphen] Why does "\-" not work?

2016-08-22 Thread Arthur Reutenauer
> If before the word to be hyphenated I have the two subsequent commands: > > \paragraph{ddd} > ddd > \flushleft > > then that blocks any sort of hyphenating. Commenting out any of these two > commands solves the problem. It’s likely that your problem is simply a consequence of TeX not

[tex-hyphen] Pattern repository moved to GitHub

2016-05-29 Thread Arthur Reutenauer
Hello all, As hinted in an email earlier this week, the repository hosting hyphenation patterns for TeX has been moved to GitHub at https://github.com/hyphenation/tex-hyphen -- the migration was completed three weeks ago but hadn't yet been publicly announced. This is in response to a

Re: [tex-hyphen] hyphenation/hyph-utf8, libhyphen, lefthyphenmin and righthyphenmin

2016-05-27 Thread Arthur Reutenauer
Sorry, just remembered: >> In other words, I need to present to libhyphen an Irish pattern file >> starting with: >> >> UTF-8 >> LEFTHYPHENMIN 2 >> RIGHTHYPHENMIN 3 You’ll need to run substrings.pl on the pattern file coming from TeX, of course. Best, Arthur

Re: [tex-hyphen] hyphenation/hyph-utf8, libhyphen, lefthyphenmin and righthyphenmin

2016-05-27 Thread Arthur Reutenauer
Hi Eric, > - the patterns are built for certain values of lefthyphenmin and > righthyphenmin, and in particular do not function properly with smaller > values than they have been built for. E.g. the pattern "1b2l", with > lefthyphenmin = 1, would hyphenate the word xbla as x=bla,
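The "1b2l" example in the quoted message can be reproduced with a small sketch of Liang-style pattern matching. This is a simplified illustration, not the code used by TeX or libhyphen: the function names and the left/right parameters are made up here, and exceptions, lowercasing and the substrings.pl preprocessing are all left out.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as '1b2l' into its letters and inter-letter weights."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    position = 0
    for ch in pattern:
        if ch.isdigit():
            weights[position] = int(ch)
        else:
            position += 1
    return letters, weights


def hyphenate(word, patterns, left=2, right=3):
    """Return word with '-' at every allowed break (odd weight, within the minima)."""
    padded = "." + word + "."
    scores = [0] * (len(padded) + 1)
    for pattern in patterns:
        letters, weights = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            # Keep the highest weight seen at each inter-letter position.
            for offset, weight in enumerate(weights):
                scores[start + offset] = max(scores[start + offset], weight)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    for j in range(1, len(word)):
        # scores[j + 1] is the position just before word[j] in the padded string.
        if scores[j + 1] % 2 == 1 and j >= left and len(word) - j >= right:
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("xbla", ["1b2l"], left=1, right=1))  # x-bla
print(hyphenate("xbla", ["1b2l"], left=2, right=1))  # xbla
```

With a left minimum of 1 the pattern allows the break after the single letter x, which is the behaviour the message describes; with a left minimum of 2 the same pattern yields no break.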

Re: [tex-hyphen] String preparation

2016-05-27 Thread Arthur Reutenauer
> Sure! I don't think these will be supported by polyglossia or babel > natively, they'll certainly be used only by hand for > "micro-hyphenation". I'll advertise that for Latin though, providing a > snippet. (most people using liturgical Latin use LuaLaTeX because > Gregorio is built on top of

Re: [tex-hyphen] String preparation

2016-05-27 Thread Arthur Reutenauer
> \hjcode 769=0 % combining acute counts for 0 I just realised that you expected 0 to mean “0 characters in length”, by analogy with the examples Hans gave with 1 and 2, but for \lccode 0 has the effect to deactivate hyphenation. (Obviously I don’t know which it is for \hjcode, otherwise I

Re: [tex-hyphen] String preparation

2016-05-27 Thread Arthur Reutenauer
> \fr > > \hsize 1mm > > \hjcode`x=`o > > foobar % foo-bar > > fxxbar % fxx-bar > > \lefthyphenmin3 > > œdipus % œdi-pus > > \lefthyphenmin4 > > œdipus % œdipus > > \hjcode`œ=2% < 32 then it's the length and code is char code > > œdipus % œdi-pus That looks really nice :-) >>

Re: [tex-hyphen] String preparation

2016-05-26 Thread Arthur Reutenauer
>\hyphenationmin > (the minimal total number of letters for a word to be hyphenated, a new > primitive) I’m aware of that, and I really can’t understand why it wasn’t called \totalhyphenmin (or something similar with a prefix) ...

Re: [tex-hyphen] String preparation

2016-05-26 Thread Arthur Reutenauer
> in etex_man.pdf, e.g. `texdoc etex`, section 3.10 Thanks. > the idea behind \savinghyphcodes is that character > equivalence settings for given language expressed > by \lccode mappings are saved and frozen in the > format file. I just read it in the eTeX manual. It may help for some

Re: [tex-hyphen] String preparation

2016-05-26 Thread Arthur Reutenauer
>> \savinghyphcodes > > Interesting, where is it documented? No idea, I think it comes from eTeX ultimately. I never looked into it myself, but it gets mentioned from time to time (most recently on this list by Petr Sojka a few weeks ago). > How do you foresee an implementation for this?

Re: [tex-hyphen] String preparation

2016-05-25 Thread Arthur Reutenauer
> So let's take an example : let's say I want to consider œ as just one > unit in order to simplify the problem. What I want in the general case > is left/righthyphenmin=2, so what I want to get is > > œ́di-pus > œdi-pus > > and not > > œ́-di-pus > > How can I achieve that? These

Re: [tex-hyphen] String preparation

2016-05-25 Thread Arthur Reutenauer
> I think the main point is that it is not treated the same way as œ. For > instance with right/lefthyphenmin = 2, we have > > œ́-di-pus > œdi-pus If that’s what the patterns do, they should be fixed, and \left and \righthyphenmin set to something more useful. That’s what we have for

Re: [tex-hyphen] String preparation

2016-05-25 Thread Arthur Reutenauer
> Except the huge amount of time one would have to spend on that... We > have 26000 hyphenated words, but that's really not much, especially for > Latin, and I don't think it's enough to use patgen... What do you mean? Arthur

Re: [tex-hyphen] String preparation

2016-05-25 Thread Arthur Reutenauer
> I agree that "œ+combining acute" should be treated the same as œ, but > should it count as 1 or two characters (o+e) for the > right/left/hyphenmin? I'm not sure about that, but maybe some would > treat it as two characters It’s one grapheme; if you want to treat as two characters, for
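To make the counting question concrete, here is a rough Python sketch that counts a base letter plus its combining marks as one unit. It only skips characters with a non-zero combining class, which is an approximation of grapheme clusters rather than full Unicode text segmentation, and the helper name count_base_characters is invented for this illustration.

```python
import unicodedata


def count_base_characters(text):
    """Count characters that are not combining marks (approximate grapheme count)."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


word = "œ\u0301dipus"  # œ followed by U+0301 COMBINING ACUTE ACCENT
code_points = len(word)
base_characters = count_base_characters(word)
print(code_points, base_characters)  # 7 6
```

Counted this way, œ with a combining acute contributes one unit towards a left or right hyphenation minimum, although it occupies two code points.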

Re: [tex-hyphen] New release of hyph-utf8

2016-05-16 Thread Arthur Reutenauer
Hi Claudio, > Aren't also the three Latin pattern files MIT licenced? You're the author, so you tell us ;-) We noted that you agreed to the licence change and we're processing it; since there were many of them a few may have got lost, but we'll rectify that omission soon. The MIT

Re: [tex-hyphen] To which collection should the Church Slavonic patterns go?

2016-05-14 Thread Arthur Reutenauer
Hi Mojca, > Which collection should include the Church Slavonic hyphenation patterns? > > We have the following existing collections (not listing those that > wouldn't fit at all): > > collection-langcyrillic > collection-langeuropean > collection-langother > > I would be inclined to

Re: [tex-hyphen] updating the license of language patterns

2016-04-19 Thread Arthur Reutenauer
> Thanks for the quick reply. Keeping LPPL is fine, I was about to ask for it > but thought that you would know better. We think the best way in the current situation is to clearly document the licences under which a file has been made available. It's unlikely a third-party developer would

Re: [tex-hyphen] updating the license of language patterns

2016-04-19 Thread Arthur Reutenauer
Dear Georgi, Many thanks for your assent. I've just updated the file, see http://tug.org/svn/texhyphen/trunk/hyph-utf8/tex/generic/hyph-utf8/patterns/tex/hyph-bg.tex?revision=748&view=markup Note I've marked the file as available under either the LPPL or the MIT licence because

Re: [tex-hyphen] About to upload hyphen-churchslavonic.zip to CTAN - review for correctness, please

2016-04-17 Thread Arthur Reutenauer
Hi Mike, > What should I do to make the patterns included into hyph-utf8? Mojca and I will install them. I'll have a look at the content of the package later to check that everything is all right. Best, Arthur

Re: [tex-hyphen] Unicode code points or UTF-8 codes?

2016-04-13 Thread Arthur Reutenauer
> The less weird-looking way of setting the no break code by means of 0301 > is good for xelatex but it does not work with lualatex. That's strange, but again we're not going to use that. > I thought this would be an interesting piece of information for you for your > team when it's time

Re: [tex-hyphen] Renaming plain pattern files in hyph-utf8: feedback?

2016-04-12 Thread Arthur Reutenauer
> I solved it this afternoon, without using the lua scripting language; my > solution is a horrible patch, but is totally hidden from the final user. > What I complain is the week or so that I spent on this problem without > finding any useful information on the documentation. I'm sorry about

Re: [tex-hyphen] Unicode code points or UTF-8 codes?

2016-04-12 Thread Arthur Reutenauer
> sequence <00E6 LATIN SMALL LETTER AE, 0301 COMBINING ACUTE ACCENT> (ǽ) Sorry, I meant <0153 LATIN SMALL LIGATURE OE, 0301 COMBINING ACUTE ACCENT> (œ́). Arthur

Re: [tex-hyphen] Unicode code points or UTF-8 codes?

2016-04-12 Thread Arthur Reutenauer
> For example suppose you want to set into a pat.txt file the combining acute > accent: is had code point U+0301; speaking TeX or LuaTeX, must that glyph be > inserted as 0301 or the utf-8 way cc81 ? As Mojca mentioned, you should input them straight in UTF-8 anyway, but for your
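The two notations in the question refer to the same character: 0301 is the Unicode code point of the combining acute accent, and cc81 is its two-byte UTF-8 encoding. A short Python check, added here only as an illustration, shows the relationship.

```python
acute = "\u0301"  # COMBINING ACUTE ACCENT

code_point = f"{ord(acute):04X}"        # the Unicode code point, in hexadecimal
utf8_hex = acute.encode("utf-8").hex()  # the bytes stored in a UTF-8 file

print(code_point, utf8_hex)  # 0301 cc81
```

A pattern file saved as UTF-8 therefore contains the bytes cc 81 wherever the character U+0301 is typed directly.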

Re: [tex-hyphen] Unicode code points or UTF-8 codes?

2016-04-12 Thread Arthur Reutenauer
> Well, you'll probably end up with one weird-looking pattern "8́" > (looking like "eight with acute" and in fact saying "do not hyphenate > before the combining acute accent"), but such is life ... Exactly. For Latin-script languages there shouldn’t be too many of those anyway. In fact, I

Re: [tex-hyphen] Renaming plain pattern files in hyph-utf8: feedback?

2016-04-12 Thread Arthur Reutenauer
> - Every language would contain an additional folder with the name of > that language (patterns for English, Greek, German, Latin, Mongolian, > Serbian would end up in the same folder; > > Why? This may be a confusing use of the comma :-) Mojca meant that for all the languages

Re: [tex-hyphen] Renaming plain pattern files in hyph-utf8: feedback?

2016-04-11 Thread Arthur Reutenauer
On Tue, Apr 12, 2016 at 12:15:44AM +0200, Claudio Beccari wrote: > I am keen to any naming policy, but I am not keen to delete any pattern file We're not going to delete anything. Arthur

Re: [tex-hyphen] Names of files in OFFO

2016-04-11 Thread Arthur Reutenauer
> I thought about this and I think the best solution is to add an extension to > FOP to handle the cases that cannot nicely fit in language plus country > pattern. I do not know enough about FOP to comment, but that sounds like a good idea. In any case, if you could get rid of nonsensical tags

Re: [tex-hyphen] [Trennmuster] Renaming plain pattern files in hyph-utf8: feedback?

2016-04-10 Thread Arthur Reutenauer
> The subdirs (el, en,...) are certainly not necessary because there are > not so many files and especially because directory listings are sorted > alphabetically already. It's probably more convenient if these > directories are omitted. There are 73 different languages and variants, and 4

Re: [tex-hyphen] js hyphen patterns?

2016-03-20 Thread Arthur Reutenauer
Hi Élie, The best person to ask is probably Mathias Nater, the author of Hyphenator; he doesn't seem to be subscribed to this list. On Sun, Mar 20, 2016 at 04:53:28PM +0100, Élie Roux wrote: >Also, maybe it > would be best to

Re: [tex-hyphen] Names of files in OFFO

2016-03-20 Thread Arthur Reutenauer
On Thu, Mar 17, 2016 at 05:48:26PM +0100, Ulrike Fischer wrote: > I think (but I don't know anything about greek) that it wasn't only > a question of hyphenation but also other settings, like > accents/words etc, which are different in the versions. Makes sense. Ancient Greek and Modern

Re: [tex-hyphen] Names of files in OFFO

2016-03-19 Thread Arthur Reutenauer
On Thu, Mar 17, 2016 at 02:20:02PM -0400, Barbara Beeton wrote: > okay. then there *are* two entries for > every possibility Yes, indeed. I never said anything to the contrary. There can even be more than two entries if there are several characters with oxia-tonos in the pattern (rare). We

Re: [tex-hyphen] Names of files in OFFO

2016-03-19 Thread Arthur Reutenauer
On Thu, Mar 17, 2016 at 01:55:27PM -0400, Barbara Beeton wrote: > that's all very well, and i understand > how *unicode* works. what i'd really > like to see is how this equivalence > is determined in a (la)tex source file. In the case of Greek hyphenation, by making as many copies of the

Re: [tex-hyphen] Names of files in OFFO

2016-03-19 Thread Arthur Reutenauer
On Wed, Mar 16, 2016 at 09:07:50PM +0100, Ulrike Fischer wrote: > A bit off-topic: On tex.sx there was some time ago a question about > how to combine modern and classic greek in one document with babel > and I saw that there is no simple way to do it. Wouldn't Ancient Greek hyphenation work

Re: [tex-hyphen] Names of files in OFFO

2016-03-19 Thread Arthur Reutenauer
On Thu, Mar 17, 2016 at 10:58:22PM +0000, Philip Taylor wrote: > That "ancient Greek is not a true superset [of modern Greek]; modern > Greek contains where ancient Greek would contain > , or character>" > because "tonos and oxia /look/ the same; search a file for >

Re: [tex-hyphen] Names of files in OFFO

2016-03-19 Thread Arthur Reutenauer
On Thu, Mar 17, 2016 at 01:35:04PM -0400, Barbara Beeton wrote: > where and how is canonical equivalence > defined? (pointer to reference?) Section 2.12 of the Unicode Standard for the general presentation and 3.7 for the formal definition (definition D70). Best,

Re: [tex-hyphen] Names of files in OFFO

2016-03-19 Thread Arthur Reutenauer
On Thu, Mar 17, 2016 at 03:44:50PM +0000, Philip Taylor wrote: > No, ancient Greek is not a true superset; modern Greek contains > where ancient Greek would contain , > or character>. Tonos and oxia are the same character. Arthur

Re: [tex-hyphen] Names of files in OFFO

2016-03-19 Thread Arthur Reutenauer
On Thu, Mar 17, 2016 at 05:26:35PM +0000, Philip Taylor wrote: > Tonos and oxia /look/ the same; search a file for character+oxia when > the file has been created with character+tonos and the former will not > be found. They will if your editor takes canonical equivalence into account.
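What "taking canonical equivalence into account" means for a search can be sketched with Python's unicodedata module. The example uses alpha with tonos (U+03AC) and alpha with oxia (U+1F71); the helper canonically_equal is a name chosen for this illustration, and comparing NFD forms is one common way to test equivalence, not the only one.

```python
import unicodedata

alpha_tonos = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS
alpha_oxia = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA


def canonically_equal(a, b):
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


raw_match = alpha_oxia == alpha_tonos                        # plain code point comparison
equivalent = canonically_equal(alpha_oxia, alpha_tonos)      # comparison after normalisation
composed = unicodedata.normalize("NFC", alpha_oxia)          # NFC maps oxia to tonos

print(raw_match, equivalent, composed == alpha_tonos)  # False True True
```

A search that compares raw code points misses the match, which is the behaviour described in the quoted message; normalising both the text and the search string first makes the two spellings match.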

Re: [tex-hyphen] Names of files in OFFO

2016-03-18 Thread Arthur Reutenauer
On Thu, Mar 17, 2016 at 05:32:55PM +0000, Philip Taylor wrote: > Which many (perhaps most) search utilities and editors do not. If you say so. What's your point? Arthur

Re: [tex-hyphen] Names of files in OFFO

2016-03-18 Thread Arthur Reutenauer
On Thu, Mar 17, 2016 at 06:07:10PM +0100, Ulrike Fischer wrote: > http://tex.stackexchange.com/questions/294828/can-one-combine-ancient-and-modern-greek-with-babel Thanks. I did suspect that it would somehow be even more relevant to me than it sounded at first ;-) > Btw: Did you upload the

Re: [tex-hyphen] Names of files in OFFO

2016-03-18 Thread Arthur Reutenauer
On Thu, Mar 17, 2016 at 11:15:46PM +0000, Philip Taylor wrote: > Arthur Reutenauer wrote: >> Are you sure that is your point? > > Yes. That's not a very good point, then. Arthur
