Re: [tex-hyphen] Hyphenation in Albanian
Dear Arthur, dear Mojca,

Attached you will find a zip file named AlbanianHyphenation.zip. This is the result of my efforts with the substantial help of MoA Sabina Koliqi, originally an Albanian graduate in Albanian Literature, later an Italian professor with a degree in Education Teaching. I do not know the Albanian language, but it is Dr Koliqi's mother tongue and was part of her university studies; I know how to build hyphenation patterns. We joined our competences, and the above .zip file contains our results; in particular, the hyph-sq.tex file contains the UTF-8 encoded patterns, with a preamble modeled on the other pattern files distributed with TeX Live.

We looked for a hyphenated Albanian word list, but we could not find any. Dr Koliqi extracted a word list from a couple of chapters of an Albanian book and tried to create a hyphenated Albanian word list. Then I took up the challenge, but I was unsuccessful with the patgen program that is distributed with the TeX system; the documentation is very scarce and refers to the Omega program. As a result we abandoned the patgen solution and moved to another approach that I find very effective, even if it requires a lot of "elbow grease".

The approach is based on LuaLaTeX and its ability to load a pattern file on the fly and to hyphenate a list of words given as simple text. This is provided by the package testhyphens.sty and its checkhyphens environment. As you can see from the zipped file, the source abanian-test-lualatex-2.tex also loads the multicol.sty package, in order to typeset the result in four columns; of course the setting for four columns can be changed to 1 (one) column, and the result may be used as a dictionary if patgen is to be used to find another (different) pattern set created without any elbow grease.

My previous experience with other languages taught me that this elbow grease, spent by a sufficiently well-educated person, produces better results than patgen.
Of course this statement is not valid for certain languages, English first of all, because patterns are based on spelling and not on pronunciation; for English in both its main variants, British and US, there are errors that cannot be corrected because there are homographs that are pronounced differently depending on whether they are nouns or verbs: for example "the record" and "I record"; "the analyses" and "he analyses".

So we started with a basic list of a dozen patterns (the single-letter patterns with implied 0 values on both sides were omitted, and only the Albanian digraphs were considered). After each run of the LuaLaTeX compilation Dr Koliqi would correct the wrong hyphenation points on the printed list; I would modify the pattern list; and we would iterate until all words were correctly hyphenated. Not very professional, you might think, but very effective.

Albanian hyphenation is peculiar. Albanians say they have an alphabet of more than 30 letters; while working with Dr Koliqi I found out that Albanian lacks a word for "letter" in the sense implied by any computer encoding, from ASCII to UTF-8; therefore "sh", "dh", "zh", and similar digraphs are called by the same name as "a", "b", "c", and so on. Eventually we reached a mutual understanding and could proceed pretty rapidly.

We worked on an initial set of a little more than 2600 words; then we reduced the set to the current one contained in the LuaLaTeX source file. Unlike a patgen-generated set, which only minimizes the probability of hyphenation errors, the pattern set we built hyphenates every word of this list correctly: the number of wrongly hyphenated words is zero.

Notice: the LuaLaTeX source file sets both the left and right hyphenmin values to 1; in practice the language description file should set both to the value 2.
I always build the hyphen sets with the value 1, because I imagine that in some rare cases of narrow-column typesetting, correct justification may be achieved only with this not too professional typographical setting.

But the word set we worked on is limited, and it is possible that, when Albanian users actually use this pattern set with their own documents, some more patterns or a list of hyphenation exceptions might become necessary. I might be available to modify such patterns for a short while; at my age I am not going to live forever; therefore the Albanian TeX community should take over.

All the best
Claudio

On 16/06/2020 15:22, Arthur Reutenauer wrote:
> Dear Claudio,
>
> On Mon, Jun 15, 2020 at 11:57:33PM +0200, Claudio Beccari wrote:
>> I can certainly ask the student to allow distributing her thesis, but I believe it will not be of great utility, because, as I said, the thesis is in Italian, with very few stretches in Albanian, where the needed rare hyphen points were set by hand.
>
> I think the list of hyphenated words would be very useful, so if she’s ready to publish that, it would be really great.
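The review loop described in this message (typeset the list, mark the wrong break points, adjust the patterns, repeat) can be sketched in Python. This is only an illustration under stated assumptions: the function names `break_positions` and `review` are hypothetical, the sample strings are toy data rather than entries from the actual word list, and the real workflow used LuaLaTeX with testhyphens.sty and a printed list, not a script.

```python
def break_positions(hyphenated):
    """Return the plain word and the set of indices where a break is marked."""
    letters = []
    positions = set()
    for ch in hyphenated:
        if ch == "-":
            positions.add(len(letters))  # break falls before the next letter
        else:
            letters.append(ch)
    return "".join(letters), positions


def review(produced, corrected):
    """Compare pattern output with the reviewer's corrections, word by word.

    Returns a list of (word, missing_breaks, spurious_breaks) for every word
    whose break points differ. An empty list means the round is finished.
    """
    report = []
    for got, want in zip(produced, corrected):
        word, got_pos = break_positions(got)
        same_word, want_pos = break_positions(want)
        if word != same_word:
            raise ValueError(f"lists are out of step: {word!r} vs {same_word!r}")
        missing = sorted(want_pos - got_pos)
        spurious = sorted(got_pos - want_pos)
        if missing or spurious:
            report.append((word, missing, spurious))
    return report


# Toy data (hypothetical): the first list stands for what the current patterns
# produce, the second for the same words after a native speaker's corrections.
produced = ["ma-s-ha", "dho-ma"]
corrected = ["ma-sha", "dho-ma"]
for word, missing, spurious in review(produced, corrected):
    print(word, "missing:", missing, "spurious:", spurious)
```

Each reported word points at a pattern to add or adjust before the next run; the sketch treats every "-" as a break mark, so words that contain a genuine hyphen would need separate handling.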
Re: [tex-hyphen] Hyphenation in Albanian
Dear Claudio,

On Mon, Jun 15, 2020 at 11:57:33PM +0200, Claudio Beccari wrote:
> I can certainly ask the student to allow distributing her thesis, but I
> believe it will not be of great utility, because, as I said, the thesis is
> in Italian, with very few stretches in Albanian, where the needed rare
> hyphen points were set by hand.

I think the list of hyphenated words would be very useful, so if she’s ready to publish that, it would be really great.

Best,
Arthur
Re: [tex-hyphen] Hyphenation in Albanian
Joan,

I just created https://github.com/hyphenation/albanian as an empty repository and will add you as a contributor. Is your GitHub user name iGianni?

Arthur
Re: [tex-hyphen] Hyphenation in Albanian
Dear Arthur,

I can certainly ask the student to allow distributing her thesis, but I believe it will not be of great utility, because, as I said, the thesis is in Italian, with very few stretches in Albanian, where the needed rare hyphen points were set by hand.

All the best
Claudio

On 15/06/2020 21:40, Arthur Reutenauer wrote:
> Hi Claudio,
>
> On Sun, Jun 14, 2020 at 12:05:19AM +0200, Claudio Beccari wrote:
>> Recently I assisted an Albanian student getting her degree in Italy, who wrote her thesis in Italian, but with many stretches of text in Albanian; these parts were hyphenated by hand, because she could not use LaTeX, but the final printing was done from a LaTeX-generated PDF file; the supervisors were very happy to see a well-typeset thesis, which in the humanities is apparently pretty uncommon.
>
> If that thesis is available somewhere, it would be very useful to be able to look at it :-)
>
> Best,
> Arthur
Re: [tex-hyphen] Hyphenation in Albanian
I would like to thank Claudio, Mojca and Arthur for their replies. I apologize, but I had not been subscribed to the mailing list, so I did not receive Claudio's email yesterday. Now everything is OK, and I will certainly continue to work on this issue in the coming days.

My to-do list for the coming days will be:

1. Find a detailed grammatical theory of hyphenation in Albanian. Since I am not a linguist, I have to ask for help from a friend of mine who is an expert in this field.
2. Translate the rules into English and put the document in the public domain using GitHub.
3. Read the documentation which Claudio, Mojca and Arthur recommend.
4. Create some patterns and test whether they work correctly.

I hope that this will be the beginning of adding something that later

My personal email is igi...@hotmail.com.

I would like to thank you all again for your warm welcome.

Kind regards,
Joan Jani

On 15/6/20 11:05 a.m., Mojca Miklavec wrote:
> Hi,
>
> Off-list.
>
> Claudio Beccari already wrote a good answer.
>
> We don't really have a team actively working on creating new patterns for new languages, but there are a bunch of experts (Claudio being among them). We are mostly collecting existing patterns and making sure that they stay in consistent shape.
>
> So by far the best way to get the patterns working would be to try to create them yourself, or find someone to help you. This may include people on the list, but you need to provide some reliable sources, grammar rules, dictionaries etc.
>
> There are two orthogonal ways to achieve the goal:
> - assemble a list of hyphenated words from a dictionary and run patgen (or one of its rewrites, we can help you with that)
> - come up with a set of clear rules for hyphenation (like: always hyphenate after letter 'a', never hyphenate between these letter pairs, ...) and write hyphenation patterns manually
>
> I would suggest that you read
> https://tug.org/docs/liang/liang-thesis.pdf
>
> Mojca
>
> PS: Please don't expect an answer to a private mail, I've been struggling recently to find time to answer emails. But you can continue the discussion on the list, you just need to provide more information, try to understand how hyphenation patterns work (read the above or maybe find some BachoTeX talks from Arthur Reutenauer).
>
> On Sat, 13 Jun 2020 at 19:22, Joan Jani wrote:
>> Hello to all,
>>
>> I have been a LaTeX user for more than 15 years (I wrote my first lab report in LaTeX back in 2004). Since there are no hyphenation patterns for Albanian, I always get the message below:
>>
>> -- No hyphenation patterns were preloaded for (babel) the language 'Albanian' into the format.
>>
>> I want to participate in your group and help to create these hyphenation patterns.
>>
>> Kind regards,
>> Joan Jani
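For readers new to the mechanism behind the second route in the quoted message (writing patterns by hand), here is a minimal Python sketch of how Liang-style patterns are applied to a word. The patterns and sample strings are toy assumptions chosen only to show the mechanics; they are not taken from hyph-sq.tex and the outputs are not claimed to be correct Albanian hyphenation. The function names are hypothetical, and the sketch makes no attempt to reproduce TeX's actual implementation.

```python
def parse_pattern(pattern):
    """Split a pattern such as 's2h' into its letters and inter-letter values."""
    letters = []
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)   # digit sits in the gap before the next letter
        else:
            letters.append(ch)
            values.append(0)       # open a new gap after this letter
    return "".join(letters), values


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Apply the patterns to one word and return it with '-' at allowed breaks."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."      # '.' marks the word boundaries
    scores = [0] * (len(padded) + 1)       # scores[i] = value before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = table.get(padded[start:end])
            if values is not None:
                for offset, value in enumerate(values):
                    # competing patterns: the highest value in each gap wins
                    scores[start + offset] = max(scores[start + offset], value)
    pieces, last = [], 0
    for k in range(1, len(word)):          # gap between word[k-1] and word[k]
        inside = left_min <= k <= len(word) - right_min
        if inside and scores[k + 1] % 2 == 1:   # odd allows, even forbids
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


# Toy patterns (hypothetical): break after 'a' and 'o', break before 'h',
# but never split the digraphs 'sh' and 'dh'.
toy = ["a1", "o1", "1h", "s2h", "d2h"]
print(hyphenate("masha", ["a1", "1h"]))   # without the digraph rule
print(hyphenate("masha", toy))            # 's2h' overrides '1h'
```

The second call shows the key idea: a higher even value (`s2h`) cancels a break that a lower odd value (`1h`) would otherwise allow, which is how hand-written rules of the "never hyphenate between these letter pairs" kind are expressed.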
Re: [tex-hyphen] Hyphenation in Albanian
Hi Claudio,

On Sun, Jun 14, 2020 at 12:05:19AM +0200, Claudio Beccari wrote:
> Recently I assisted an Albanian student getting her degree in Italy, who
> wrote her thesis in Italian, but with many stretches of text in Albanian;
> these parts were hyphenated by hand, because she could not use LaTeX, but
> the final printing was done from a LaTeX-generated PDF file; the supervisors
> were very happy to see a well-typeset thesis, which in the humanities is
> apparently pretty uncommon.

If that thesis is available somewhere, it would be very useful to be able to look at it :-)

Best,
Arthur
Re: [tex-hyphen] Hyphenation in Albanian
On 14/06/2020 at 0:05, Claudio Beccari wrote:
> Apparently the language handler polyglossia has a module for Albanian; by contrast, the Babel documentation does not list your language among the supported ones; but there exists the file albanian.ldf; as far as hyphenation is concerned, this file explicitly falls back to the English patterns.

Actually, both polyglossia and babel attempt to use the Albanian hyphenation, but since it doesn't exist, both fall back to English.

Note that patterns aren't specific to either package. Once the patterns have been created and the configuration files updated, both packages should work.

Javier
Re: [tex-hyphen] Hyphenation in Albanian
Apparently the language handler polyglossia has a module for Albanian; by contrast, the Babel documentation does not list your language among the supported ones; but there exists the file albanian.ldf; as far as hyphenation is concerned, this file explicitly falls back to the English patterns.

The support offered by polyglossia is specified in gloss-albanian.ldf. This file sets \hyphennames={albanian}, but I assume that actual Albanian patterns are missing for polyglossia as well.

Very good. From the very little I know of Albanian, I'd say you are in a very good position to create suitable patterns yourself. Your language, as far as I can say, is pretty much phonetic, and the grammar rules probably reflect this aspect. Start from the grammar rules of syllabification, which is different from hyphenation: hyphenation must obey grammar, but also typography, for example the minimum number of characters before the first and after the last hyphen point. An example in Italian: the word "idea" can be syllabified according to grammar as "i-de-a", but in typography it cannot be hyphenated at all, because in Italian typography you never leave a word fragment of just one letter at the end or at the start of a line.

Remember that your language has many sounds, each one generally rendered with a single letter, possibly with diacritics, and very few digraphs, such as sh, dh, and similar ones. The patterns must include at least one pattern for every letter; moreover, you have to take care of indivisible consonant clusters and indivisible vowel clusters.

I suggest that you work with a grammar at hand; while you create or modify your patterns, write down a file with words containing those patterns. As you proceed you might need to check the correctness of your patterns.
At the moment it would be confusing, but in due time write to me and I will show you, if still possible, the setup I used to test the patterns I wrote for several languages, mostly of Latin origin, without the complex setup that is needed by the team.

Remember to encode your pattern file in UTF-8. It is much easier for you to read, and for the subsequent processing needed by the team.

Recently I assisted an Albanian student getting her degree in Italy, who wrote her thesis in Italian, but with many stretches of text in Albanian; these parts were hyphenated by hand, because she could not use LaTeX, but the final printing was done from a LaTeX-generated PDF file; the supervisors were very happy to see a well-typeset thesis, which in the humanities is apparently pretty uncommon.

All the best
Claudio

On 13/06/2020 19:04, Joan Jani wrote:
> Hello to all,
>
> I have been a LaTeX user for more than 15 years (I wrote my first lab report in LaTeX back in 2004). Since there are no hyphenation patterns for Albanian, I always get the message below:
>
> -- No hyphenation patterns were preloaded for (babel) the language 'Albanian' into the format.
>
> I want to participate in your group and help to create these hyphenation patterns.
>
> Kind regards,
> Joan Jani
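The difference between syllabification and hyphenation described in the message above (the Italian "idea" example) can be illustrated with a short Python sketch. It assumes the grammatical syllabification is already known and only applies the typographical minimums; the function name `apply_hyphenmin` and its parameters are hypothetical stand-ins for TeX's left and right hyphenmin settings, not an actual TeX or LuaTeX interface.

```python
def apply_hyphenmin(syllabified, left_min=2, right_min=2):
    """Keep only the syllable breaks that leave enough letters on each side.

    `syllabified` is a word with '-' between its grammatical syllables.
    A break survives only if at least `left_min` letters precede it and
    at least `right_min` letters follow it.
    """
    syllables = syllabified.split("-")
    word = "".join(syllables)
    allowed = []
    position = 0
    for syllable in syllables[:-1]:
        position += len(syllable)          # index of the break in the plain word
        if left_min <= position <= len(word) - right_min:
            allowed.append(position)
    starts = [0] + allowed
    ends = allowed + [len(word)]
    return "-".join(word[i:j] for i, j in zip(starts, ends))


# The grammatical syllables of the Italian word from the message above:
print(apply_hyphenmin("i-de-a", 1, 1))   # both minimums set to 1
print(apply_hyphenmin("i-de-a", 2, 2))   # both minimums set to 2
```

With both minimums at 1 every syllable boundary remains a break point; with both at 2 the word keeps no break point at all, which matches the typographical rule stated in the message.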