Re: [sword-devel] usfm2osis.py and tag \cp
> From: Chris Little
> I hope I've fixed this now. (I haven't tested that it functions
> correctly, but the error was fairly obvious from the traceback below.)

Hi Chris,

Sorry, while the crash has gone, the function is not correct at all. \cp is meant to give a printed chapter number which has no influence on the underlying counting of verses and chapters. How exactly to represent it in OSIS we would need to figure out, but it should not influence the creation of subsequent osisIDs. I would think is probably the best for our purposes. The OSIS reference is not exactly helpful at this point, nor does it reflect the reality of module making.

Right now the code does two things: in the sample below it replaces the chapter number 1 with an A for the subsequent verse's osisID ("Esth.A.1" instead of "Esth.1.1"), and it leaves the \cp A in place. Neither is right, according to either the OSIS reference or the wishes of the USFM writer in my example.

> > Following minimal USFM code creates below attached error message.
> >
> > \id EST
> > \h ESTER
> > \c 1
> > \cp A
> > \s En Mordekai eh Ouraman
> > \p
> > \v 1 Mordekai,

Here is the currently generated OSIS:

http://www.bibletechnologies.net/2003/OSIS/namespace"; xmlns:xsi="h ESTER \cp A En Mordekai eh Ouraman Mordekai,

I think the best way of expressing the above USFM would be something along the following lines:

http://www.bibletechnologies.net/2003/OSIS/namespace"; xmlns:xsi="h ESTER A En Mordekai eh Ouraman Mordekai,

--

The alternative would be to use one of the messy constructions shown in Appendix I of the OSIS reference for two reference systems. That is not only very ugly; it also has no support in the engine and is unlikely to gain any in the near, mid-term or long-term future.

Yours
Peter

___
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
Re: [sword-devel] multiple languages in modules
G'day Karl,

On Fri, Oct 12, 2012 at 2:11 PM, Karl Kleinpaste wrote:
> > Is the <foreign> element passed through the engine? If so, do I need
> > to file bugs with front-ends to encourage support of <foreign>?
>
> Having just looked, the string "foreign" does not appear in Sword's
> source tree in src/modules/filters/*.cpp. So it's not supported right
> now after all. I don't know how BPBible supports it; I had understood
> that BPBible uses the regular filter sets. Does BPBible actually
> subclass the filters and extend them for <foreign>?

BPBible doesn't support foreign. It only looks like it does. What BPBible does support is automatically detecting Greek and Hebrew text and marking it to be used with the configured Greek/Hebrew fonts.

Just for the record, BPBible does subclass the regular filters quite substantially. It uses this for things like:

- poetic text display
- Strong's headwords instead of numbers (if option is on)
- quote colouring by speaker in ESV (if option is on)
- cross-reference expansion (if option is on)

as well as some HTML+class code so CSS can be applied.

The new XHTML filter will probably overlap somewhat with the basic HTML + classes that BPBible writes out.

God bless,
Ben.
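The automatic detection Ben describes can be sketched roughly as follows. This is not BPBible's code: the code point ranges, the function names and the CSS class names are all assumptions chosen for illustration, and decomposed Greek (base letter plus combining accents from U+0300 onward) would need extra handling.

```python
import html
from itertools import groupby

# Assumed ranges: the Hebrew block, plus the Greek and Greek Extended blocks.
HEBREW = [(0x0590, 0x05FF)]
GREEK = [(0x0370, 0x03FF), (0x1F00, 0x1FFF)]


def script_of(ch):
    """Return 'hebrew', 'greek', or None for any other character."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in HEBREW):
        return "hebrew"
    if any(lo <= cp <= hi for lo, hi in GREEK):
        return "greek"
    return None


def mark_scripts(text):
    """Wrap each run of Hebrew or Greek characters in a <span> with a class.

    The class names are hypothetical; a front-end would map them to the
    user's configured fonts in its stylesheet.
    """
    parts = []
    for script, chars in groupby(text, key=script_of):
        chunk = html.escape("".join(chars))
        if script:
            parts.append('<span class="%s">%s</span>' % (script, chunk))
        else:
            parts.append(chunk)
    return "".join(parts)
```

Because spaces are not part of either script, each Hebrew or Greek word ends up in its own span; merging adjacent runs across whitespace would be a natural refinement.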
Re: [sword-devel] multiple languages in modules
<foreign> is the XML way of indicating a language other than the language of the document. So you surround Hebrew text with <foreign xml:lang="heb">. Judging from Ben's more recent email, even BPBible does not support it.

Regardless of the method, the effect is great. I use Linux Libertine all the time for all but Hebrew. Vowel points do not display correctly. Free Serif is a passable alternative because it gets the vowels right, but the Hebrew glyphs seem anemic to me. I am also working with David Troidl on BDB, which has many more languages, including Arabic, Ethiopic, Syriac, and transliterated Akkadian. It is not realistic to expect any one font to handle all of those in addition to Greek and Hebrew.

My main point is that applying fonts based on the language of the text rather than the language of the module is something worth working on. More below.

On 10/11/2012 11:11 PM, Karl Kleinpaste wrote:

I know nothing of <foreign>, but can only suppose that, if supported, it must pass through the engine with an appropriate (HTML) indication.

As a general rule, I suggest either Free Serif or Linux Libertine, with a slight preference for Free Serif. Both have good coverage across every Latin alphabet variant, and pretty display of both Hebrew and Greek. In modules of mine that have Latin, Greek, and Hebrew alphabets, they all show quite well. We include both of these fonts in Xiphos' Win32 installers. You might find the UDHR module useful, from Crosswire Experimental, as a font demonstration module. (Linux Libertine is not Linux-specific. It was just developed in an open source environment.)

Is the <foreign> element passed through the engine? If so, do I need to file bugs with front-ends to encourage support of <foreign>?

Having just looked, the string "foreign" does not appear in Sword's source tree in src/modules/filters/*.cpp. So it's not supported right now after all. I don't know how BPBible supports it; I had understood that BPBible uses the regular filter sets.
Does BPBible actually subclass the filters and extend them for <foreign>?

Second, when RtoL text is mixed with LtoR text you can get some strange display problems. Punctuation and numbers can work for both types of languages.

This is often an artifact of how toolkits handle LtoR. Today, Xiphos uses GTK and WebKit, but I don't know how these reflect your example case. Our former use of gtkhtml3 -vs- gtkmozembed -vs- xulrunner -vs- today's WebKit always led to some strange realizations for how LtoR would show up in Xiphos. gtkhtml3 wants to right-justify any text containing (or perhaps it was "that leads off with") Hebrew. That peculiarity led to certain unexpected choices for how I created StrongsRealHebrew.

I love Unicode, but mixed language directions are one problem that did not exist with legacy fonts. As far as I can tell, all web browsers and word processors do the same thing: when you have some Hebrew text, they assume that anything that follows, such as numerals or punctuation (until you get some Latin text, for example), is Hebrew. When marking up XML you get a false sense of security about text rendering because the tags use Latin characters. But when they are rendered by a browser, even text outside the is assumed to be Hebrew until you get some Latin text. I think that is why html has , which helps solve the problem.

Daniel
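The behaviour Daniel describes follows from how Unicode classifies characters: letters carry a strong direction, while digits, punctuation and spaces do not and take their direction from context. A minimal sketch using Python's standard unicodedata module makes this visible; the helper names are made up for illustration.

```python
import unicodedata

# Bidi classes that carry a direction of their own:
# L = left-to-right, R = right-to-left, AL = right-to-left Arabic letter.
STRONG = {"L", "R", "AL"}


def bidi_classes(text):
    """Pair every character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def lacks_own_direction(ch):
    """True for characters whose display direction depends on their neighbours."""
    return unicodedata.bidirectional(ch) not in STRONG
```

Running bidi_classes over a verse reference that follows a Hebrew word shows a strong R letter followed only by weak and neutral classes, which is why the renderer keeps treating the trailing numerals and punctuation as part of the right-to-left run until a Latin letter appears.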
Re: [sword-devel] multiple languages in modules
On 10/12/2012 5:44 AM, Daniel Owens wrote:
> <foreign> is the xml way of indicating a language other than the language of the document. So you surround Hebrew text with <foreign xml:lang="heb">.

A small sidenote, since you do encoding: "heb" is not a legal value for xml:lang. This must be "he", or "hbo" if you mean Ancient Hebrew. 2-letter language subtags from ISO 639-1 are always required if they exist (rather than 3-letter subtags from subsequent 639s). The details are spelled out in BCP 47.

You can also find the full current set of IANA registered language subtags at: http://www.iana.org/assignments/language-subtag-registry

--Chris
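As a rough illustration of the rule Chris states, a module author could normalise xml:lang values before validating. The table below is a tiny hand-picked sample, not a substitute for the IANA registry, and the function name is hypothetical.

```python
# Illustrative sample only: 3-letter codes that have a 2-letter equivalent.
# Codes without one (e.g. "hbo" for Ancient Hebrew, "grc" for Ancient Greek)
# are deliberately absent and pass through unchanged.
PREFERRED = {
    "heb": "he",
    "gre": "el",
    "ell": "el",
    "ara": "ar",
    "lat": "la",
}


def normalize_lang(tag):
    """Replace a 3-letter primary subtag with its 2-letter form when one exists."""
    primary, sep, rest = tag.partition("-")
    primary = primary.lower()
    primary = PREFERRED.get(primary, primary)
    return primary + sep + rest
```

A real tool would load the registry file from the URL above rather than hard-code a table.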
Re: [sword-devel] usfm2osis.py and tag \cp
On 10/12/2012 4:00 AM, Peter von Kaehne wrote: Sorry, while the crash has gone, the function is not correct - at all. \cp is meant to give a printed chapter number which has no influence on the underlying counting of verses and chapters. How exactly to represent it in OSIS, we would need to figure out, but it should not influence the creation of subsequent osisIDs. I would think is probably the best for our purposes. The OSIS reference is not exactly helpful at this point, nor does it reflect the reality of module making. \cp (like \vp) is a workaround for a limitation in Paratext. Paratext requires that all chapter and verse numbers be numeric and strictly increasing. No lettered or out-of-order or repeated verse or chapter numbers are permissible. However, actual Bibles sometimes include these things. So Paratext requires that you enumerate the chapters/verses with strictly increasing numerals. \cp and \vp let Paratext substitute the correct underlying number when rendering. The description of \cp in the USFM docs states: "This is a chapter marker (number, letter) which would be displayed in the published text (where the published marker is different than the \c # used within the translation editing environment)." The words "translation editing environment" are a reference to Paratext specifically, and the description as a whole conveys that \cp is the real chapter number if a different \c value is necessitated by Paratext. OSIS doesn't have this limitation. You can encode the real verse and chapter numbers in OSIS, without need for a workaround. So usfm2osis.py's replacement of the numeric dummy-chapter with the chapter number specified in \cp is correct. If you look at your USFM document, I anticipate you see something like: \c 1 \cp A ... \c 2 \cp 1 ... \c 3 \cp 2 ... \c 4 \cp 3 ... \c 5 \cp B ... \c 6 \cp 3 ... \c 7 \cp 4 The strictly increasing \c values are just dummy values for Paratext. 
The \cp values represent the actual underlying chapter numbers for this reference scheme. There aren't two different chapter 3s in Esther, just one that is briefly interrupted by chapter B, but Paratext can't deal with the underlying reference system, so it requires the \cp workaround. Likewise, chapter 4 (\cp 4) isn't really chapter 7 (\c 7).

This is mostly based on my experience encoding USX docs for ABS. If your USFM encoder intends that the value in \c be the chapter value, then \cp should not be used. You should look into \ca or \cl as alternatives.

> Right now the code does two things: It replaces in the sample below the chapter number 1 with an A for the subsequent verse's osisID ("Esth.A.1" instead of "Esth.1.1") and it leaves the \cp A in place. Neither is right, according to either the OSIS reference or the wishes of the USFM writer in my example.

With the update just committed, usfm2osis.py should now correctly remove \cp (and \vp). That was a bug--actually a set of bugs. Again, I regrettably haven't tested this, but the code looks good to me.

--Chris
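The substitution Chris describes can be sketched as below. This is not the code in usfm2osis.py, only a minimal illustration of the idea that a \cp value, when present, replaces the dummy \c value in the chapter part of the osisID; the function name and the regular expression are assumptions.

```python
import re


def chapter_ids(usfm, book):
    """List chapter osisIDs, letting \\cp override the preceding \\c value."""
    ids = []
    pending = None  # chapter value seen but not yet emitted
    # The trailing \s+ keeps \cl and \ca from being mistaken for \c.
    for marker, value in re.findall(r"\\(cp|c)\s+(\S+)", usfm):
        if marker == "c":
            if pending is not None:
                ids.append(book + "." + pending)
            pending = value
        elif pending is not None:
            # \cp: the published number replaces the Paratext dummy number.
            pending = value
    if pending is not None:
        ids.append(book + "." + pending)
    return ids
```

Fed the sequence from the message (\c 1 \cp A, \c 2 \cp 1, and so on), this yields Esth.A, Esth.1, Esth.2, Esth.3, Esth.B, Esth.3, Esth.4. The repeated Esth.3 reflects the single chapter 3 that is interrupted by chapter B, so a real converter would still have to decide how to handle the resumed chapter.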
[sword-devel] seeking consensus on OSIS lemma best practice
Gary Holmlund and I are working on a problem related to the Westminster Hebrew Morphology (WHM) module. We need a consensus on markup practices for OSIS lemma.

I was having a problem getting natural Hebrew lemma to look up an entry and display it in the mag window. Gary discovered that if "H" is prefixed to lemma in WHM, the BibleTime mag window works with Hebrew lemma (as opposed to Strong's numbers). My understanding is that this is not typical OSIS best practice but a SWORD convention. I resisted at first, but now I think there is some wisdom to using this method. We need some way to distinguish between Hebrew and Aramaic words, which can be identical in form but not in meaning. WHM uses @ for Hebrew and % for Aramaic. I suggested to Gary that we compromise and simply change @ to H and % to A, modifying BibleTime to strip A and H and use that to look for the entry in the correct lexicon.

The markup would look like this:

Hebrew (from Deuteronomy): morph="whmmorph:some_value">תֹּאבֵדוּן֮
Aramaic (from Jeremiah): morph="whmmorph:some_value">יֵאבַ֧דוּ

The main problem I see is that other front-ends may not follow the process of looking for G or H and then stripping the character before looking up the entry. Could we come to a consensus on this?

Daniel
Re: [sword-devel] multiple languages in modules
On 10/12/2012 03:23 PM, Chris Little wrote:
> On 10/12/2012 5:44 AM, Daniel Owens wrote:
> > <foreign> is the xml way of indicating a language other than the language of the document. So you surround Hebrew text with <foreign xml:lang="heb">.
>
> A small sidenote, since you do encoding: "heb" is not a legal value for xml:lang. This must be "he", or "hbo" if you mean Ancient Hebrew. 2-letter language subtags from ISO 639-1 are always required if they exist (rather than 3-letter subtags from subsequent 639s). The details are spelled out in BCP 47.
>
> You can also find the full current set of IANA registered language subtags at: http://www.iana.org/assignments/language-subtag-registry
>
> --Chris

Okay, thanks. "heb" is more intuitive, so perhaps that is how it crept in.

Daniel
Re: [sword-devel] seeking consensus on OSIS lemma best practice
On 10/12/2012 1:40 PM, Daniel Owens wrote: Gary Holmlund and I are working on a problem related to the Westminster Hebrew Morphology (WHM) module. We need a consensus on markup practices for OSIS lemma. I was having a problem getting natural Hebrew lemma to look up an entry and display it in the mag window. Gary discovered that if "H" is prefixed to lemma in WHM, the BibleTime mag window works with Hebrew lemma (as opposed to Strong's numbers). My understanding is that this is not typical OSIS best practice but a SWORD convention. I resisted at first, but now I think there is some wisdom to using this method. We need some way to distinguish between Hebrew and Aramaic words, which can be identical in form but not in meaning. WHM uses @ for Hebrew and % for Aramaic. I suggested to Gary that we compromise and simply change @ to H and % to A, modifying BibleTime to strip A and H and use that to look for the entry in the correct lexicon. The markup would look like this: Hebrew (from Deuteronomy): תֹּאבֵדוּן֮ Aramaic (from Jeremiah): יֵאבַ֧דוּ The main problem I see is that other front-ends may not follow the process of looking for G or H and then stripping the character before looking up the entry. Could we come to a consensus on this? Could you confirm that this is the behavior in some front end other than BibleTime? From my perspective it just sounds like a BibleTime bug. This is certainly bad OSIS encoding. It is also not a Sword convention. If anything is implemented that requires a language prefix like this, it represents a bug, whether in Sword or in BibleTime. --Chris ___ sword-devel mailing list: sword-devel@crosswire.org http://www.crosswire.org/mailman/listinfo/sword-devel Instructions to unsubscribe/change your settings at above page
Re: [sword-devel] Sword -r2741
On 12/10/12 16:43, luke wrote:
> In recent correspondence with Karl Kleinpaste of the Xiphos project about display issues with our project's module, he recommended that I try sword's latest -r2741 because it has recent changes regarding osis headings. I do not have access to this version of sword.
>
> Would someone be willing to run our project's osis file through the latest version of sword (apparently -r2741), create a module from it and then send me the results?
> - My OSIS was built using the sword script from USFM files.
> - My OSIS validates
> - I have already run the fix for titles on my osis.
>
> Please contact me if you are willing,
> Thanks

Hi Luke,

Can't see any reply to your message here. As far as I can see, osis2mod.cpp hasn't changed since March and the latest revision is 2693. (Someone please correct me if I'm wrong.)

I suspect the abovementioned recent changes might be in the sword library processing of the module, not its creation.

Robert.
Re: [sword-devel] Sword -r2741
Osis2mod is not affected by any of the recent changes. If you built your module with an earlier version, it'd be a good idea to rebuild it with the most recent one to test your content.

I believe that the utility lookup will give you a view into how verses are stored and rendered.

In Him,
DM

On Oct 12, 2012, at 8:08 PM, Robert Hunt wrote:
> On 12/10/12 16:43, luke wrote:
>> In recent correspondence with Karl Kleinpaste of the Xiphos project about
>> display issues with our project's module, he recommended that I try sword's
>> latest -r2741 because it has recent changes regarding osis headings. I do
>> not have access to this version of sword.
>>
>> Would someone be willing to run our project's osis file through the latest
>> version of sword (apparently -r2741), create a module from it and then send
>> me the results?
>> - My OSIS was built using the sword script from USFM files.
>> - My OSIS validates
>> - I have already run the fix for titles on my osis.
>>
>> Please contact me if you are willing,
>> Thanks
>
> Hi Luke,
>
> Can't see any reply to your message here. As far as I can see,
> osis2mod.cpp hasn't changed since March and the latest revision is 2693.
> (Someone please correct me if I'm wrong.)
>
> I suspect the abovementioned recent changes might be in the sword library
> processing of the module, not its creation.
>
> Robert.
Re: [sword-devel] Sword -r2741
Apologies, some mis-sorted email caused me not to notice this until now.

I had thought the problem you were seeing was a display issue, in which case recent engine updates had some good effects, which is why I suggested a more recent version...not realizing that you're a Win32 user, so all you've got is whatever came out in the latest Xiphos release build. (Though I'm a little surprised that your ref in the bug report mentions 2 different versions of Sword. Hm.)

As others said, apparently the creation tools (as distinct from the processing engine used in apps) haven't been updated much lately, so the problem has to be either a deeper problem in those tools, or your encoding is what's actually in question.

If anyone else might have some insight into what he's got going on, please see...

http://sourceforge.net/p/gnomesword/bugs/491/

...in which he included screenshots of what's wrong.

Fundamentally, the problem faced is that Xiphos displays whatever the engine hands it. If the module is mis-constructed, or if the engine mis-processes it, Xiphos shows it wrong...and there's nothing to be done about it in Xiphos itself.
Re: [sword-devel] seeking consensus on OSIS lemma best practice
On 10/12/2012 03:16 PM, Chris Little wrote: On 10/12/2012 1:40 PM, Daniel Owens wrote: Gary Holmlund and I are working on a problem related to the Westminster Hebrew Morphology (WHM) module. We need a consensus on markup practices for OSIS lemma. I was having a problem getting natural Hebrew lemma to look up an entry and display it in the mag window. Gary discovered that if "H" is prefixed to lemma in WHM, the BibleTime mag window works with Hebrew lemma (as opposed to Strong's numbers). My understanding is that this is not typical OSIS best practice but a SWORD convention. I resisted at first, but now I think there is some wisdom to using this method. We need some way to distinguish between Hebrew and Aramaic words, which can be identical in form but not in meaning. WHM uses @ for Hebrew and % for Aramaic. I suggested to Gary that we compromise and simply change @ to H and % to A, modifying BibleTime to strip A and H and use that to look for the entry in the correct lexicon. The markup would look like this: Hebrew (from Deuteronomy): תֹּאבֵדוּן֮ Aramaic (from Jeremiah): יֵאבַ֧דוּ The main problem I see is that other front-ends may not follow the process of looking for G or H and then stripping the character before looking up the entry. Could we come to a consensus on this? Could you confirm that this is the behavior in some front end other than BibleTime? From my perspective it just sounds like a BibleTime bug. This is certainly bad OSIS encoding. It is also not a Sword convention. If anything is implemented that requires a language prefix like this, it represents a bug, whether in Sword or in BibleTime. --Chris Here is a quote of a comment from Xiphos source code: Strong's words are specified as a prefix letter H or G (Hebrew or Greek) and the numeric word identifier, e.g. G2316 to find \"θεός\" (\"God\"). So it appears to use the H or G method. Is there is documentation about a better way to do this? 
Gary
Re: [sword-devel] seeking consensus on OSIS lemma best practice
Chris Little wrote:
>> This is certainly bad OSIS encoding. It is also not a Sword
>> convention. If anything is implemented that requires a language prefix
>> like this, it represents a bug, whether in Sword or in BibleTime.

Well...this is how SWModule has done this since forever.

Gary Holmlund writes:
> Here is a quote of a comment from Xiphos source code:
> Strong's words are specified as a prefix letter H or G (Hebrew or
> Greek) and the numeric word identifier, e.g. G2316 to find \"θεός\"
> (\"God\").

Yes, that's from one of the help texts in Xiphos' advanced search. It simply reflects what has been the case since (what I have always perceived as) The Dawn Of Net.Time. See src/modules/swmodule.cpp, the description of case -3 before SWModule::search(). And then tell me what the 3 special cases are about, that have to do with noticing "G3588". (Love the comment: "cheeze. skip empty article tags that weren't assigned to any text". Hm.)
Re: [sword-devel] seeking consensus on OSIS lemma best practice
On 10/12/2012 07:23 PM, Karl Kleinpaste wrote:
> Chris Little wrote:
> > This is certainly bad OSIS encoding. It is also not a Sword convention. If anything is implemented that requires a language prefix like this, it represents a bug, whether in Sword or in BibleTime.
>
> Well...this is how SWModule has done this since forever.
>
> Gary Holmlund writes:
> > Here is a quote of a comment from Xiphos source code: Strong's words are specified as a prefix letter H or G (Hebrew or Greek) and the numeric word identifier, e.g. G2316 to find \"θεός\" (\"God\").
>
> Yes, that's from one of the help texts in Xiphos' advanced search. It simply reflects what has been the case since (what I have always perceived as) The Dawn Of Net.Time. See src/modules/swmodule.cpp, the description of case -3 before SWModule::search(). And then tell me what the 3 special cases are about, that have to do with noticing "G3588". (Love the comment: "cheeze. skip empty article tags that weren't assigned to any text". Hm.)

Type this into a shell in the sword/src directory. You will see plenty of evidence that sword is using H and G to parse Strong's information.

    find . -type f -name '*.cpp' | xargs grep -w H

Gary
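For readers unfamiliar with the convention being discussed, the prefix-letter form of a Strong's key can be split as in the sketch below. It only illustrates the "G2316" format quoted from the Xiphos help text; it is not taken from Sword or Xiphos, and the function name is hypothetical.

```python
import re

# A Strong's key: language letter, optional zero padding, then the number.
STRONGS = re.compile(r"^([GH])0*(\d+)$")
LANGUAGES = {"G": "Greek", "H": "Hebrew"}


def parse_strongs(key):
    """Return (language, number) for keys like 'G2316', or None otherwise."""
    match = STRONGS.match(key)
    if match is None:
        return None
    return LANGUAGES[match.group(1)], int(match.group(2))
```

Anything that is not a letter followed by digits, such as a real Hebrew lemma with an H in front, returns None here, which is the distinction the rest of the thread turns on.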
Re: [sword-devel] genbook lexicons - example problem and potential solutions
On 10/11/2012 6:39 PM, Daniel Owens wrote: I am still working on the Abbott-Smith markup project (over 300 entries and counting). We have four contributors right now, so the pace is picking up. Creating a module is another story. Chris made a lexicon module after the first release, but . . . I would like the module to look like this: http://www.textonline.org/files/abbott-smith/abbott-smith.current_release.html. To do that in SWORD, it needs to be a genbook in order to support: - front- and backmatter - page numbers - a hierarchical structure (In the original TEI it has at least one superEntry, but it is also divided into 's by letter heading [Α, Β, Γ, Δ, Ε, Ζ, Η, Θ, etc.]) The good news is that an OSIS genbook supports the bare-bones essentials of entries. And thankfully BPBible and BibleTime both display entries together in the same view, thanks to BPBible's continuous scrolling and *perhaps* BibleTime not recognizing . Unfortunately various features of valid OSIS genbooks are inconsistently supported by front-ends. I created a module for testing. You can find it at https://github.com/translatable-exegetical-tools/Abbott-Smith/tree/master/releases/sword, including a valid OSIS file. Issues include: - Some front-ends recognize , others , but the lexicon uses both (and both are valid OSIS) in various contexts. - Tables are inconsistently supported (mostly not) - Titles should be centered, but there is no way to do that in OSIS, as far as I can tell. I wonder if this is a great example use case of per-module CSS... - Parts of speech should be green and page numbers red, but you can't do color in OSIS (another use case of per-module CSS?) Some of these like , , and tables should just work, I think. Perhaps I will file bug reports. But the other display issues cannot be resolved by OSIS alone. Should TEI be a supported genbook format? I would think the TEI filter (as it evolves) could be pressed into use for genbooks. 
> If that were done, certain lexicon-specific features as well as real book features such as page numbers could be consistently supported and displayed. On the other hand, I could see the value of having per-module CSS in the conf file so that the module developer could have some control over display. Any thoughts?

I think your email boils down to wanting to use TEI for genbooks. You're absolutely welcome to do that, and there's nothing in the engine preventing you from doing that. There isn't currently an importer set up to parse TEI files and generate genbooks, but I would probably recommend writing a script to generate IMP files from TEI so that you have precise control over what goes into each leaf of the genbook tree.

Down the road, xml2gbs will accommodate TEI. I started work on it a couple months ago, but haven't had the time to work on it seriously.

--Chris
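A script of the kind Chris suggests might start out like the sketch below. Several things here are assumptions rather than facts from the thread: the TEI structure (letter divisions carrying an n attribute, entries carrying an n attribute), the sample data, and the IMP layout of a "$$$" key line with "/" separating tree levels followed by the entry text. The entry is also flattened to plain text for brevity, whereas a real conversion would keep the markup.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical two-entry sample; the real lexicon's structure may differ.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="letter" n="Α">
  <entry n="ἀγαθός"><form>ἀγαθός</form> <def>good</def></entry>
  <entry n="ἀγάπη"><form>ἀγάπη</form> <def>love</def></entry>
</div>
</body></text></TEI>"""


def tei_to_imp(tei_xml):
    """Emit one IMP record per TEI entry, keyed as /letter/headword."""
    root = ET.fromstring(tei_xml)
    lines = []
    for div in root.iter(TEI_NS + "div"):
        letter = div.get("n")
        for entry in div.findall(TEI_NS + "entry"):
            key = "/%s/%s" % (letter, entry.get("n"))
            # Collapse the entry to whitespace-normalised plain text.
            text = " ".join("".join(entry.itertext()).split())
            lines.append("$$$" + key)
            lines.append(text)
    return "\n".join(lines) + "\n"
```

Writing tei_to_imp(SAMPLE) to a file gives two records under the letter division, which is the "precise control over each leaf" that the IMP route offers.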
Re: [sword-devel] seeking consensus on OSIS lemma best practice
On 10/12/2012 7:23 PM, Karl Kleinpaste wrote:
> Chris Little wrote:
> > This is certainly bad OSIS encoding. It is also not a Sword convention. If anything is implemented that requires a language prefix like this, it represents a bug, whether in Sword or in BibleTime.
>
> Well...this is how SWModule has done this since forever.
>
> Gary Holmlund writes:
> > Here is a quote of a comment from Xiphos source code: Strong's words are specified as a prefix letter H or G (Hebrew or Greek) and the numeric word identifier, e.g. G2316 to find \"θεός\" (\"God\").
>
> Yes, that's from one of the help texts in Xiphos' advanced search. It simply reflects what has been the case since (what I have always perceived as) The Dawn Of Net.Time. See src/modules/swmodule.cpp, the description of case -3 before SWModule::search(). And then tell me what the 3 special cases are about, that have to do with noticing "G3588". (Love the comment: "cheeze. skip empty article tags that weren't assigned to any text". Hm.)

Strong's numbers are preceded by G or H to indicate language. Strong's numbers are specifically not at issue here.

--Chris
Re: [sword-devel] usfm2osis.py and tag \cp
> From: Chris Little
> \cp (like \vp) is a workaround for a limitation in Paratext.

Thanks, this was me being confused.

> You should look into \ca or \cl as alternatives.

Thanks. \cl is probably what I was looking for. Will see.

Thanks, even more so, for fixing the bug/crash!

Peter
[sword-devel] usfm2osis.py and crossreferences
Currently usfm2osis.py does not produce complete cross-references. a) It translates the origin reference contained in the \xo tag as a
Re: [sword-devel] usfm2osis.py and crossreferences
On 10/12/2012 10:53 PM, Peter von Kaehne wrote:
> Currently usfm2osis.py does not produce complete cross references. a) It translates the in the \xo tag contained origin reference as a

There's a roadmap in usfm2osis.py that includes reference parsing as a post-1.0 feature. At present, usfm2osis.py is just a USFM to OSIS converter. Parsing references from USFM docs is outside that scope, since references in USFM docs are completely unstandardized and the few facilities made available to allow reference parsing (\toc3) are infrequently used.

I'd like to enable reference parsing (though I don't necessarily believe it can be done reliably), but I see it as a future feature, along with things like generating Sword modules directly--without osis2mod.

--Chris
Re: [sword-devel] seeking consensus on OSIS lemma best practice
On 10/12/2012 1:40 PM, Daniel Owens wrote:
> The markup would look like this:
>
> Hebrew (from Deuteronomy): תֹּאבֵדוּן֮
> Aramaic (from Jeremiah): יֵאבַ֧דוּ
>
> The main problem I see is that other front-ends may not follow the process of looking for G or H and then stripping the character before looking up the entry. Could we come to a consensus on this?

I would recommend taking a look at the markup used in the MorphGNT module, which also employs real lemmata in addition to lemmata coded as Strong's numbers:

Βίβλος

You should begin the workID for real lemmata with "lemma.", and follow this with some identifier indicating the lemmatization scheme. We have some code in Sword that looks for "lemma." and will treat the value as a real word rather than a Strong's number or something else. I think OSIS validation may complain about the workIDs of the form "lemma.system", but that's a schema bug and you should ignore it.

As for the value of the lemma itself ([HA]אבד in your example above), you choose the form specified in the system you are employing. So, if MORPH employs its own lemmatization system and that takes the form @ for Hebrew and % for Aramaic, then use those forms, e.g.:

morph="whmmorph:some_value">תֹּאבֵדוּן֮

The alternative is to distinguish the languages via the workID:

morph="whmmorph:some_value">תֹּאבֵדוּן֮

If you aren't creating a lexical resource that indexes based on @- and %- prefixed lemmata, then I don't see how the former option is useful and would recommend the latter. The latter option will allow lookups in word-indexed lexica.

--Chris
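To make the "lemma." workID convention concrete, here is a minimal sketch of how a consumer might separate real lemmata from other values in a lemma attribute. It is not Sword's code. The attribute value used in the usage note, including the workID "lemma.whm.he" and the Strong's key beside it, is a hypothetical placeholder, since the original markup did not survive in the archive.

```python
LEMMA_PREFIX = "lemma."


def parse_lemma_attr(value):
    """Split a space-separated lemma attribute into (workID, value) pairs.

    Tokens without a 'workID:' prefix get None as their workID.
    """
    pairs = []
    for token in value.split():
        work, sep, lemma = token.partition(":")
        if not sep:
            work, lemma = None, token
        pairs.append((work, lemma))
    return pairs


def real_lemmata(value):
    """Return (scheme, word) for every workID that starts with 'lemma.'."""
    return [
        (work[len(LEMMA_PREFIX):], lemma)
        for work, lemma in parse_lemma_attr(value)
        if work is not None and work.startswith(LEMMA_PREFIX)
    ]
```

With a value such as "lemma.whm.he:אבד strong:H0006", real_lemmata returns only the pair for scheme "whm.he", so a front-end could pick the lexicon from the scheme and look the word up unchanged, with no prefix letter to strip.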