date:20121012

Re: [sword-devel] usfm2osis.py and tag \cp

2012-10-12 Thread Peter von Kaehne


> From: Chris Little

> I hope I've fixed this now. (I haven't tested that it functions 
> correctly, but the error was fairly obvious from the traceback below.)

Hi Chris,

Sorry, while the crash has gone, the function is not correct - at all.

\cp is meant to give a printed chapter number which has no influence on the 
underlying counting of verses and chapters. How exactly to represent it in 
OSIS, we would need to figure out, but it should not influence the creation of 
subsequent osisIDs. I would think  is probably the best for our 
purposes. The OSIS reference is not exactly helpful at this point, nor does it 
reflect the reality of module making.

Right now the code does two things: in the sample below, it replaces the chapter 
number 1 with an A in the subsequent verse's osisID ("Esth.A.1" instead of 
"Esth.1.1"), and it leaves the \cp A in place. Neither is right, according to 
either the OSIS reference or the wishes of the USFM writer in my example.

> > Following minimal USFM code creates below attached error message.
> >
> > \id EST
> > \h ESTER
> > \c 1
> > \cp A
> > \s En Mordekai eh Ouraman
> > \p
> > \v 1 Mordekai,
> >

Here is the currently generated OSIS:



http://www.bibletechnologies.net/2003/OSIS/namespace"; xmlns:xsi="h





ESTER

\cp A
En Mordekai eh Ouraman

Mordekai,






I think the best way of expressing the USFM above would be something along 
the following lines:


http://www.bibletechnologies.net/2003/OSIS/namespace"; xmlns:xsi="h





ESTER

A
En Mordekai eh Ouraman

Mordekai,




--

The alternative would be to use one of the messy constructions shown in 
Appendix I of the OSIS reference for two reference systems. This is not only 
very ugly; it also has no support in the engine and is unlikely to gain any 
in the near, mid-term or long-term future.

Yours 

Peter

___
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page

Re: [sword-devel] multiple languages in modules

2012-10-12 Thread Ben Morgan

G'day Karl,

On Fri, Oct 12, 2012 at 2:11 PM, Karl Kleinpaste wrote:

>
> > Is the <foreign> element passed through the engine? If so, do I need
> > to file bugs with front-ends to encourage support of <foreign>?
>
> Having just looked, the string "foreign" does not appear in Sword's
> source tree in src/modules/filters/*.cpp.  So it's not supported right
> now after all.  I don't know how BPBible supports it; I had understood
> that BPBible uses the regular filter sets.  Does BPBible actually
> subclass the filters and extend them for <foreign>?
>
BPBible doesn't support foreign. It only looks like it does.
What BPBible does support is automatically detecting Greek and Hebrew text
and marking it to be used with the configured Greek/Hebrew fonts.
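The kind of automatic detection described above can be sketched roughly as follows. This is only an illustration that assumes detection by Unicode block; BPBible's actual implementation is not shown in this thread, and the function name `detect_script` is hypothetical.

```python
def detect_script(text):
    """Return 'hebrew', 'greek' or None, based on the first character
    that falls inside a known Unicode block (hypothetical helper)."""
    for ch in text:
        cp = ord(ch)
        # Hebrew block: U+0590-U+05FF
        if 0x0590 <= cp <= 0x05FF:
            return "hebrew"
        # Greek and Coptic: U+0370-U+03FF; Greek Extended: U+1F00-U+1FFF
        if 0x0370 <= cp <= 0x03FF or 0x1F00 <= cp <= 0x1FFF:
            return "greek"
    return None
```

A front-end could use the returned label to pick the configured Greek or Hebrew font for that run of text.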

Just for the record, BPBible does subclass the regular filters quite
substantially. It uses them for things like:
- poetic text display
- Strong's headwords instead of numbers (if option is on)
- quote colouring by speaker in ESV (if option is on)
- cross-reference expansion (if option is on)
- some HTML+class code so CSS can be applied

Some of the new XHTML filter will probably overlap with what BPBible is
doing with the basic HTML + classes it writes out.

God bless,
Ben.

Re: [sword-devel] multiple languages in modules

2012-10-12 Thread Daniel Owens

<foreign> is the xml way of indicating a language other than the
language of the document. So you surround Hebrew text with <foreign
xml:lang="heb">. Judging from Ben's more recent email, even BPBible does
not support it. Regardless of the method, the effect is great.

I use Linux Libertine all the time for all but Hebrew. Vowel points do
not display correctly. Free Serif is a passable alternative because it
gets the vowels right, but the Hebrew glyphs seem anemic to me.

I am also working with David Troidl on BDB, which has many more
languages, including Arabic, Ethiopic, Syriac, and transliterated
Akkadian. It is not realistic to expect any one font to handle all of
those in addition to Greek and Hebrew. My main point is that applying
fonts based on language of the text rather than language of the module
is something worth working on.

More below.

On 10/11/2012 11:11 PM, Karl Kleinpaste wrote:

> I know nothing of <foreign>, but can only suppose that, if supported, it
> must pass through the engine with an appropriate (HTML) indication.
>
> As a general rule, I suggest either Free Serif or Linux Libertine, with
> a slight preference for Free Serif. Both have good coverage across
> every Latin alphabet variant, and pretty display of both Hebrew and
> Greek. In modules of mine that have Latin, Greek, and Hebrew alphabets,
> they all show quite well. We include both of these fonts in Xiphos'
> Win32 installers.
>
> You might find the UDHR module useful, from Crosswire Experimental, as a
> font demonstration module.
>
> (Linux Libertine is not Linux-specific. It was just developed in an
> open source environment.)
>
>> Is the <foreign> element passed through the engine? If so, do I need
>> to file bugs with front-ends to encourage support of <foreign>?
>
> Having just looked, the string "foreign" does not appear in Sword's
> source tree in src/modules/filters/*.cpp. So it's not supported right
> now after all. I don't know how BPBible supports it; I had understood
> that BPBible uses the regular filter sets. Does BPBible actually
> subclass the filters and extend them for <foreign>?
>
>> Second, when RtoL text is mixed with LtoR text you can get some
>> strange display problems. Punctuation and numbers can work for both
>> types of languages.
>
> This is often an artifact of how toolkits handle LtoR. Today, Xiphos
> uses GTK and WebKit, but I don't know how these reflect your example
> case. Our former use of gtkhtml3 -vs- gtkmozembed -vs- xulrunner -vs-
> today's WebKit always led to some strange realizations for how LtoR
> would show up in Xiphos. gtkhtml3 wants to right-justify any text
> containing (or perhaps it was "that leads off with") Hebrew. That
> peculiarity led to certain unexpected choices for how I created
> StrongsRealHebrew.

I love Unicode, but mixed language directions are one problem
that did not exist with legacy fonts. As far as I can tell, all web
browsers and word processors do the same thing—when you have some Hebrew
text they assume that anything that follows such as numerals or
punctuation (until you get some Latin text, for example) is Hebrew. When
marking up xml you get a false sense of security about text rendering
because the tags use Latin characters. But when they are rendered by a
browser, even text outside the is assumed to be
Hebrew until you get some Latin text. I think that is why html has
, which helps solve the problem.
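The reason numerals and punctuation get pulled into the preceding Hebrew run can be seen from their Unicode bidirectional classes. The following sketch uses Python's standard `unicodedata` module; the helper names `bidi_classes` and `is_strong` are made up for this illustration.

```python
import unicodedata

# Bidi classes with a fixed direction of their own: L (left-to-right),
# R (right-to-left) and AL (Arabic letter).
STRONG_CLASSES = {"L", "R", "AL"}

def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

def is_strong(ch):
    """True if the character carries its own direction; digits and
    punctuation do not, so they take direction from their surroundings."""
    return unicodedata.bidirectional(ch) in STRONG_CLASSES
```

Hebrew letters report class `R` and Latin letters `L`, while digits and most punctuation report weak or neutral classes, which is why they follow the direction of the nearest strong text unless the markup states a direction explicitly.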

Daniel


Re: [sword-devel] multiple languages in modules

2012-10-12 Thread Chris Little


On 10/12/2012 5:44 AM, Daniel Owens wrote:

> <foreign> is the xml way of indicating a language other than the
> language of the document. So you surround Hebrew text with <foreign
> xml:lang="heb">.


A small sidenote, since you do encoding: "heb" is not a legal value for 
xml:lang. This must be "he", or "hbo" if you mean Ancient Hebrew. 
2-letter language subtags from ISO 639-1 are always required if they exist 
(rather than 3-letter subtags from later parts of ISO 639). The details are 
spelled out in BCP 47. You can also find the full current set of IANA 
registered language subtags at:

http://www.iana.org/assignments/language-subtag-registry
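As a rough illustration of the rule, the sketch below maps a few 3-letter codes to the 2-letter subtag that must be used instead. The table is a tiny hand-written assumption for demonstration only, and `normalize_lang` is a hypothetical helper; real tooling should consult the IANA registry linked above.

```python
# Hypothetical, deliberately incomplete table: 3-letter code -> required
# 2-letter subtag. Codes with no 2-letter equivalent (e.g. "hbo", "grc")
# are simply absent and pass through unchanged.
THREE_TO_TWO = {
    "heb": "he",
    "ell": "el",
    "gre": "el",
    "ara": "ar",
    "deu": "de",
    "ger": "de",
}

def normalize_lang(code):
    """Return the preferred xml:lang subtag for a language code."""
    code = code.strip().lower()
    return THREE_TO_TWO.get(code, code)
```

With this table, "heb" becomes "he", while "hbo" is left alone because no 2-letter subtag exists for Ancient Hebrew.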

--Chris



Re: [sword-devel] usfm2osis.py and tag \cp

2012-10-12 Thread Chris Little

On 10/12/2012 4:00 AM, Peter von Kaehne wrote:

> Sorry, while the crash has gone, the function is not correct - at
> all.
>
> \cp is meant to give a printed chapter number which has no influence
> on the underlying counting of verses and chapters. How exactly to
> represent it in OSIS, we would need to figure out, but it should not
> influence the creation of subsequent osisIDs. I would think  is
> probably the best for our purposes. The OSIS
> reference is not exactly helpful at this point, nor does it reflect
> the reality of module making.

\cp (like \vp) is a workaround for a limitation in Paratext. Paratext
requires that all chapter and verse numbers be numeric and strictly
increasing. No lettered or out-of-order or repeated verse or chapter
numbers are permissible. However, actual Bibles sometimes include these
things. So Paratext requires that you enumerate the chapters/verses with
strictly increasing numerals. \cp and \vp let Paratext substitute the
correct underlying number when rendering.

The description of \cp in the USFM docs states: "This is a chapter
marker (number, letter) which would be displayed in the published text
(where the published marker is different than the \c # used within the
translation editing environment)." The words "translation editing
environment" are a reference to Paratext specifically, and the
description as a whole conveys that \cp is the real chapter number if a
different \c value is necessitated by Paratext.

OSIS doesn't have this limitation. You can encode the real verse and
chapter numbers in OSIS, without need for a workaround.

So usfm2osis.py's replacement of the numeric dummy-chapter with the
chapter number specified in \cp is correct.

If you look at your USFM document, I anticipate you see something like:

\c 1
\cp A
...
\c 2
\cp 1
...
\c 3
\cp 2
...
\c 4
\cp 3
...
\c 5
\cp B
...
\c 6
\cp 3
...
\c 7
\cp 4

The strictly increasing \c values are just dummy values for Paratext.
The \cp values represent the actual underlying chapter numbers for this
reference scheme. There aren't two different chapter 3s in Esther, just
one that is briefly interrupted by chapter B, but Paratext can't deal
with the underlying reference system, so it requires the \cp workaround.
Likewise, chapter 4 (\cp 4) isn't really chapter 7 (\c 7).
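The behaviour described here, where a \cp value replaces the dummy \c value when building osisIDs, can be sketched as below. This is not the code in usfm2osis.py; the function `verse_osis_ids` and its arguments are hypothetical, and only the \c, \cp and \v markers are looked at.

```python
import re

# Matches \c, \cp or \v followed by whitespace and a value.
# Other markers such as \ca or \cl do not match.
MARKER = re.compile(r"\\(cp|c|v)\s+(\S+)")

def verse_osis_ids(usfm, book):
    """Return the osisID of every verse, letting \\cp override \\c."""
    chapter = None
    ids = []
    for line in usfm.splitlines():
        m = MARKER.match(line.strip())
        if not m:
            continue
        marker, value = m.groups()
        if marker in ("c", "cp"):
            # \c sets the dummy number; a following \cp replaces it
            chapter = value
        else:
            ids.append(f"{book}.{chapter}.{value}")
    return ids
```

Run on the minimal sample from the start of this thread with the book "Esth", this yields "Esth.A.1", which is the result Peter reported.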

This is mostly based on my experience encoding USX docs for ABS. If your
USFM encoder intends that the value in \c be the chapter value, then \cp
should not be used. You should look into \ca or \cl as alternatives.

> Right now the code does two things: in the sample below, it replaces
> the chapter number 1 with an A in the subsequent verse's osisID
> ("Esth.A.1" instead of "Esth.1.1"), and it leaves the \cp A in place.
> Neither is right, according to either the OSIS reference or the wishes
> of the USFM writer in my example.

With the update just committed, usfm2osis.py should now correctly remove
\cp (and \vp). That was a bug--actually a set of bugs. Again, I
regrettably haven't tested this, but the code looks good to me.

--Chris


[sword-devel] seeking consensus on OSIS lemma best practice

2012-10-12 Thread Daniel Owens

Gary Holmlund and I are working on a problem related to the Westminster 
Hebrew Morphology (WHM) module. We need a consensus on markup practices 
for OSIS lemma.


I was having a problem getting natural Hebrew lemma to look up an entry 
and display it in the mag window. Gary discovered that if "H" is 
prefixed to lemma in WHM, the BibleTime mag window works with Hebrew 
lemma (as opposed to Strong's numbers).


My understanding is that this is not typical OSIS best practice but a 
SWORD convention. I resisted at first, but now I think there is some 
wisdom to using this method. We need some way to distinguish between 
Hebrew and Aramaic words, which can be identical in form but not in 
meaning. WHM uses @ for Hebrew and % for Aramaic. I suggested to Gary 
that we compromise and simply change @ to H and % to A, modifying 
BibleTime to strip A and H and use that to look for the entry in the 
correct lexicon.


The markup would look like this:

Hebrew (from Deuteronomy): morph="whmmorph:some_value">תֹּאבֵדוּן֮


Aramaic (from Jeremiah): morph="whmmorph:some_value">יֵאבַ֧דוּ


The main problem I see is that other front-ends may not follow the 
process of looking for G or H and then stripping the character before 
looking up the entry.
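For discussion, the stripping step proposed above might look like the sketch below. It assumes the H/A prefix convention from this proposal only; the lexicon labels and the function `split_prefixed_lemma` are hypothetical and are not taken from BibleTime or Sword.

```python
# Hypothetical mapping from the proposed prefix letter to a lexicon label.
LEXICON_BY_PREFIX = {"H": "hebrew", "A": "aramaic"}

def split_prefixed_lemma(lemma):
    """Split a prefixed lemma into (lexicon, word).

    Returns (None, lemma) unchanged when no known prefix is present.
    """
    prefix, word = lemma[:1], lemma[1:]
    if prefix in LEXICON_BY_PREFIX and word:
        return LEXICON_BY_PREFIX[prefix], word
    return None, lemma
```

A front-end that does not perform this step would look up the prefixed string as-is and find nothing, which is the interoperability concern raised above.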


Could we come to a consensus on this?

Daniel


Re: [sword-devel] multiple languages in modules

2012-10-12 Thread Daniel Owens


On 10/12/2012 03:23 PM, Chris Little wrote:

> On 10/12/2012 5:44 AM, Daniel Owens wrote:
>
>> <foreign> is the xml way of indicating a language other than the
>> language of the document. So you surround Hebrew text with <foreign
>> xml:lang="heb">.
>
> A small sidenote, since you do encoding: "heb" is not a legal value 
> for xml:lang. This must be "he", or "hbo" if you mean Ancient Hebrew. 
> 2-letter language subtags from ISO 639-1 are always required if they exist 
> (rather than 3-letter subtags from later parts of ISO 639). The details are 
> spelled out in BCP 47. You can also find the full current set of IANA 
> registered language subtags at:
>
> http://www.iana.org/assignments/language-subtag-registry
>
> --Chris


Okay, thanks. "heb" is more intuitive, so perhaps that is how it crept in.

Daniel


Re: [sword-devel] seeking consensus on OSIS lemma best practice

2012-10-12 Thread Chris Little


On 10/12/2012 1:40 PM, Daniel Owens wrote:

> Gary Holmlund and I are working on a problem related to the Westminster
> Hebrew Morphology (WHM) module. We need a consensus on markup practices
> for OSIS lemma.
>
> I was having a problem getting natural Hebrew lemma to look up an entry
> and display it in the mag window. Gary discovered that if "H" is
> prefixed to lemma in WHM, the BibleTime mag window works with Hebrew
> lemma (as opposed to Strong's numbers).
>
> My understanding is that this is not typical OSIS best practice but a
> SWORD convention. I resisted at first, but now I think there is some
> wisdom to using this method. We need some way to distinguish between
> Hebrew and Aramaic words, which can be identical in form but not in
> meaning. WHM uses @ for Hebrew and % for Aramaic. I suggested to Gary
> that we compromise and simply change @ to H and % to A, modifying
> BibleTime to strip A and H and use that to look for the entry in the
> correct lexicon.
>
> The markup would look like this:
>
> Hebrew (from Deuteronomy): תֹּאבֵדוּן֮
>
> Aramaic (from Jeremiah): יֵאבַ֧דוּ
>
> The main problem I see is that other front-ends may not follow the
> process of looking for G or H and then stripping the character before
> looking up the entry.
>
> Could we come to a consensus on this?


Could you confirm that this is the behavior in some front end other than 
BibleTime? From my perspective it just sounds like a BibleTime bug.


This is certainly bad OSIS encoding. It is also not a Sword convention. 
If anything is implemented that requires a language prefix like this, it 
represents a bug, whether in Sword or in BibleTime.


--Chris




Re: [sword-devel] Sword -r2741

2012-10-12 Thread Robert Hunt


On 12/10/12 16:43, luke wrote:

> I was recently in correspondence with Karl Kleinpaste of the Xiphos project 
> about display issues with our project's module.  He recommended that I try 
> sword's latest -r2741 because it has recent changes regarding osis headings.  
> I do not have access to this version of sword.
>
> Would someone be willing to run our project's osis file through the latest 
> version of sword (apparently -r2741), create a module from it and then send me 
> the results?
> - My OSIS was built using the sword script from USFM files.
> - My OSIS validates
> - I have already run the fix for titles on my osis.
>
> Please contact me if you are willing,
> Thanks

Hi Luke,

Can't see any reply to your message here. As far as I can see, 
osis2mod.cpp hasn't changed since March and the latest revision is 2693. 
(Someone please correct me if I'm wrong.)


I suspect the abovementioned recent changes might be in the sword 
library processing of the module, not its creation.


Robert.


Re: [sword-devel] Sword -r2741

2012-10-12 Thread DM Smith

Osis2mod is not affected by any of the recent changes. If you built your module 
with an earlier version, it'd be a good idea to build it with the most recent to test 
your content. I believe that the utility lookup will give you a view into how 
verses are stored and rendered.

In Him,
DM

On Oct 12, 2012, at 8:08 PM, Robert Hunt  wrote:

> On 12/10/12 16:43, luke wrote:
>> I was recently in correspondence with Karl Kleinpaste of the Xiphos project 
>> about display issues with our project's module.  He recommended that I try 
>> sword's latest -r2741 because it has recent changes regarding osis headings.  
>> I do not have access to this version of sword.
>> 
>> Would someone be willing to run our project's osis file through the latest 
>> version of sword (apparently -r2741), create a module from it and then send 
>> me the results?
>> - My OSIS was built using the sword script from USFM files.
>> - My OSIS validates
>> - I have already run the fix for titles on my osis.
>> 
>> Please contact me if you are willing,
>> Thanks
> Hi Luke,
> 
>Can't see any reply to your message here. As far as I can see, 
> osis2mod.cpp hasn't changed since March and the latest revision is 2693. 
> (Someone please correct me if I'm wrong.)
> 
>I suspect the abovementioned recent changes might be in the sword library 
> processing of the module, not its creation.
> 
> Robert.
> 



Re: [sword-devel] Sword -r2741

2012-10-12 Thread Karl Kleinpaste

Apologies, some mis-sorted email caused me not to notice this until now.

I had thought the problem you were seeing was a display issue, in which
case recent engine updates had some good effects, which is why I
suggested a more recent version...not realizing that you're a Win32
user, so all you've got is whatever came out in the latest Xiphos
release build.  (Though I'm a little surprised that your ref in the bug
report mentions 2 different versions of Sword.  Hm.)

As others said, apparently the creation tools (as distinct from the
processing engine used in apps) haven't been updated much lately, so the
problem has to be either a deeper problem in those tools, or your
encoding is what's actually in question.

If anyone else might have some insight into what he's got going on,
please see...
http://sourceforge.net/p/gnomesword/bugs/491/
...in which he included screenshots of what's wrong.

Fundamentally, the problem faced is that Xiphos displays whatever the
engine hands it.  If the module is mis-constructed, or if the engine
mis-processes it, Xiphos shows it wrong...and there's nothing to be done
about it in Xiphos itself.


Re: [sword-devel] seeking consensus on OSIS lemma best practice

2012-10-12 Thread Gary Holmlund


On 10/12/2012 03:16 PM, Chris Little wrote:

> On 10/12/2012 1:40 PM, Daniel Owens wrote:
>
>> Gary Holmlund and I are working on a problem related to the Westminster
>> Hebrew Morphology (WHM) module. We need a consensus on markup practices
>> for OSIS lemma.
>>
>> I was having a problem getting natural Hebrew lemma to look up an entry
>> and display it in the mag window. Gary discovered that if "H" is
>> prefixed to lemma in WHM, the BibleTime mag window works with Hebrew
>> lemma (as opposed to Strong's numbers).
>>
>> My understanding is that this is not typical OSIS best practice but a
>> SWORD convention. I resisted at first, but now I think there is some
>> wisdom to using this method. We need some way to distinguish between
>> Hebrew and Aramaic words, which can be identical in form but not in
>> meaning. WHM uses @ for Hebrew and % for Aramaic. I suggested to Gary
>> that we compromise and simply change @ to H and % to A, modifying
>> BibleTime to strip A and H and use that to look for the entry in the
>> correct lexicon.
>>
>> The markup would look like this:
>>
>> Hebrew (from Deuteronomy): תֹּאבֵדוּן֮
>>
>> Aramaic (from Jeremiah): יֵאבַ֧דוּ
>>
>> The main problem I see is that other front-ends may not follow the
>> process of looking for G or H and then stripping the character before
>> looking up the entry.
>>
>> Could we come to a consensus on this?
>
> Could you confirm that this is the behavior in some front end other 
> than BibleTime? From my perspective it just sounds like a BibleTime bug.
>
> This is certainly bad OSIS encoding. It is also not a Sword 
> convention. If anything is implemented that requires a language prefix 
> like this, it represents a bug, whether in Sword or in BibleTime.
>
> --Chris

Here is a quote of a comment from Xiphos source code:

 Strong's words are specified as a prefix letter H or G (Hebrew or
 Greek) and the numeric word identifier, e.g. G2316 to find 
\"θεός\" (\"God\").


So it appears to use the H or G method. Is there documentation about 
a better way to do this?


Gary

Re: [sword-devel] seeking consensus on OSIS lemma best practice

2012-10-12 Thread Karl Kleinpaste

Chris Little wrote:
>> This is certainly bad OSIS encoding. It is also not a Sword
>> convention. If anything is implemented that requires a language prefix
>> like this, it represents a bug, whether in Sword or in BibleTime.

Well...this is how SWModule has done this since forever.

Gary Holmlund  writes:
> Here is a quote of a comment from Xiphos source code:
>  Strong's words are specified as a prefix letter H or G (Hebrew or
>  Greek) and the numeric word identifier, e.g. G2316 to find \"θεός\"
>  (\"God\").

Yes, that's from one of the help texts in Xiphos' advanced search.  It
simply reflects what has been the case since (what I have always
perceived as) The Dawn Of Net.Time.

See src/modules/swmodule.cpp, the description of case -3 before
SWModule::search().  And then tell me what the 3 special cases are
about, that have to do with noticing "G3588".

(Love the comment: "cheeze.  skip empty article tags that weren't
assigned to any text".  Hm.)


Re: [sword-devel] seeking consensus on OSIS lemma best practice

2012-10-12 Thread Gary Holmlund


On 10/12/2012 07:23 PM, Karl Kleinpaste wrote:

> Chris Little wrote:
>
>> This is certainly bad OSIS encoding. It is also not a Sword
>> convention. If anything is implemented that requires a language prefix
>> like this, it represents a bug, whether in Sword or in BibleTime.
>
> Well...this is how SWModule has done this since forever.
>
> Gary Holmlund  writes:
>
>> Here is a quote of a comment from Xiphos source code:
>>   Strong's words are specified as a prefix letter H or G (Hebrew or
>>   Greek) and the numeric word identifier, e.g. G2316 to find \"θεός\"
>>   (\"God\").
>
> Yes, that's from one of the help texts in Xiphos' advanced search.  It
> simply reflects what has been the case since (what I have always
> perceived as) The Dawn Of Net.Time.
>
> See src/modules/swmodule.cpp, the description of case -3 before
> SWModule::search().  And then tell me what the 3 special cases are
> about, that have to do with noticing "G3588".
>
> (Love the comment: "cheeze.  skip empty article tags that weren't
> assigned to any text".  Hm.)

Type this into a shell in the sword/src directory. You will see plenty 
of evidence that sword is using H and G to parse strongs information.


   find . -type f -name '*.cpp' | xargs grep -w H

Gary



Re: [sword-devel] genbook lexicons - example problem and potential solutions

2012-10-12 Thread Chris Little


On 10/11/2012 6:39 PM, Daniel Owens wrote:

> I am still working on the Abbott-Smith markup project (over 300 entries
> and counting). We have four contributors right now, so the pace is
> picking up. Creating a module is another story. Chris made a lexicon
> module after the first release, but . . .
>
> I would like the module to look like this:
> http://www.textonline.org/files/abbott-smith/abbott-smith.current_release.html.
> To do that in SWORD, it needs to be a genbook in order to support:
> - front- and backmatter
> - page numbers
> - a hierarchical structure (In the original TEI it has at least one
> superEntry, but it is also divided into 's by letter heading [Α, Β,
> Γ, Δ, Ε, Ζ, Η, Θ, etc.])
>
> The good news is that an OSIS genbook supports the bare-bones essentials
> of entries. And thankfully BPBible and BibleTime both display entries
> together in the same view, thanks to BPBible's continuous scrolling and
> *perhaps* BibleTime not recognizing .
>
> Unfortunately various features of valid OSIS genbooks are inconsistently
> supported by front-ends. I created a module for testing. You can find it
> at
> https://github.com/translatable-exegetical-tools/Abbott-Smith/tree/master/releases/sword,
> including a valid OSIS file. Issues include:
> - Some front-ends recognize , others , but the lexicon uses both
> (and both are valid OSIS) in various contexts.
> - Tables are inconsistently supported (mostly not)
> - Titles should be centered, but there is no way to do that in OSIS, as
> far as I can tell. I wonder if this is a great example use case of
> per-module CSS...
> - Parts of speech should be green and page numbers red, but you can't do
> color in OSIS (another use case of per-module CSS?)
>
> Some of these like , , and tables should just work, I think.
> Perhaps I will file bug reports. But the other display issues cannot be
> resolved by OSIS alone.
>
> Should TEI be a supported genbook format? I would think the TEI filter
> (as it evolves) could be pressed into use for genbooks. If that were
> done, certain lexicon-specific features as well as real book features
> such as page numbers could be consistently supported and displayed. On
> the other hand, I could see the value of having per-module CSS in the
> conf file so that the module developer could have some control over
> display.
>
> Any thoughts?


I think your email boils down to wanting to use TEI for genbooks. You're 
absolutely welcome to do that, and there's nothing in the engine 
preventing you from doing that.


There isn't currently an importer set up to parse TEI files and generate 
genbooks, but I would probably recommend writing a script to generate 
IMP files from TEI so that you have precise control over what goes into 
each leaf of the genbook tree. Down the road, xml2gbs will accommodate 
TEI. I started work on it a couple months ago, but haven't had the time 
to work on it seriously.
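A script of the kind suggested here could start from something like the sketch below. It rests on several assumptions that should be checked before use: that each IMP entry is introduced by a `$$$` line carrying the key, that genbook keys are slash-separated paths, and that the TEI source stores each headword in an `n` attribute on `<entry>`. The function `tei_to_imp` and the `/Lexicon` root key are hypothetical.

```python
import xml.etree.ElementTree as ET

def tei_to_imp(tei_xml, root_key="/Lexicon"):
    """Emit one IMP entry per TEI <entry> element.

    Each entry becomes a '$$$<path>' key line followed by the
    serialized markup of that entry.
    """
    root = ET.fromstring(tei_xml)
    out = []
    for elem in root.iter():
        # Compare the local name only, so namespaced TEI also matches
        if elem.tag.split("}")[-1] != "entry":
            continue
        key = elem.get("n")
        if not key:
            continue  # skip entries without a usable headword
        out.append(f"$$${root_key}/{key}")
        out.append(ET.tostring(elem, encoding="unicode").strip())
    return "\n".join(out) + "\n"
```

Writing the result to a file gives precise control over what lands in each leaf; front matter, letter headings and page markers would each need their own key scheme on top of this.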


--Chris



Re: [sword-devel] seeking consensus on OSIS lemma best practice

2012-10-12 Thread Chris Little


On 10/12/2012 7:23 PM, Karl Kleinpaste wrote:

> Chris Little wrote:
>
>> This is certainly bad OSIS encoding. It is also not a Sword
>> convention. If anything is implemented that requires a language prefix
>> like this, it represents a bug, whether in Sword or in BibleTime.
>
> Well...this is how SWModule has done this since forever.
>
> Gary Holmlund  writes:
>
>> Here is a quote of a comment from Xiphos source code:
>>   Strong's words are specified as a prefix letter H or G (Hebrew or
>>   Greek) and the numeric word identifier, e.g. G2316 to find \"θεός\"
>>   (\"God\").
>
> Yes, that's from one of the help texts in Xiphos' advanced search.  It
> simply reflects what has been the case since (what I have always
> perceived as) The Dawn Of Net.Time.
>
> See src/modules/swmodule.cpp, the description of case -3 before
> SWModule::search().  And then tell me what the 3 special cases are
> about, that have to do with noticing "G3588".
>
> (Love the comment: "cheeze.  skip empty article tags that weren't
> assigned to any text".  Hm.)


Strong's numbers are preceded by G or H to indicate language. Strong's 
numbers are specifically not at issue here.


--Chris



Re: [sword-devel] usfm2osis.py and tag \cp

2012-10-12 Thread Peter von Kaehne


> From: Chris Little

> \cp (like \vp) is a workaround for a limitation in Paratext.

Thanks, this was me being confused. 

> You should look into \ca or \cl as alternatives.

Thanks. \cl is probably what I looked for. Will see.

Thanks, even more so, for fixing the bug/crash!

Peter



[sword-devel] usfm2osis.py and crossreferences

2012-10-12 Thread Peter von Kaehne

Currently usfm2osis.py does not produce complete cross references.

a) It translates the origin reference contained in the \xo tag as a

Re: [sword-devel] usfm2osis.py and crossreferences

2012-10-12 Thread Chris Little


On 10/12/2012 10:53 PM, Peter von Kaehne wrote:

> Currently usfm2osis.py does not produce complete cross references.
>
> a) It translates the origin reference contained in the \xo tag as a


There's a roadmap in usfm2osis.py that includes reference parsing as a 
post-1.0 feature. At the present, usfm2osis.py is just a USFM to OSIS 
converter. Parsing references from USFM docs is outside that scope since 
references in USFM docs are completely unstandardized and the few 
facilities made available to allow reference parsing (\toc3) are 
infrequently used.


I'd like to enable reference parsing (though I don't necessarily believe 
it can be done reliably), but I see it as a future feature, along with 
things like generating Sword modules directly--without osis2mod.


--Chris


Re: [sword-devel] seeking consensus on OSIS lemma best practice

2012-10-12 Thread Chris Little


On 10/12/2012 1:40 PM, Daniel Owens wrote:

> The markup would look like this:
>
> Hebrew (from Deuteronomy): תֹּאבֵדוּן֮
>
> Aramaic (from Jeremiah): יֵאבַ֧דוּ
>
> The main problem I see is that other front-ends may not follow the
> process of looking for G or H and then stripping the character before
> looking up the entry.
>
> Could we come to a consensus on this?


I would recommend taking a look at the markup used in the MorphGNT 
module, which also employs real lemmata in addition to lemmata 
coded as Strong's numbers:


Βίβλος


You should begin the workID for real lemmata with "lemma.", and follow 
this with some identifier indicating the lemmatization scheme. We have 
some code in Sword that looks for "lemma." and will treat the value as a 
real word rather than a Strong's number or something else. I think OSIS 
validation may complain about the workIDs of the form "lemma.system", 
but that's a schema bug and you should ignore it.


As for the value of the lemma itself ([HA]אבד in your example above), 
you choose the form specified in the system you are employing. So, if 
MORPH employs its own lemmatization system and that takes the form 
@ for Hebrew and % for Aramaic, then use those forms, e.g.:


 morph="whmmorph:some_value">תֹּאבֵדוּן֮

The alternative is to distinguish the languages via the workID:

 morph="whmmorph:some_value">תֹּאבֵדוּן֮

If you aren't creating a lexical resource that indexes based on @- and 
%- prefixed lemmata, then I don't see how the former option is useful 
and would recommend the latter. The latter option will allow lookups in 
word-indexed lexica.
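To make the workID convention concrete, here is a sketch of how a consumer might separate the workID from the lemma value and recognise the "lemma." prefix. It is not the code in Sword; `parse_lemma` is a hypothetical helper, and any system name placed after "lemma." is a placeholder chosen by the module author.

```python
def parse_lemma(value):
    """Split a 'workID:value' lemma string into (kind, system, word).

    kind is 'word' for workIDs starting with 'lemma.', 'other' for any
    other workID (such as a Strong's number), and 'unknown' when there
    is no workID at all.
    """
    work_id, sep, word = value.partition(":")
    if not sep:
        return ("unknown", None, value)
    if work_id.startswith("lemma."):
        # Everything after 'lemma.' names the lemmatization scheme
        return ("word", work_id[len("lemma."):], word)
    return ("other", work_id, word)
```

With the language carried by the workID, the value after the colon is the bare word and can be looked up directly in a word-indexed lexicon.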


--Chris

