Re: [sword-devel] XSLT vs. C++

2010-12-06 Thread David Hollands
Amen!

http://hixie.ch/advocacy/xslt

Love in Christ,

David

On 30 November 2010 19:08, Troy A. Griffitts  wrote:
> Having finally returned from a hectic 2 weeks of conferences, and lots
> to do before leaving for Christmas, I'm not sure I'm up for a heated,
> passionate debate about technologies right now, but by all means, please
> commence the public discussion.
>
> Let me start by saying that everyone (I believe) agrees that we would
> like to have an HTML output from the engine which is more generic and
> would allow CSS to be applied if a frontend would like to do this.
> Currently HTMLHREF output from the engine is used by the widest number
> of frontends (to my knowledge) and would benefit everyone involved by
> becoming much more generic. e.g.,
>
>  -> 
> rather than
>  -> 
>
>  -> 
> rather than
>  -> 
>
> etc.
>
> I believe this will solve a number of issues and possibly get the BT and
> MacSword teams onboard to using the same HTML output filters as the
> other projects involve (or at least subclassing them and using the
> majority of their functionality).
>
>
> Now, as to the other issue of using XSLT internally in the engine to
> process OSIS -> HTML
>
> I will throw a few melons into the air for target practice, and let the
> shooting commence.
>
> _
> *Multiple Language*
>
> XSLT is a programming language in the same sense that C++ is a
> programming language.
>
> The SWORD Project C++ engine is written in C++.  It is not a Python
> engine; it is not a Perl engine; it is not a Java engine; it is C++.
>
> One might say, "Well, you can use XSLT from C++.  Doesn't JSword do this
> from Java?"  Well, yes, of course you can, and DM can comment, if he
> feels the desire to recommend his decision to incorporate an XSLT engine
> into the JSword logic flow.  But simply because one CAN doesn't mean one
> SHOULD.  We COULD incorporate a Perl text processing engine in our C++
> code, or an Awk processing engine...  that doesn't mean we SHOULD.  I'm
> sure some would say we SHOULD.  And obviously DM has thought he SHOULD
> incorporate XSLT processing for JSword, so I'm not intending to say it
> is a BAD decision, just that it is not a decision I would make; in the
> same way as our projects each chose C++ vs. Java to implement our objective.
>
> ___
> *XSLT better than C++*
>
> One might say, "well, XSLT is better suited to process XML than C++."
> That's a loaded and unquantified statement.
>
> Certainly the C++ language specification doesn't include facilities to
> easily process XML, but that doesn't mean a plethora of C++ libraries
> don't exist for assisting in this task.
>
> The SWORD engine includes classes like XMLTag and SWBasicFilter which
> implement a SAX processing model.
>
> The current filters do not all use SWBasicFilter, nor XMLTag.  They've
> been written over 15 years and many before these classes existed.  Some
> are ugly and need to be rewritten for readability, certainly.  But not
> necessarily in a different programming language.
>
> 
> *COMPLEXITY*
>
> The task of enumerating all types of OSIS  tags, and deciding
> what to do with each, and how to classify all  tags from all
> possible OSIS documents into our enumeration is still going to be a
> complex task using XSLT.   is a complex example, but certainly
> not the most complex.
>
> It is a tall task to generalize all elements of all documents from all
> publishers into one conceptual model with one chosen output for a
> frontend-- whether that be for an audience on the Desktop, web-based, or
> a handheld.
>
> The complex processing required by the engine will require long, complex
> XSLT-- which likely will incorporate callbacks to C++.  It will not be
> more simple-- only mixed language.
> ___
> *Semantic vs. Display*
>
> Some will say (and have), "well, let everything be display oriented and
> let the publisher decide".  Fine, then you lose 2 things: the ability to
> display differently per user preference, per display device; and you
> also give up the promise to actually do any interesting research on the
> text.  When you lose semantic markup, then you lose all interesting
> information about WHAT is being marked up.
>
> ___
> *More than a Rendering Engine*
>
> The SWORD C++ Engine is more than simply a text rendering engine-- it is
> a Biblical text research engine.
>
> If I'd like to know the morphology of word 3 in 2Thes 2.13 of the WHNU
> Greek text, the entire program to do such is:
>
> SWMgr library;
> SWModule *whnu = library.getModule("WHNU");
> whnu->setKey("2th.2.13");
> whnu->RenderText();
>
> cout << "The morphology of word three is: " <<
> whnu->getEntryAttributes()["Word"]["003"]["Morph"] << endl;
>
>
> That reads nice (at least in my opinion).  I don't need to know about
> XML, XSLT, care what markup the WHNU module uses, I don't even have to
> know how to make a SWORD filter.  The current filters

Re: [sword-devel] XSLT vs. C++

2010-12-01 Thread DM Smith
This is not so much about Troy's comment on Plato's Form as about 
the model that JSword uses. It is meant for illumination.


JSword converts ThML, GBF, PlainText and OSIS on a verse by verse basis 
into well-defined fragments of XML. These fragments use the tags of 
OSIS, but might not produce a valid fragment. For ease of explanation, 
we say that it is converted into OSIS. If for some reason a verse in 
ThML or OSIS is not well-formed, it is hacked by successively stripping 
out xml parts until it parses or until only the text remains. This hack 
is rather unfortunate and should be removed or improved. E.g. notes and 
xrefs should never be inlined as plain text if they are marked up properly.
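
The fallback described above can be sketched roughly as follows. This is not JSword's code (JSword is Java and strips markup in successive stages); the function name `parse_or_strip`, the dummy `<verse>` wrapper and the single-step regex stripping are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Anything that looks like a tag; used only by the last-resort fallback.
TAG = re.compile(r"<[^>]*>")

def parse_or_strip(verse_xml):
    """Return ('xml', element) if the verse parses, else ('text', plain_text)."""
    try:
        # Wrap in a dummy root so a verse with several top-level nodes still parses.
        return "xml", ET.fromstring("<verse>" + verse_xml + "</verse>")
    except ET.ParseError:
        # Hypothetical simplification: drop all tags at once and keep the text.
        return "text", TAG.sub("", verse_xml)
```

A verse whose quote element opens but never closes falls through to the text branch, which is exactly the lossy behaviour the paragraph above calls unfortunate.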


Though it can, JSword does not use XSLT on a verse by verse basis to 
render a verse. Rather it gathers all the verses as XML fragments into 
an XML document. Typically this is a chapter of verses, but it might 
also be the set of verses returned from a search result, specified by 
the user, or given as a cross-reference. JSword will also collect verses 
from several modules into the document for parallel display.
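
As a rough sketch of that gathering step (in Python rather than JSword's Java), the per-verse fragments can be wrapped into one document before any rendering happens. The container element, the `x-render` type and the function name `build_document` are hypothetical; JSword's real document layout may differ.

```python
import xml.etree.ElementTree as ET

def build_document(verses):
    """verses: iterable of (osis_id, fragment_xml) pairs, in display order."""
    # Hypothetical container; the real wrapper element may be different.
    root = ET.Element("div", {"type": "x-render"})
    for osis_id, fragment in verses:
        # Each fragment is assumed to be well-formed once wrapped in <verse>.
        verse = ET.fromstring("<verse>" + fragment + "</verse>")
        verse.set("osisID", osis_id)
        root.append(verse)
    return root
```

The same function serves a chapter, a search result or a list of cross-references, since it only cares about the order of the pairs it is given.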


It is this document that is rendered. How this document is rendered is 
up to the application. It could use SAX. It could walk the DOM. But 
Bible Desktop uses XSLT and many other JSword front-ends do so as well. 
In answer to an earlier question, the XSLT is read once and reused for 
all rendering of modules. It is way too expensive to do this frequently. 
Once per run or only when the underlying file changes is sufficient.
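
The "read once, reload only when the file changes" policy can be sketched like this. The class name `CachedStylesheet` and the `compile_fn` callback are assumptions; in a real front-end `compile_fn` would be whatever call the chosen XSLT library uses to compile a stylesheet.

```python
import os

class CachedStylesheet:
    """Compile a stylesheet lazily and recompile only when the file changes."""

    def __init__(self, path, compile_fn):
        self.path = path
        self.compile_fn = compile_fn   # hypothetical: bytes -> compiled stylesheet
        self._mtime = None
        self._compiled = None

    def get(self):
        mtime = os.path.getmtime(self.path)
        if self._compiled is None or mtime != self._mtime:
            # First use, or the file was modified since the last compile.
            with open(self.path, "rb") as fh:
                self._compiled = self.compile_fn(fh.read())
            self._mtime = mtime
        return self._compiled
```

Every render then calls `get()`, which costs one `stat` call in the common case instead of a full parse of the stylesheet.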


One thing JSword dictates to a processor of the document: all 
rendering/filtering happens within it. The BD style sheet is 
parametrized for each render option. Using these it shows/hides notes, 
xrefs, strongs, and morph; does verse per line; changes in the 
representation of the verse number; and so forth.


There are several values in rendering a chapter as a whole. There are 
many constructs that can include more than one verse. One can start a 
tag in the middle of one verse and close it in another. If one only 
rendered verse-by-verse the start and end might not be matched up 
correctly. For example, SWORD's osishtmlhref filter has a quote stack 
and a highlight stack. If a quote starts in one verse and ends in 
another, the stack is reset going from one verse to another. So the 
quote marks might not match up. (Note: osis2mod is aware of this 
shortcoming and adjusts for it. However, if the module maker uses 
imp2mod or vpl2mod it can happen). For the  tag when an opening tag 
is found, it is pushed on a stack (allowing for nesting). When an end 
tag is found, the stack is consulted to see what the start tag 
was. If it was bold then it closes bold, otherwise it closes italics. 
However, if the stack is empty, it closes italics.
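
The stack behaviour described above, and the way it goes wrong across a verse boundary, can be sketched as follows. This is a Python illustration of the described logic, not the C++ source of the osishtmlhref filter; the event tuples and the function name `render_hi` are assumptions.

```python
def render_hi(events):
    """Render one verse. events: ('start', type), ('end', None) or ('text', str)."""
    stack = []   # open highlight types; starts empty for every verse
    out = []
    for kind, value in events:
        if kind == "start":
            stack.append(value)
            out.append("<b>" if value == "bold" else "<i>")
        elif kind == "end":
            # An end tag with an empty stack is assumed to close italics.
            opened = stack.pop() if stack else "italic"
            out.append("</b>" if opened == "bold" else "</i>")
        else:
            out.append(value)
    return "".join(out)
```

If bold opens in one verse and closes in the next, the first call emits `<b>` and the second, starting with an empty stack, emits `</i>`, so the markup no longer matches.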


This spanning problem affects JSword's rendering of a collection of 
arbitrary verses. A tag can be open in one verse, but because the verse 
is not shown in context, it is never closed.


There is also an advantage of using XSLT over SAX: it is not limited to 
a single pass of the document. For example, this is used in Bible 
Desktop to show margin notes.


Regarding TEI, JSword pretends it is OSIS. This is not a far stretch 
since OSIS was influenced by TEI. The XSLT has a few entries to be able 
to display key elements. Since TEI is rather open, and in flux, not all 
of what we will use will be found in it. I haven't looked at it but 
Chris has a TEI schema he uses for validation. That could be used to 
improve the XSLT or for TEI modules to have their own XSLT.


Regarding ThML, JSword would do well to not convert it to OSIS but have 
XSLT for it as well.


Regarding the speed of XSLT vs SAX vs SWORD's renderers. Except for 
handhelds (pda, phone, ...) it is a moot point. I figure that 5-6 years 
is the maximum useful lifespan of a computer. The processing power of a 
computer in these years, even a netbook, is sufficient to run XSLT fast 
enough over a chapter's worth of verses to satisfy end users. I have an 
old 486, Windows 98 laptop with limited memory that runs it acceptably. 
Even my OLPC (one laptop per child) is fast enough.


Beyond JSword and how it could be used in SWORD without much change to 
the current library:
I'm not sure, but I think any SWORD front-end can try out XSLT if they 
like on OSIS documents using the osisosis.cpp filter. The filter does 
not attempt to do too much except reconstruct verses. It might need to 
be modified to output milestoned verse markers instead of the begin/end 
tags it does now. Using begin/end tags makes the assumption that a verse 
is a well-formed fragment. Just use it to "render" a chapter and then 
pass that chapter to xslt.


I'm hearing that lots of people won't seriously look at XSLT. It has a 
steep but short learning curve. Kind of like Perl. There are two basic 
programming models using XSLT: one that understands the containment 
model of the schema. The other handl

Re: [sword-devel] XSLT vs. C++

2010-12-01 Thread Peter von Kaehne
> So, if one were to write a new OSIS filter from scratch, I'd like to know
> what has to be done to meet/match SWORD's ideal chair.

I think the single most relevant requirement is easy expandability.

I am looking - totally different example - at usfm2osis.pl and about 180 or so 
lines are now mine. But I would not have been able to do anything with it in 
its old form. Since Chris rewrote it, commented it extensively and made it 
altogether a logical affair, it is easy to add a line or ten whenever I find 
a USFM tag that is unsupported or poorly supported.

If the filters were ever at that place - I would be delighted.

___
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page

Re: [sword-devel] XSLT vs. C++

2010-12-01 Thread Ben Morgan
Hi Greg,

On Thu, Dec 2, 2010 at 2:19 AM, Greg Hellings wrote:

> On Wed, Dec 1, 2010 at 8:13 AM, Jonathan Morgan 
> wrote:
> > Speaking as a BPBible developer, I would tend to prefer C++ filters to
> > XSLT.  Here are some reasons why:
> > 1. It works now (well, OK, it doesn't always work as well as one might like,
> > but it does work).
>
> It works for our historical collection of modules, but the current
> implementations of some of the filters are rigid and very difficult to
> update or modify.  But yes, it more or less works now.
>
I agree it can be very fiddly and fragile - that's mostly the filters like
the headings filters which are run before render; the OSISHtmlHref filters
are simple enough to work with. Extending it in python once it is set up is
very easy as well (basically defining a start_ and end_ handler -
for our handling of poetic lines, for example, see
http://code.google.com/p/bpbible/source/browse/branches/webconnect/backend/osisparser.py#475
)
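
For readers who do not want to follow the link, a name-based dispatch of this kind can be sketched with the standard library's SAX parser. This is not BPBible's code; the class names, the `start_l`/`end_l` handlers for an OSIS `<l>` line and the emitted HTML are assumptions chosen for illustration.

```python
import xml.sax

class DispatchingHandler(xml.sax.ContentHandler):
    """Route each element to start_<name> / end_<name> if such a method exists."""

    def __init__(self):
        super().__init__()
        self.out = []

    def startElement(self, name, attrs):
        handler = getattr(self, "start_" + name, None)
        if handler:
            handler(attrs)

    def endElement(self, name):
        handler = getattr(self, "end_" + name, None)
        if handler:
            handler()

    def characters(self, content):
        self.out.append(content)

class PoetryHandler(DispatchingHandler):
    # Hypothetical handlers for a poetic line; all other elements pass through.
    def start_l(self, attrs):
        self.out.append('<div class="line">')

    def end_l(self):
        self.out.append("</div>")

def render(xml_text, handler_cls=PoetryHandler):
    handler = handler_cls()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.out)
```

Supporting another element then means adding one more pair of methods, with no change to the dispatch code.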


> > 2. It is (fairly) readily able to be customised by application developers
> > using the magic of inheritance.  BPBible at least takes advantage of this,
> > and 0.4.7 contained about 800 lines of Python in our filter code.  For 0.5
> > the OSIS filter has doubled in size.  By contrast, if we were to maintain an
> > app-specific XSLT file, we would probably need to duplicate the whole file
> > and then make changes to it, and any changes made to the base XSLT file
> > would have to be manually merged in.  Bye-bye to the idea of having only one
> > lot of library source to maintain.
>
> XSLT is easily extensible.  SAX is easily extensible.
>
Basically what is used already is a SAX-like model, just implemented by
Sword. Customizability is just the same as you describe.

I do not believe XSLT is a good option; for a start, it requires (AFAIK)
valid XML fragments, which we do not have within a verse in much of existing
content (or even at all necessarily). JSword I believe has fallbacks to
extract the text if not valid xml, but I would far prefer not to use such
hacks; SWORD can handle this quite well (as probably SAX could if
non-validating). Also, due to the structure of OSIS with multiple
hierarchies, however you process it it will be complicated and this loses
much of the benefits of XSLT. (Disclaimer - never used XSLT)

Also, on a personal level, due to having never used XSLT, I feel comfortable
using Python/C++ whereas XSLT is scary.

God Bless,
Ben
---
Multitudes, multitudes,
in the valley of decision!
For the day of the LORD is near
in the valley of decision.

Joel 3:14 (ESV)

Re: [sword-devel] XSLT vs. C++

2010-12-01 Thread Greg Hellings
On Wed, Dec 1, 2010 at 8:13 AM, Jonathan Morgan  wrote:
> Speaking as a BPBible developer, I would tend to prefer C++ filters to
> XSLT.  Here are some reasons why:
> 1. It works now (well, OK, it doesn't always work as well as one might like,
> but it does work).

It works for our historical collection of modules, but the current
implementations of some of the filters are rigid and very difficult to
update or modify.  But yes, it more or less works now.

>
> 2. It is (fairly) readily able to be customised by application developers
> using the magic of inheritance.  BPBible at least takes advantage of this,
> and 0.4.7 contained about 800 lines of Python in our filter code.  For 0.5
> the OSIS filter has doubled in size.  By contrast, if we were to maintain an
> app-specific XSLT file, we would probably need to duplicate the whole file
> and then make changes to it, and any changes made to the base XSLT file
> would have to be manually merged in.  Bye-bye to the idea of having only one
> lot of library source to maintain.

XSLT is easily extensible.  SAX is easily extensible.

In XSLT I can import another XSL file and provide overrides - no need
to merge in changes from someone else and maintain identical copies,
etc.  When I'm creating my current set of modules I have 2 XSL files
that go from the proprietary SGML to HTML and ThML.  Obviously there
is a lot of overlap between those two.  The ThML stylesheet simply
imports the HTML stylesheet and overrides a few of the templates to
produce  and other ThML-specific elements.  That way, if
there is a bug in how I translate a table display, for instance, I can
change it in the HTML stylesheet and I get the fix for free in my ThML
without touching anything.

SAX is simply an API in any desired language.  If I want to override
the behavior of a single element, I just override the processing
method and check something like
    if (is element to override)
        doOverride();
    else
        callSuperclassMethod();
All the discussion for XSL above applies to SAX processing as well.
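
The override pattern above can be made concrete with Python's standard SAX API. `BaseFilter` stands in for a library-provided filter and `AppFilter` for an application's subclass; both names, the `<span>` output and the choice to replace notes with a `*` marker are assumptions for the sake of the example.

```python
import xml.sax

class BaseFilter(xml.sax.ContentHandler):
    """Stand-in for a shared filter: wraps every element in a span."""

    def __init__(self):
        super().__init__()
        self.out = []

    def startElement(self, name, attrs):
        self.out.append('<span class="%s">' % name)

    def endElement(self, name):
        self.out.append("</span>")

    def characters(self, content):
        self.out.append(content)

class AppFilter(BaseFilter):
    """Application filter: only <note> differs, everything else is inherited."""

    def __init__(self):
        super().__init__()
        self.hidden = 0   # depth of <note> elements currently open

    def startElement(self, name, attrs):
        if name == "note":
            self.hidden += 1
            self.out.append("*")              # marker instead of inline note text
        elif not self.hidden:
            super().startElement(name, attrs)

    def endElement(self, name):
        if name == "note":
            self.hidden -= 1
        elif not self.hidden:
            super().endElement(name)

    def characters(self, content):
        if not self.hidden:
            super().characters(content)

def run(xml_text, handler):
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.out)
```

A fix made in `BaseFilter` reaches `AppFilter` without any merging, which is the same property claimed for imported stylesheets above.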

>
> 3. It allows developers to use sources that are outside the document being
> transformed.  This has had some issues for us (from memory, the filter code
> isn't re-entrant), but we use this functionality to do things like expanding
> a list of cross-references in the user's locale, looking up the headwords
> for Strong's Numbers, and looking up the text in the current version for a
> passage in a harmony.  By contrast, unless we have some good way to call
> into C++/Python from XSLT we will not be able to use sources outside the
> current document unless we do some complex post-processing.  If we do have
> such a way it could just increase complexity.

A SAX model, of course, is able to handle the full range of what your
programming language of choice has, so you're all set there.

XSL has many ways of bringing in data from the outside.  Arguments
and variables can be passed in by the caller (man xsltproc and you'll
see the argument --param PARAMNAME PARAMVALUE. Programmatic invocation
of XSL can use the same parameter mechanism), values can be pulled out
of static XML files which the XSL can include, and there is a rather
straightforward way of creating custom functions in your invoking
language.  When I am using XSL to parse my SGML files, I have a number
of custom functions written in Python which I invoke from XSL to do
any type of processing raw XSL can't handle (example: transforming
inline RTF styles into inline CSS styles).

Increasing complexity? That really depends on the methods used and
whether they are appropriate.

>
> 4. It allows us to share common functionality between the ThML filters and
> the OSIS filters (which we do).  I think this proposal would have us still
> using C++ ThML filters while moving the OSIS filters to XSLT, which would
> make the results further apart.

The same can be done with XSL simply by factoring the shared
functionality into a single stylesheet which the ThML and
OSIS-specific stylesheets include.

SAX... well, I think you get the idea there.

>
> 5. I would be concerned if performance dropped at all, as I suspect it would
> (especially if calls into C++ were involved as well).

Calls into the parent language don't really slow down XSL unless they
invoke a method which is excruciatingly slow.  Of course, no one
really has implementations of both technologies currently in place for
us to compare SWORD's performance at present.  apt-cache showpkg
libxml2 shows me around 1000 libraries and applications in Ubuntu
which currently depend directly on libxml2 including things as diverse
as Pidgin, PHP, Postgres, VMWare, rpm2html, strigi, nautilus, Gnome,
gstreamer, abiword, xscreensaver and so on.  Performance of that
library is apparently good enough for some people.  There are even two
sets of bindings in Python (python-libxml2 and python-lxml) both of
which I have used with great success.

To give an idea of how quickly it processes, I am able to load, par

Re: [sword-devel] XSLT vs. C++

2010-12-01 Thread DM Smith
I like Plato's Chair analogy. But not the conclusions drawn from it.

I think we all agree that some level of structural markup is necessary to 
identify: books, chapters, verses, titles, intros, words of Christ, footnotes, 
cross-references, and anything else we might want to treat specially beyond 
just presentation.

I like deep structural markup that goes beyond what we currently use, e.g. 
markup of names and place names, so that we are not limited by what we have 
done, but what we can envision later.

Some structural markup, such as poetry markup, today is used as merely 
presentational. As a result, it often is not structurally meaningful. This is a 
problem of the module maker creating something that looks nice but of which 
there is no value to software processing (e.g. getNextPoetryBlock() just won't 
get the desired results).

The problem with the Plato's Chair analogy is that SWORD is not merely an idea, 
but *an* implementation of that chair. I'd say it looks rather like a 1980's 
dinette chair constructed of steel tubing and vinyl cushions.

The biggest problem I see with the modules and the filters is that they are 
lossy and/or incomplete. I'll keep my remarks to the OSIS process as that is 
what I am most familiar with, and since it is *a* chair, it is not too far removed 
from ThML's chair.

Regarding the modules, of necessity, we transform BSP OSIS (aka Book, Section, 
Paragraph with verse markers) into BCV (Book, Chapter, Verse) without verse 
markers. (ThML, GBF, PlainText readily lend themselves to BCV directly. I'm 
going to guess that is *a* major motivation for ThML.)

The purpose of osis2mod is to transform the publishers' chairs into SWORD's 
chairs. The shortcoming of using IMP or VPL to import OSIS (or any other module 
type) is that it bypasses such a transformation and puts the burden on the 
module maker to construct SWORD's chair directly.

Regarding the filters, there is an agreement that they need help. The problem 
with the OSIS to HTML filters is that they are not written to display what is 
defined by the OSIS spec, but only what the filter author thought was 
important. Some examples: OSIS allows for a title to be within a title, that 
is, to have sub-titles. OSIS allows rich markup within titles, such as 
footnotes, cross-references, divine name, etc. OSIS allows for significant 
content between verses. Words of Christ in verses can be punctuated by other 
words. These were or are problematic to these filters.

The second problem with these filters is that they are lossy. The filters only 
look for a subset of the OSIS tags and attributes. Examples: the "n" attribute 
on footnotes. Of the various types of  bold is handled well, but everything 
else gets italic (line-through, acrostic, illuminated, small-caps, sub, super). 
Table, row and cell are ignored (these could easily be in genbooks). And lots 
more
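
One non-lossy alternative, in line with the goal of generic HTML that a front-end can style with CSS, is to carry the highlight type through as a class name instead of collapsing it to bold or italic. The sketch below is an assumption-laden illustration: the `hi-` class prefix, the `hi-unknown` fallback and the function name are invented here, and only the type names come from the list above.

```python
from xml.sax.saxutils import escape, quoteattr

# Highlight types named in the paragraph above, plus italic.
KNOWN_HI_TYPES = {
    "bold", "italic", "line-through", "acrostic",
    "illuminated", "small-caps", "sub", "super",
}

def hi_to_span(hi_type, text):
    """Map a highlight type to a CSS class rather than a fixed HTML tag."""
    css = "hi-" + hi_type if hi_type in KNOWN_HI_TYPES else "hi-unknown"
    return "<span class=%s>%s</span>" % (quoteattr(css), escape(text))
```

The front-end's stylesheet then decides what `hi-small-caps` or `hi-acrostic` looks like, and nothing about the original type is lost in the filter.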

This is a community effort and we all have different skill sets. I'm 
particularly weak in doing C++ coding as I have been away from it for too long 
(I started with C++ 1.0 and moved to something else just before 3.0 was 
released). Otherwise, I'd have tackled the lossy-ness of the filters.

As I look at the code, the essential part of the SWORD chair seems to be how it 
pulls out of line various components into easily addressed structures: titles, 
footnotes, . I've tried but I don't understand this at all.

Within the osishtmlhref filter there are various notions that are necessary to 
understand but are entirely baffling to me: suspendTextPassThru, suspendLevel, 
lastSuspendSegment, supressAdjacentWhitespace, , .

So, if one were to write a new OSIS filter from scratch, I'd like to know what 
has to be done to meet/match SWORD's ideal chair.

In Him,
DM




On Dec 1, 2010, at 7:20 AM, Troy A. Griffitts wrote:

> The logic to get from any Publisher Source Document to rendered HTML is
> a very complex task to solve.
> 
> We conceptually create Plato's Form of, say, a Bible, and try to fit
> imperfect Publisher markup into this concept.  A Bible has verses,
> headings between verses, chapter intros, footnotes, crossrefs, lemma
> information, etc.
> 
> If we do not do this, then we become a PDF reader-- there are already
> PDF readers and we lose the ability to do Bible specific things with our
> software.  For example, if we didn't normalize the concept of crossref
> across all Books, then we couldn't turn them on and off; we couldn't
> provide a crossref panel in the reader which fills according to which
> crossref is hovered over, etc.  Same with notes, strongs, headings, etc.
> 
> This causes us to impose our Form onto a publisher's text.  I understand
> why some people may not like this, but it is very much to our end users'
> benefit that we do this.  Without this, we become a web-browser or a PDF
> reader.  Which are fine for their purpose, but we intend to provide
> common, familiar, and sometimes novel Bible study aids to our reader.
> 
> The c

Re: [sword-devel] XSLT vs. C++

2010-12-01 Thread Jonathan Morgan
Speaking as a BPBible developer, I would tend to prefer C++ filters to
XSLT.  Here are some reasons why:
1. It works now (well, OK, it doesn't always work as well as one might like,
but it does work).

2. It is (fairly) readily able to be customised by application developers
using the magic of inheritance.  BPBible at least takes advantage of this,
and 0.4.7 contained about 800 lines of Python in our filter code.  For 0.5
the OSIS filter has doubled in size.  By contrast, if we were to maintain an
app-specific XSLT file, we would probably need to duplicate the whole file
and then make changes to it, and any changes made to the base XSLT file
would have to be manually merged in.  Bye-bye to the idea of having only one
lot of library source to maintain.

3. It allows developers to use sources that are outside the document being
transformed.  This has had some issues for us (from memory, the filter code
isn't re-entrant), but we use this functionality to do things like expanding
a list of cross-references in the user's locale, looking up the headwords
for Strong's Numbers, and looking up the text in the current version for a
passage in a harmony.  By contrast, unless we have some good way to call
into C++/Python from XSLT we will not be able to use sources outside the
current document unless we do some complex post-processing.  If we do have
such a way it could just increase complexity.

4. It allows us to share common functionality between the ThML filters and
the OSIS filters (which we do).  I think this proposal would have us still
using C++ ThML filters while moving the OSIS filters to XSLT, which would
make the results further apart.

5. I would be concerned if performance dropped at all, as I suspect it would
(especially if calls into C++ were involved as well).

6. Currently our rendering works on a verse-by-verse basis.  I'm not sure
what it would look like if we were trying to do something like a chapter at
once.  Do we run through the chapter in one go?  What kind of well formed
OSIS document can we get from a single verse or collection of verses to pass
into an XSLT?  Is there much cost to fire up an XSLT engine just for the one
verse we have in our search preview?  What would you do if you wanted to
have a discontinuous range of verses or to show versions in parallel
verse-by-verse?  We also surround each verse and a rendered section with
other extra stuff which varies depending on the context.  I'm not sure where
this would fit in the XSLT (if at all).

In short, as a BPBible developer I much prefer implementation in C++ because
it allows us to do things we want to do much more easily than with XSLT
(though if Troy or anyone else wants to improve the present implementation
they are welcome to).  I cannot speak for the pros and cons from a module
creator point of view.

Jon


Re: [sword-devel] XSLT vs. C++

2010-12-01 Thread David Haslam

Troy,

"I'll attempt to post a few easy to swallow SWORD 101 classes in email,
which will help us gather our thoughts and documents on how all this works."

Why not straight to the wiki (http://crosswire.org/wiki/)?

David




Re: [sword-devel] XSLT vs. C++

2010-12-01 Thread Martin Denham
Excuse me for being pure Java and not knowing Sword C++ at all but can I add
(perhaps obviously) that an XSLT framework will perform noticeably slower
than a SAX-like framework.

Here are some performance comparisons.  They are old and Java-centric and so XSLT
performance may have improved but these tests show that in the worst case
XSLT was 3 times slower than SAX and a good SAX processor was twice as fast
as a good XSLT processor.  If pages are parsed at the chapter level then
users may notice a delay turning pages on smaller machines like mobile
phones.

Martin

On 1 December 2010 12:20, Troy A. Griffitts  wrote:

> The logic to get from any Publisher Source Document to rendered HTML is
> a very complex task to solve.
>
> We conceptually create Plato's Form of, say, a Bible, and try to fit
> imperfect Publisher markup into this concept.  A Bible has verses,
> headings between verses, chapter intros, footnotes, crossrefs, lemma
> information, etc.
>
> If we do not do this, then we become a PDF reader-- there are already
> PDF readers and we lose the ability to do Bible specific things with our
> software.  For example, if we didn't normalize the concept of crossref
> across all Books, then we couldn't turn them on and off; we couldn't
> provide a crossref panel in the reader which fills according to which
> crossref is hovered over, etc.  Same with notes, strongs, headings, etc.
>
> This causes us to impose our Form onto a publisher's text.  I understand
> why some people may not like this, but it is very much to our end users'
> benefit that we do this.  Without this, we become a web-browser or a PDF
> reader.  Which are fine for their purpose, but we intend to provide
> common, familiar, and sometimes novel Bible study aids to our reader.
>
> The current processing model is dark magic and I apologize for this.  It
> should be well documented and easy to modify.  I will attempt to improve
> the dissemination of knowledge of exactly WHAT our Forms are, how we
> impose those Forms on publishers' texts and improve the documentation
> and code to help others understand and have the ability to improve the
> code.
>
> I'll attempt to post a few easy to swallow SWORD 101 classes in email,
> which will help us gather our thoughts and documents on how all this works.
>
>
> Troy
>
>
>
> On 12/01/2010 12:09 AM, Greg Hellings wrote:
> > On Tue, Nov 30, 2010 at 1:08 PM, Troy A. Griffitts 
> wrote:
> >> Having finally returned from a hectic 2 weeks of conferences, and lots
> >> to do before leaving for Christmas, I'm not sure I'm up for a heated,
> >> passionate debate about technologies right now, but by all means, please
> >> commence the public discussion.
> >>
> >> Let me start by saying that everyone (I believe) agrees that we would
> >> like to have an HTML output from the engine which is more generic and
> >> would allow CSS to be applied if a frontend would like to do this.
> >> Currently HTMLHREF output from the engine is used by the widest number
> >> of frontends (to my knowledge) and would benefit everyone involved by
> >> becoming much more generic. e.g.,
> >>
> >>  -> 
> >> rather than
> >>  -> 
> >>
> >>  -> 
> >> rather than
> >>  -> 
> >>
> >> etc.
> >>
> >> I believe this will solve a number of issues and possibly get the BT and
> >> MacSword teams onboard to using the same HTML output filters as the
> >> other projects involve (or at least subclassing them and using the
> >> majority of their functionality).
> >
> > I think this is our pretty well accepted premise.  The current filters
> > stink to various degrees and currently no one is willing to step up
> > and tackle them.
> >
> >>
> >>
> >> Now, as to the other issue of using XSLT internally in the engine to
> >> process OSIS -> HTML
> >>
> >> I will throw a few melons into the air for target practice, and let the
> >> shooting commence.
> >>
> >> _
> >> *Multiple Language*
> >>
> >> XSLT is a programming language in the same sense that C++ is a
> >> programming language.
> >>
> >> The SWORD Project C++ engine is written in C++.  It is not a Python
> >> engine; it is not a Perl engine; it is not a Java engine; it is C++.
> >>
> >> One might say, "Well, you can use XSLT from C++.  Doesn't JSword do this
> >> from Java?"  Well, yes, of course you can, and DM can comment, if he
> >> feels the desire to recommend his decision to encorporate an XSLT engine
> >> into the JSword logic flow.  But simply because one CAN doesn't mean one
> >> SHOULD.  We COULD encorporate a Perl text processing engine in our C++
> >> code, or an Awk processing engine...  that doesn't mean we SHOULD.  I'm
> >> sure some would say we SHOULD.  And obviously DM has thought he SHOULD
> >> encorporate XSLT processing for JSword, so I'm not intending to say it
> >> is a BAD decision, just that it is not a decision I would make; in the
> >> same way as our projects each chose

Re: [sword-devel] XSLT vs. C++

2010-12-01 Thread Peter von Kaehne

> Von: "Troy A. Griffitts" 
> The current processing model is dark magic and I apologize for this.  It
> should be well documented and easy to modify.  I will attempt to improve
> the dissemination of knowledge of exactly WHAT our Forms are, how we
> impose those Forms on publishers' texts and improve the documentation
> and code to help others understand and have the ability to improve the
> code.
> 
> I'll attempt to post a few easy to swallow SWORD 101 classes in email,
> which will help us gather our thoughts and documents on how all this
> works.

Great! I am looking forward to that.



___
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page


Re: [sword-devel] XSLT vs. C++

2010-12-01 Thread Troy A. Griffitts
The logic to get from any Publisher Source Document to rendered HTML is
a very complex task to solve.

We conceptually create Plato's Form of, say, a Bible, and try to fit
imperfect Publisher markup into this concept.  A Bible has verses,
headings between verses, chapter intros, footnotes, crossrefs, lemma
information, etc.

If we do not do this, then we become a PDF reader-- there are already
PDF readers, and we lose the ability to do Bible-specific things with our
software.  For example, if we didn't normalize the concept of crossref
across all Books, then we couldn't turn them on and off; we couldn't
provide a crossref panel in the reader which fills according to which
crossref is hovered over, etc.  Same with notes, strongs, headings, etc.
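
To make the point about normalization concrete, here is a minimal sketch, assuming a hypothetical Element type and Kind enumeration (neither is an actual SWORD class or option name). It only illustrates why a shared concept of "crossref" is what makes an on/off toggle possible at all.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical normalized element kinds; these names are illustrative only
// and are not SWORD's actual classes or option names.
enum class Kind { Text, Heading, Footnote, CrossRef };

struct Element {
    Kind kind;
    std::string content;
};

// Render a verse, skipping every element whose kind the reader switched off.
// This only works because each element carries a normalized kind.
std::string render(const std::vector<Element>& verse, const std::set<Kind>& hidden) {
    std::string out;
    for (const Element& e : verse) {
        if (hidden.count(e.kind)) continue;   // this kind is toggled off
        switch (e.kind) {
            case Kind::Text:     out += e.content; break;
            case Kind::Heading:  out += "[" + e.content + "] "; break;
            case Kind::Footnote: out += " (note: " + e.content + ")"; break;
            case Kind::CrossRef: out += " (see " + e.content + ")"; break;
        }
    }
    return out;
}
```

If a publisher's markup only said "small italic text in the margin", there would be no kind to test against, and the toggle could not be written.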

This causes us to impose our Form onto a publisher's text.  I understand
why some people may not like this, but it is very much to our end users'
benefit that we do this.  Without this, we become a web browser or a PDF
reader, which are fine for their purpose, but we intend to provide
common, familiar, and sometimes novel Bible study aids to our readers.

The current processing model is dark magic and I apologize for this.  It
should be well documented and easy to modify.  I will attempt to improve
the dissemination of knowledge of exactly WHAT our Forms are, how we
impose those Forms on publishers' texts and improve the documentation
and code to help others understand and have the ability to improve the code.

I'll attempt to post a few easy to swallow SWORD 101 classes in email,
which will help us gather our thoughts and documents on how all this works.


Troy




Re: [sword-devel] XSLT vs. C++

2010-11-30 Thread Greg Hellings
I also forgot to mention the following:

If we want to encourage people to use XML because it can validate and
detect errors, etc, why don't we use XML utilities with that
functionality ourselves?

And, it's not like this work hasn't already been partially tackled
with JSword's library from which we could take hints and its XSL as
well.

--Greg






Re: [sword-devel] XSLT vs. C++

2010-11-30 Thread Greg Hellings
On Tue, Nov 30, 2010 at 1:08 PM, Troy A. Griffitts  wrote:
> Having finally returned from a hectic 2 weeks of conferences, and lots
> to do before leaving for Christmas, I'm not sure I'm up for a heated,
> passionate debate about technologies right now, but by all means, please
> commence the public discussion.
>
> Let me start by saying that everyone (I believe) agrees that we would
> like to have an HTML output from the engine which is more generic and
> would allow CSS to be applied if a frontend would like to do this.
> Currently HTMLHREF output from the engine is used by the widest number
> of frontends (to my knowledge) and would benefit everyone involved by
> becoming much more generic. e.g.,
>
>  -> 
> rather than
>  -> 
>
>  -> 
> rather than
>  -> 
>
> etc.
>
> I believe this will solve a number of issues and possibly get the BT and
> MacSword teams onboard to using the same HTML output filters as the
> other projects involve (or at least subclassing them and using the
> majority of their functionality).

I think this is our pretty well accepted premise.  The current filters
stink to various degrees and currently no one is willing to step up
and tackle them.

>
>
> Now, as to the other issue of using XSLT internally in the engine to
> process OSIS -> HTML
>
> I will throw a few melons into the air for target practice, and let the
> shooting commence.
>
> _
> *Multiple Language*
>
> XSLT is a programming language in the same sense that C++ is a
> programming language.
>
> The SWORD Project C++ engine is written in C++.  It is not a Python
> engine; it is not a Perl engine; it is not a Java engine; it is C++.
>
> One might say, "Well, you can use XSLT from C++.  Doesn't JSword do this
> from Java?"  Well, yes, of course you can, and DM can comment, if he
> feels the desire to recommend his decision to incorporate an XSLT engine
> into the JSword logic flow.  But simply because one CAN doesn't mean one
> SHOULD.  We COULD incorporate a Perl text processing engine in our C++
> code, or an Awk processing engine...  that doesn't mean we SHOULD.  I'm
> sure some would say we SHOULD.  And obviously DM has thought he SHOULD
> incorporate XSLT processing for JSword, so I'm not intending to say it
> is a BAD decision, just that it is not a decision I would make; in the
> same way as our projects each chose C++ vs. Java to implement our objective.

If a developer is going to develop OSIS -> HTML filters, for instance,
we are already assuming they know OSIS and HTML.  OSIS is XML and HTML
is SGML (though most of our work is probably targeting a more
XML-dialect of HTML).  XSLT is also XML.  Formally, it is not even a
programming language, but just a set of formatting/processing
instructions in XML.

Any developer using XML who is worth their salt should at least be
familiar with the basics of XSL - they may not be a guru of XPath
expressions or have every attribute of XSL memorized - and would
probably expect a library which handles XML as its preferred input
method to utilize one of the standard XML processing methods.  I know
I'm not the only person who was surprised to look in the library
filters and see neither DOM, SAX nor XSLT technologies in use.  That
was when I first ran and hid.

Of course, this portion of the discussion is only relevant for the
from-OSIS filters.

>
> ___
> *XSLT better than C++*
>
> One might say, "well, XSLT is better suited to process XML than C++."
> That's a loaded and unquantified statement.
>
> Certainly the C++ language specification doesn't include facilities to
> easily process XML, but that doesn't mean a plethora of C++ libraries
> don't exist for assisting in this task.
>
> The SWORD engine includes classes like XMLTag and SWBasicFilter which
> implement a SAX processing model.
>
> The current filters do not all use SWBasicFilter, nor XMLTag.  They've
> been written over 15 years and many before these classes existed.  Some
> are ugly and need to be rewritten for readability, certainly.  But not
> necessarily in a different programming language.

XSLT being "better" is, yes, a matter of complete subjectivity.  And,
as I mentioned above, it is only useful when our source is XML to begin
with.  For GBF or Plaintext sources, XSLT is clearly not even
applicable.

But the current C++ is so good that you seem the only person willing
to touch it.  Peter just mentioned he tried once and couldn't get it.
I have gone into the filters before with a singular goal in mind and
was able to produce my desired changes, but it was long, drawn-out and
painful.  Doing the same tasks in XSL would have taken me mere
seconds.  I know a few other people, at least, have said they would
know how to do a task if XSLT was used instead of C++.  Of course,
that is a hypothetical - I can't know that they would have done so,
but that was their claim at the time.

Our recent discussion about the use of the "n" attribute for footnotes
in ThML is a perfect 

Re: [sword-devel] XSLT vs. C++

2010-11-30 Thread Peter von Kaehne
Thanks Troy,

Probably I am the worst person to answer first, but it was me who threw
the matter once again into the ring. Hence...

On 30/11/10 19:08, Troy A. Griffitts wrote:
> Having finally returned from a hectic 2 weeks of conferences, and lots
> to do before leaving for Christmas, I'm not sure I'm up for a heated,
> passionate debate about technologies right now, but by all means, please
> commence the public discussion.
>
> Let me start by saying that everyone (I believe) agrees that we would
> like to have an HTML output from the engine which is more generic and
> would allow CSS to be applied if a frontend would like to do this.
> Currently HTMLHREF output from the engine is used by the widest number
> of frontends (to my knowledge) and would benefit everyone involved by
> becoming much more generic. e.g.,
>
>  -> 
> rather than
>  -> 
>
>  -> 
> rather than
>  -> 
>
> etc.
>
> I believe this will solve a number of issues and possibly get the BT and
> MacSword teams onboard to using the same HTML output filters as the
> other projects involve (or at least subclassing them and using the
> majority of their functionality).
>
More detailed and accurate feedback from the engine about what is
actually in the source text would certainly be a huge step forward for me.

The lack of detail and fine print in the current filtering is doing
neither the modules nor the vast majority of presentation engines we use
any justice. I guess this is a universal agreement. Essentially we make
no use of most OSIS attributes, but mostly just translate the tags
themselves. So much so that some features are actually re-invented in
some frontends; footnote markers are a case in point.

Moving beyond that agreement, the question is how to proceed.

The single most important reason I have to prefer XSLT over C++ in the
filters is admittedly not technical:

I believe there is a wider range of people within CrossWire who can make
sense of XSLT and could engage with the filters, than with the current
layout in C++.  I think filters are - once we are beyond the basic
implementation - of particular interest to module makers.  I could see a
work flow emerging for module makers which looks like this:

- Create a valid OSIS document with all features which are in the text,
- determine which OSIS features are not yet covered,
- fix/expand the filters.

I have no doubt that this is not sufficient reason and I would be
delighted to see if a similar work flow could emerge with C++, but you
will admit, it has not so far, apart from Chris. I had a good go at
trying to understand osis2htmlref.cpp - and gave up. I will happily try
again, but am not too hopeful. I think the age of the filters certainly
bears witness to the fact that not many have dared to touch them.

The second reason I have is related - an updated style sheet is easily
distributed and would allow a separate and increased frequency of
release. It could even become part of the module manager's refresh
mechanism - check on updates to XSLT sheet and call them in on refresh.
We could then leave actual libsword releases for bugs and fundamental
changes, but have the whole filter release much more dynamic and speedy.

A separated out C++ filter bundle might of course do the same, though I
am not sure how this would pan out technically.

Yours

Peter





[sword-devel] XSLT vs. C++

2010-11-30 Thread Troy A. Griffitts
Having finally returned from a hectic 2 weeks of conferences, and lots
to do before leaving for Christmas, I'm not sure I'm up for a heated,
passionate debate about technologies right now, but by all means, please
commence the public discussion.

Let me start by saying that everyone (I believe) agrees that we would
like to have an HTML output from the engine which is more generic and
would allow CSS to be applied if a frontend would like to do this.
Currently HTMLHREF output from the engine is used by the widest number
of frontends (to my knowledge) and would benefit everyone involved by
becoming much more generic. e.g.,

 -> 
rather than
 -> 

 -> 
rather than
 -> 

etc.

I believe this will solve a number of issues and possibly get the BT and
MacSword teams onboard with using the same HTML output filters as the
other projects use (or at least subclassing them and using the
majority of their functionality).


Now, as to the other issue of using XSLT internally in the engine to
process OSIS -> HTML

I will throw a few melons into the air for target practice, and let the
shooting commence.

_
*Multiple Language*

XSLT is a programming language in the same sense that C++ is a
programming language.

The SWORD Project C++ engine is written in C++.  It is not a Python
engine; it is not a Perl engine; it is not a Java engine; it is C++.

One might say, "Well, you can use XSLT from C++.  Doesn't JSword do this
from Java?"  Well, yes, of course you can, and DM can comment, if he
feels the desire to recommend his decision to incorporate an XSLT engine
into the JSword logic flow.  But simply because one CAN doesn't mean one
SHOULD.  We COULD incorporate a Perl text processing engine in our C++
code, or an Awk processing engine...  that doesn't mean we SHOULD.  I'm
sure some would say we SHOULD.  And obviously DM has thought he SHOULD
incorporate XSLT processing for JSword, so I'm not intending to say it
is a BAD decision, just that it is not a decision I would make; in the
same way as our projects each chose C++ vs. Java to implement our objective.

___
*XSLT better than C++*

One might say, "well, XSLT is better suited to process XML than C++."
That's a loaded and unquantified statement.

Certainly the C++ language specification doesn't include facilities to
easily process XML, but that doesn't mean a plethora of C++ libraries
don't exist for assisting in this task.

The SWORD engine includes classes like XMLTag and SWBasicFilter which
implement a SAX processing model.

The current filters do not all use SWBasicFilter, nor XMLTag.  They've
been written over 15 years and many before these classes existed.  Some
are ugly and need to be rewritten for readability, certainly.  But not
necessarily in a different programming language.
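
For readers unfamiliar with the SAX processing model mentioned above, here is a minimal, self-contained sketch of the idea in plain C++.  It does not use XMLTag or SWBasicFilter; the Handlers struct and the scan() and stripNotes() functions are hypothetical names made up for illustration, and the scanner assumes well-formed input.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical callbacks for a SAX-style (event-driven) pass over markup.
// These names are illustrative and are not the XMLTag or SWBasicFilter API.
struct Handlers {
    std::function<void(const std::string&)> onTag;   // tag body, e.g. "note" or "/note"
    std::function<void(const std::string&)> onText;  // a run of character data
};

// Walk the input once, left to right, firing a callback per tag or text run.
// No tree is built, which is the defining trait of the SAX model.
// Simplification: assumes no '<' or '>' inside attribute values.
void scan(const std::string& in, const Handlers& h) {
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in[pos] == '<') {
            std::size_t end = in.find('>', pos);
            if (end == std::string::npos) break;   // unterminated tag: stop
            h.onTag(in.substr(pos + 1, end - pos - 1));
            pos = end + 1;
        } else {
            std::size_t next = in.find('<', pos);
            if (next == std::string::npos) next = in.size();
            h.onText(in.substr(pos, next - pos));
            pos = next;
        }
    }
}

// Example filter built on scan(): drop <note> elements with their content,
// drop all other tags, and keep the remaining text.
std::string stripNotes(const std::string& in) {
    std::string out;
    int depth = 0;   // how many <note> elements we are currently inside
    Handlers h;
    h.onTag = [&](const std::string& tag) {
        if (tag == "note" || tag.rfind("note ", 0) == 0) ++depth;
        else if (tag == "/note" && depth > 0) --depth;
    };
    h.onText = [&](const std::string& text) { if (depth == 0) out += text; };
    scan(in, h);
    return out;
}
```

The filter keeps only a small amount of state (here, a nesting depth) between events, which is typical of this style of processing.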


*COMPLEXITY*

The task of enumerating all types of OSIS  tags, and deciding
what to do with each, and how to classify all  tags from all
possible OSIS documents into our enumeration is still going to be a
complex task using XSLT.   is a complex example, but certainly
not the most complex.

It is a tall task to generalize all elements of all documents from all
publishers into one conceptual model with one chosen output for a
frontend-- whether that be for an audience on the Desktop, web-based, or
a handheld.

The complex processing required by the engine will require long, complex
XSLT, which likely will incorporate callbacks to C++.  It will not be
simpler, only mixed-language.
___
*Semantic vs. Display*

Some will say (and have), "well, let everything be display oriented and
let the publisher decide".  Fine, then you lose 2 things: the ability to
display differently per user preference, per display device; and you
also give up the promise to actually do any interesting research on the
text.  When you lose semantic markup, then you lose all interesting
information about WHAT is being marked up.

___
*More than a Rendering Engine*

The SWORD C++ Engine is more than simply a text rendering engine-- it is
a Biblical text research engine.

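If I'd like to know the morphology of word 3 in 2Thes 2.13 of the WHNU
Greek text, the entire program to do so is:

#include <iostream>
#include <swmgr.h>
#include <swmodule.h>
using namespace sword;

int main() {
    SWMgr library;
    SWModule *whnu = library.getModule("WHNU");
    if (!whnu) return 1;        // WHNU module not installed
    whnu->setKey("2th.2.13");
    whnu->RenderText();         // rendering fills the entry attributes
    std::cout << "The morphology of word three is: "
              << whnu->getEntryAttributes()["Word"]["003"]["Morph"] << std::endl;
}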


That reads nice (at least in my opinion).  I don't need to know about
XML, XSLT, care what markup the WHNU module uses, I don't even have to
know how to make a SWORD filter.  The current filters do all the work of
breaking out these attributes and making them available in a nice and
interesting map.
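
As a rough picture of what such a map could look like, here is a standalone sketch using standard containers.  The AttributeMap alias, the recordWord() and lookup() helpers, and any values passed to them are assumptions for illustration; this is not the engine's actual type, only the same three-level shape as the lookup in the program above.

```cpp
#include <map>
#include <string>

// Illustrative three-level map: category -> entry id -> attribute name -> value.
// The type name is an assumption, not the engine's real declaration.
using AttributeMap =
    std::map<std::string, std::map<std::string, std::map<std::string, std::string>>>;

// What a filter might record while processing one word of marked-up text.
void recordWord(AttributeMap& attrs, const std::string& wordId,
                const std::string& morph, const std::string& text) {
    attrs["Word"][wordId]["Morph"] = morph;
    attrs["Word"][wordId]["Text"] = text;
}

// Read-only lookup that returns an empty string instead of inserting keys.
std::string lookup(const AttributeMap& attrs, const std::string& category,
                   const std::string& id, const std::string& name) {
    auto c = attrs.find(category);
    if (c == attrs.end()) return std::string();
    auto e = c->second.find(id);
    if (e == c->second.end()) return std::string();
    auto a = e->second.find(name);
    return a == e->second.end() ? std::string() : a->second;
}
```

The caller never sees the markup: it asks for a category, an entry, and an attribute name, and whatever filter handled the module's format is responsible for having filled them in.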

__


And finally, if bullets aren't flying already, I'll stir the heat up with...

XSLT sucks.  A good C++ programmer can do anything in C++ better than
any XSLT programmer.


:)

*duck*
Have fun.