Re: [sword-devel] Normalization?

2011-08-31 Thread David Haslam
Thanks for detailed comments on rendering.

Are there any implications for the search feature of SWORD/JSword when using
combining characters?

David

--
View this message in context: 
http://sword-dev.350566.n4.nabble.com/Normalization-tp3779484p3780433.html
Sent from the SWORD Dev mailing list archive at Nabble.com.

___
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page


Re: [sword-devel] Strong's Numbers assignment

2011-08-31 Thread DM Smith

On Aug 31, 2011, at 5:37 AM, Konstantin Maslyuk wrote:

 Hi, all.
 
 Is there any kind of guideline or manual on Strong's number
 assignment in a text?

In OSIS, Strong's numbers should surround the text to which they pertain. In
other markup, they are placed after the word or phrase.

 
 Or can someone just tell me what to do with Strong's numbers that are in
 the original text but were omitted in the target text? Can I also omit
 them, or should I add those Strong's numbers to the most appropriate
 word? If the omitted word is at the beginning or end of a sentence, can I
 just put the Strong's number at the beginning or end of the sentence
 without taking the word's meaning into account?

You can omit them, but then a Strong's number search won't find the verse.

Or put them at the end of the verse. It would be good if frontends did not
display Strong's numbers that aren't associated with a word, but since they
do display them, I wouldn't put them at the beginning. Either way, add them
in the order in which they occur in the original Greek.

But it is best to mark the proper word.
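A minimal Python sketch of the placement advice above, assuming the common OSIS convention of `<w lemma="strong:G...">` elements. The input shape (a list of (word, numbers) pairs plus a list of numbers with no translated word, already in Greek order) and the names `to_osis` and `lemma_attr` are hypothetical; emitting the leftovers as empty `<w/>` elements at the end of the verse is one reading of the advice, not a documented SWORD rule.

```python
from typing import List, Tuple
from xml.sax.saxutils import escape, quoteattr

def lemma_attr(numbers: List[str]) -> str:
    """OSIS lemma values are space-separated, e.g. 'strong:G3588 strong:G2316'."""
    return " ".join(f"strong:{n}" for n in numbers)

def to_osis(words: List[Tuple[str, List[str]]], unmatched: List[str]) -> str:
    """Hypothetical helper.

    words:     (translated text, Strong's numbers) for each word of the verse
    unmatched: numbers whose Greek word has no counterpart in the translation,
               in the order they occur in the Greek
    """
    parts = []
    for text, numbers in words:
        if numbers:
            # In OSIS the <w> element surrounds the text it pertains to.
            parts.append(f"<w lemma={quoteattr(lemma_attr(numbers))}>{escape(text)}</w>")
        else:
            parts.append(escape(text))
    # Leftover numbers go at the end (not the beginning) as empty elements,
    # so a Strong's number search still finds the verse.
    for n in unmatched:
        parts.append(f"<w lemma={quoteattr(lemma_attr([n]))}/>")
    return " ".join(parts)

# Illustrative data only: John 1:1a, as if a translation had dropped G2258.
print(to_osis([("In", ["G1722"]), ("the", []), ("beginning", ["G746"])], ["G2258"]))
```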

In Him,
DM

 
 Blessings.
 
 


Re: [sword-devel] Normalization?

2011-08-31 Thread DM Smith

On Aug 31, 2011, at 4:01 AM, David Haslam wrote:

 Thanks for detailed comments on rendering.
 
 Are there any implications for the search feature of SWORD/JSword when using
 combining characters?

The simple rule is that if a search request and the indexed text are not
normalized the same way, there will be no hit.

Today, our frontends do not normalize the text into a particular normalization
form when building the search index, nor do they normalize the search request.
They leave it to the module builder and the end user to agree by accident,
which works well for English but fails miserably with decorated (accented)
characters.

It'd be best for SWORD/JSword to use ICU to normalize to a known form for
search. Note that this could be NFKD followed by stripping the decorations.
Since it would be an internal form, it doesn't matter that it would look ugly
to the end user.
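To illustrate the simple rule and the NFKD-plus-stripping idea, here is a Python sketch. It uses the standard `unicodedata` module as a stand-in for ICU (an assumption; neither SWORD nor JSword is written in Python), and `search_form` is a hypothetical name. Treating "decorations" as nonspacing marks (Unicode category Mn) is also an assumption; some scripts may need a narrower rule.

```python
import unicodedata

def search_form(text: str) -> str:
    """Internal search form: NFKD, then drop nonspacing marks (category Mn)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

composed = "caf\u00e9"      # 'é' as one precomposed code point
decomposed = "cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

print(composed == decomposed)                            # False: no hit
print(search_form(composed) == search_form(decomposed))  # True: same internal form
print(search_form(composed))                             # cafe
```

Because NFKD is a compatibility decomposition, it also folds forms such as the "ﬁ" ligature to plain "fi", which is usually what a search wants.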

Regarding rendering, no frontend should assume that the module is encoded in
a way that works for it. When we did experiments, NFC worked best across the
widest variety of frontends, but no single form was best for every script, font
or display engine. It'd be best for each frontend to normalize the text before
display. This would probably be a different normalization than the one used for
search.
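On the display side, a frontend-only normalization step might look like the following Python sketch (again using `unicodedata` in place of ICU; the function name and the `form` parameter are assumptions). Keeping the form configurable reflects the point that NFC worked best overall but not for every script, font, or display engine.

```python
import unicodedata

def display_form(text: str, form: str = "NFC") -> str:
    """Normalize module text right before rendering; NFC unless told otherwise."""
    return unicodedata.normalize(form, text)

decomposed = "\u03b1\u0301"         # Greek alpha + COMBINING ACUTE ACCENT
shown = display_form(decomposed)
print(len(decomposed), len(shown))  # 2 1
print(f"U+{ord(shown):04X}")        # U+03AC GREEK SMALL LETTER ALPHA WITH TONOS
```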

In Him,
DM


 
 David
 


Re: [sword-devel] Strong's Numbers assignment

2011-08-31 Thread Konstantin Maslyuk
 Or can someone just tell me what to do with Strong's numbers that are in
 the original text but were omitted in the target text? Can I also omit
 them, or should I add those Strong's numbers to the most appropriate
 word? If the omitted word is at the beginning or end of a sentence, can I
 just put the Strong's number at the beginning or end of the sentence
 without taking the word's meaning into account?

 You can omit them, but then a Strong's number search won't find the verse.

 Or put them at the end of the verse. It would be good if frontends did not
 display Strong's numbers that aren't associated with a word, but since they
 do display them, I wouldn't put them at the beginning. Either way, add them
 in the order in which they occur in the original Greek.

Thank you, this is helpful. What about adding the omitted Strong's numbers
to the nearest word in the target text, so that users can see those
Strong's numbers?

 But it is best to mark the proper word.




Re: [sword-devel] Normalization?

2011-08-31 Thread David Haslam
Thanks DM.

The responses in this thread are really informative. Could we post them
somewhere in the wiki, please?

David

--
View this message in context: 
http://sword-dev.350566.n4.nabble.com/Normalization-tp3779484p3780893.html
Sent from the SWORD Dev mailing list archive at Nabble.com.



Re: [sword-devel] Normalization?

2011-08-31 Thread David Haslam
Done.  

See http://crosswire.org/wiki/Encoding#Normalization

David

--
View this message in context: 
http://sword-dev.350566.n4.nabble.com/Normalization-tp3779484p3780930.html
Sent from the SWORD Dev mailing list archive at Nabble.com.



Re: [sword-devel] Normalization?

2011-08-31 Thread Troy A. Griffitts
Quickly, before this gets posted: this information is not entirely accurate.

I've posted this a number of times and hope frontends have taken it to
heart.

SWORD has the concept of preparing a text for searching. Modules can add
StripFilters to do whatever preparation they want for searching. SWModule
makes this processing available not just for the module text, but also for
any buffer that needs to be prepared exactly the same way
(SWModule::StripText). It is highly recommended that frontend developers
apply this method to the search term entered by the user.

http://www.crosswire.org/pipermail/mobile-devel/2010-May/000121.html
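The design described above (one chain of strip filters, applied both when indexing and to the user's search term) can be modeled in a few lines of Python. This is a toy model, not the SWORD C++ API: `Module`, `strip_text`, the two filter functions, and the substring match standing in for a real Lucene index are all hypothetical.

```python
import unicodedata
from typing import Callable, List

StripFilter = Callable[[str], str]  # text in, search-ready text out

def decompose(text: str) -> str:
    return unicodedata.normalize("NFD", text)

def strip_marks(text: str) -> str:
    return "".join(c for c in text if unicodedata.category(c) != "Mn")

class Module:
    """Toy model of a module with strip filters (not the real SWModule)."""

    def __init__(self, entries: List[str], strip_filters: List[StripFilter]):
        self.strip_filters = strip_filters
        # The index is built from the stripped form of every entry.
        self.index = [self.strip_text(e) for e in entries]

    def strip_text(self, buf: str) -> str:
        # Analogous to SWModule::StripText: run every strip filter in order,
        # so any buffer is prepared exactly like the module text.
        for f in self.strip_filters:
            buf = f(buf)
        return buf

    def search(self, term: str) -> List[int]:
        key = self.strip_text(term)  # prepare the user's term the same way
        return [i for i, text in enumerate(self.index) if key in text]

mod = Module(["caf\u00e9 au lait", "plain tea"], [decompose, strip_marks])
print(mod.search("cafe\u0301"))  # [0]
print(mod.search("caf\u00e9"))   # [0]
```

With an empty filter list, the decomposed query no longer matches the composed entry, which is exactly the failure the strip filters exist to prevent.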


On 31/08/11 05:55, David Haslam wrote:
 Thanks DM.
 
 The responses in this thread are really informative. Could we post them
 somewhere in the wiki, please?
 
 David
 


Re: [sword-devel] Normalization?

2011-08-31 Thread DM Smith
Troy,
Users typically input decomposed text for a search request, while the module
is typically composed text. When creating a Lucene index, is the text
decomposed and then stripped? (I don't remember seeing that in the code.)

DM

On Aug 31, 2011, at 9:21 AM, Troy A. Griffitts wrote:

 Quickly, before this gets posted: this information is not entirely accurate.
 
 I've posted this a number of times and hope frontends have taken it to
 heart.
 
 SWORD has the concept of preparing a text for searching. Modules can add
 StripFilters to do whatever preparation they want for searching. SWModule
 makes this processing available not just for the module text, but also for
 any buffer that needs to be prepared exactly the same way
 (SWModule::StripText). It is highly recommended that frontend developers
 apply this method to the search term entered by the user.
 
 http://www.crosswire.org/pipermail/mobile-devel/2010-May/000121.html
 
 


Re: [sword-devel] Strong's Numbers assignment

2011-08-31 Thread DM Smith

On 08/31/2011 08:30 AM, Konstantin Maslyuk wrote:

 Or can someone just tell me what to do with Strong's numbers that are in
 the original text but were omitted in the target text? Can I also omit
 them, or should I add those Strong's numbers to the most appropriate
 word? If the omitted word is at the beginning or end of a sentence, can I
 just put the Strong's number at the beginning or end of the sentence
 without taking the word's meaning into account?

 You can omit them, but then a Strong's number search won't find the verse.
 Or put them at the end of the verse. It would be good if frontends did not
 display Strong's numbers that aren't associated with a word, but since they
 do display them, I wouldn't put them at the beginning. Either way, add them
 in the order in which they occur in the original Greek.

 Thank you, this is helpful. What about adding the omitted Strong's numbers
 to the nearest word in the target text, so that users can see those
 Strong's numbers?
That's what I meant by adding them in the original Greek order. I'm not 
sure whether adding them to the prior word/phrase or having them be 
empty would be better.
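The "prior word" option could be sketched in Python as below, merging a leftover number into the preceding word's lemma list (OSIS lemma attributes can carry several space-separated values). The (text, numbers) shape and the gap positions are hypothetical inputs; whether this beats empty elements remains the open question raised above.

```python
from typing import List, Tuple

Word = Tuple[str, List[str]]  # (translated text, Strong's numbers), hypothetical shape

def attach_to_prior(words: List[Word], gaps: List[Tuple[int, str]]) -> List[Word]:
    """Attach each omitted Strong's number to the word before its gap.

    gaps holds (i, number): in source order the number fell just before words[i]
    (i == len(words) means after the last word). A gap at i == 0 has no prior
    word, so it falls back to the first word.
    """
    result = [(text, list(nums)) for text, nums in words]  # don't mutate input
    for i, number in gaps:
        target = max(i - 1, 0)
        result[target][1].append(number)
    return result

words = [("In", ["G1722"]), ("beginning", ["G746"])]
for text, nums in attach_to_prior(words, [(2, "G2258")]):
    print(text, " ".join(f"strong:{n}" for n in nums))
# In strong:G1722
# beginning strong:G746 strong:G2258
```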



 But it is best to mark the proper word.




Re: [sword-devel] Normalization?

2011-08-31 Thread Troy A. Griffitts


On 31/08/11 07:47, DM Smith wrote:
 Troy, users typically input decomposed text for a search request, while
 the module is typically composed text. When creating a Lucene index, is
 the text decomposed and then stripped? (I don't remember seeing that in
 the code.)

Yes, the strip filters are run during Lucene index creation. If the
module has a decomposition strip filter added, it will be run.
This is the intended way to handle the issue.

For Greek, Hebrew, and Arabic we have special logic to strip accents and
pointing:
http://crosswire.org/svn/sword/trunk/src/modules/swmodule.cpp
(search the file for "ccent")
This is not ideal and should be moved into strip filter logic.

The example given in the thread I referenced in my last email (probably
tiresome by now, because I keep posting it) is:

A search using the unaccented search term (μακαρ) over Greek inscriptions
containing critical annotation:

http://crosswire.org/study/wordsearchresults.jsp?searchTerm=%CE%BC%CE%B1%CE%BA%CE%B1%CF%81&mod=PHI_CHR

Notice the search string: μακαρ,
and the matches:

μακάρ
μ[ακαρ]
Μακαρ
μακαρ
μ]ακαρ
μακα[ρ]

etc.

Also, the search term Μάκαρ
yields the same 33 hits:

http://crosswire.org/study/wordsearchresults.jsp?searchTerm=%CE%9C%E1%BD%B1%CE%BA%CE%B1%CF%81
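Why all of those forms collide can be shown with a small Python sketch, assuming a search key built by removing editorial brackets, decomposing, stripping combining accents, and case folding. This illustrates the effect only; it is not the logic in swmodule.cpp, and the function name and the bracket set are assumptions.

```python
import unicodedata

EDITORIAL_BRACKETS = str.maketrans("", "", "[]")  # assumed critical-annotation marks

def greek_search_key(text: str) -> str:
    """Hypothetical key: drop brackets, decompose, strip accents, fold case."""
    text = text.translate(EDITORIAL_BRACKETS)
    text = unicodedata.normalize("NFD", text)
    text = "".join(c for c in text if unicodedata.category(c) != "Mn")
    return text.casefold()

forms = ["μακάρ", "μ[ακαρ]", "Μακαρ", "μακαρ", "μ]ακαρ", "μακα[ρ]", "Μ\u1f71καρ"]
print({greek_search_key(f) for f in forms})  # {'μακαρ'}
```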

If anything, this is a module configuration issue and a frontend policy
issue, if frontends do not all follow the suggestion to process user search
input before sending it to the engine.

I have considered enforcing this logic by placing it in the search method
itself, but I worry that it might rule out some kinds of searches. For now,
I've leaned toward making it a recommended policy for frontends.

Troy



 

Re: [sword-devel] Normalization?

2011-08-31 Thread David Haslam
As I'd already posted to the wiki page before Troy joined the thread, please
feel free to make suitable corrections to the section I added. 

The tech details are getting a tad beyond my comprehension.

http://crosswire.org/wiki/Encoding

David

--
View this message in context: 
http://sword-dev.350566.n4.nabble.com/Normalization-tp3779484p3782080.html
Sent from the SWORD Dev mailing list archive at Nabble.com.



Re: [sword-devel] The SIL Pathway project

2011-08-31 Thread Greg Hellings
This afternoon I spoke with my project leader at Wycliffe and asked
the team about this project. They were, of course, familiar with it,
since he manages the general Wycliffe-on-Linux work and FieldWorks is
one of those efforts.

Apparently they were unaware that GoBible is a CrossWire project and
seemed genuinely surprised that they were already working with
CrossWire formats in their export. They would, however, be happy to
receive help in implementing a SWORD exporter. Both FieldWorks (the
application used in minority language research and cultural
investigation) and the Pathway plugin are open source and written
mainly in C#.

From what I could tell, the GoBible exporter mainly dumps into HTML
format. If so, writing a SWORD exporter should be relatively
straightforward if you went through imp+ThML formatting. If anyone
knows C#, please feel free to contact the project admin, who by now
should have been made aware of how close they already are to supporting
SWORD as an export format. They were also interested in whether SWORD
export would give access to mobile platforms, which I assured him it
would. So if someone takes this up, you might want to mention that
exporting to SWORD proper would allow the material to be used on the
richer mobile devices through PocketSword and AndBible, complementing
GoBible on JavaME devices.
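For anyone picking this up, the imp route amounts to writing one text file. A rough Python sketch follows (Pathway itself is C#, so this only shows the shape of the output). It assumes the imp convention read by SWORD's imp2vs tool, where each entry starts with a `$$$` line holding the key, followed by the entry text, and that verse bodies already carry ThML markup; `to_imp` is a hypothetical name.

```python
from typing import Iterable, Tuple

def to_imp(entries: Iterable[Tuple[str, str]]) -> str:
    """Serialize (key, ThML body) pairs as imp text: a '$$$key' line, then the body."""
    lines = []
    for key, body in entries:
        lines.append("$$$" + key)
        lines.append(body)
    return "\n".join(lines) + "\n"

print(to_imp([
    ("Gen 1:1", "In the beginning God created the heaven and the earth."),
    ("Gen 1:2", "And the earth was without form, and void; and darkness <i>was</i> upon the face of the deep."),
]), end="")
```

The resulting file would then be fed to imp2vs, with the module's .conf declaring SourceType=ThML so that frontends apply the ThML filters when rendering.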

I will also be contacting the PM for Pathway tomorrow or Friday to
introduce myself and explain that SWORD format for his project would
work very well in concert with the work I'm already doing with
Wycliffe to bring large numbers of their works into SWORD.  I highly
encourage someone else to take up this mantle, as I have no experience
with C# and am already committed to another project using SWORD within
Wycliffe.  Even if you can't dedicate any time to the implementation
of the export filter, if they have a subject matter expert from the
SWORD side, the Wycliffe teams are very highly motivated and can gain
a major boost to their work velocity from regular interactions with
CrossWire people.

--Greg

On Tue, Aug 30, 2011 at 8:06 AM, David Haslam dfh...@googlemail.com wrote:
 The *SIL Pathway* project now has its own web page.

 http://pathway.sil.org/

 The table of output options includes Go Bible, but does not include SWORD.

 Although I'm glad about the inclusion of Go Bible, I'm sad that SWORD is not
 up there with all the rest.

 Those of you who have good personal contacts within SIL/Wycliffe - please
 see what you can do and say to rectify this omission.

 Some humble and gentle persuasion might be the order of the day.

 David



 --
 View this message in context: 
 http://sword-dev.350566.n4.nabble.com/The-SIL-Pathway-project-tp3560313p3778663.html
 Sent from the SWORD Dev mailing list archive at Nabble.com.
