Re: [sword-devel] Normalization?
Thanks for detailed comments on rendering. Are there any implications for the search feature of SWORD/JSword when using combining characters? David -- View this message in context: http://sword-dev.350566.n4.nabble.com/Normalization-tp3779484p3780433.html Sent from the SWORD Dev mailing list archive at Nabble.com. ___ sword-devel mailing list: sword-devel@crosswire.org http://www.crosswire.org/mailman/listinfo/sword-devel Instructions to unsubscribe/change your settings at above page
Re: [sword-devel] Strong's Numbers assignment
On Aug 31, 2011, at 5:37 AM, Konstantin Maslyuk wrote:
> Hi, all. Is there any kind of guidelines or manual on Strong's numbers assignment on text?

In OSIS they should surround the text to which they pertain. In other markup, they are placed after the word or phrase.

> Or can someone just tell me what to do with Strong's numbers that are in the original text but were omitted in the target text? Can I also omit them, or should I add those Strong's numbers to the most appropriate word? If the omitted word is at a sentence beginning/ending, can I also just put the Strong's number at the sentence beginning/ending, without taking word meaning into account?

You can omit them, but then a Strong's number search won't find the verse. Or put them at the end. It'd be good if front ends did not display them in verses when they weren't associated with a word. But since they don't, I wouldn't put them at the beginning. But add them in the order that they occurred in the original Greek.

But it is best to mark the proper word.

In Him,
DM

> Blessings.
Re: [sword-devel] Normalization?
On Aug 31, 2011, at 4:01 AM, David Haslam wrote:
> Thanks for detailed comments on rendering. Are there any implications for the search feature of SWORD/JSword when using combining characters?

The simple rule is that if a search request and the indexed text are not normalized the same way, there will not be a hit.

Today, our frontends do not normalize the text into a particular normalization form when building the search index. Ditto for the search request. They leave it up to the module builder and the end user to agree by accident, which works really well for English but fails miserably with decorated characters.

It'd be best for SWORD/JSword to do ICU normalization to a known form for search. Note that this could be NFKD, with the decorations then stripped. Since it would be an internal form, it doesn't matter that it would look ugly to the end user.

Regarding rendering, each frontend should not assume that the module is encoded in a way that works for it. When we did experiments, NFC was the best across the widest variety of frontends. But no one way was best for every script, font or display engine. It'd be best for each frontend to normalize the text before display. This would probably be different from the normalization for search.

In Him,
DM
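To make the "same normalization or no hit" rule concrete, here is a minimal sketch. It uses Python's standard `unicodedata` module rather than ICU, and the helper name `search_form` is made up for illustration; it is not part of SWORD or JSword.

```python
import unicodedata

def search_form(text: str) -> str:
    """Internal search key: NFKD, then drop combining marks (the "decorations")."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

composed = "caf\u00e9"      # e-acute as one precomposed code point (NFC)
decomposed = "cafe\u0301"   # plain e followed by COMBINING ACUTE ACCENT (NFD)

# The two strings render identically but are different code point sequences,
# so a naive comparison (or index lookup) misses.
print(composed == decomposed)                                # False

# Normalizing both sides to the same form makes them comparable.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# Stripping marks after NFKD gives an accent-insensitive key for both inputs.
print(search_form(composed), search_form(decomposed))        # cafe cafe
```

The stripped key is only suitable as an internal form. For display, a frontend would keep the marks and pick whichever normalization form its rendering engine handles best.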
Re: [sword-devel] Strong's Numbers assignment
>> Or can someone just tell me what to do with Strong's numbers that are in the original text but were omitted in the target text? Can I also omit them, or should I add those Strong's numbers to the most appropriate word? If the omitted word is at a sentence beginning/ending, can I also just put the Strong's number at the sentence beginning/ending, without taking word meaning into account?

> You can omit them, but then a Strong's number search won't find the verse. Or put them at the end. It'd be good if front ends did not display them in verses when they weren't associated with a word. But since they don't, I wouldn't put them at the beginning. But add them in the order that they occurred in the original Greek.

Thank you, this is helpful. What about adding omitted Strong's numbers to the nearest word in the destination text, so the user can view those Strong's numbers?

> But it is best to mark the proper word.
Re: [sword-devel] Normalization?
Thanks DM. The responses in this thread are really informative. Could we post them somewhere in the wiki, please?

David
Re: [sword-devel] Normalization?
Done. See http://crosswire.org/wiki/Encoding#Normalization

David
Re: [sword-devel] Normalization?
Quickly before posting, this data is not entirely accurate. I've posted this a number of times and hope frontends have taken this to heart.

SWORD has the concept of preparing a text for searching. Modules can add StripFilters to do whatever preparation they want to do for searching. SWModule makes this processing available not just for the module text, but also for any buffer that might need to be prepared in exactly the same way (SWModule::StripText). It is highly recommended that frontend developers use this method on the search term the user enters.

http://www.crosswire.org/pipermail/mobile-devel/2010-May/000121.html

On 31/08/11 05:55, David Haslam wrote:
> Thanks DM. The responses in this thread are really informative. Could we post them somewhere in the wiki, please?
> David
Re: [sword-devel] Normalization?
Troy,

Users typically input decomposed text for a search request. The module is typically composed text. When creating a Lucene index, is the text decomposed and then stripped? (I don't remember seeing that in the code.)

DM

On Aug 31, 2011, at 9:21 AM, Troy A. Griffitts wrote:
> Quickly before posting, this data is not entirely accurate. I've posted this a number of times and hope frontends have taken this to heart.
> SWORD has the concept of preparing a text for searching. Modules can add StripFilters to do whatever preparation they want to do for searching. SWModule makes this processing available for not just the module text, but also for any buffer that might want to be prepared exactly the same way (SWModule::StripText). It is highly recommended that frontend developers use this method on the user inputted search term.
> http://www.crosswire.org/pipermail/mobile-devel/2010-May/000121.html
Re: [sword-devel] Strong's Numbers assignment
On 08/31/2011 08:30 AM, Konstantin Maslyuk wrote:
>>> Or can someone just tell me what to do with Strong's numbers that are in the original text but were omitted in the target text? Can I also omit them, or should I add those Strong's numbers to the most appropriate word? If the omitted word is at a sentence beginning/ending, can I also just put the Strong's number at the sentence beginning/ending, without taking word meaning into account?

>> You can omit them, but then a Strong's number search won't find the verse. Or put them at the end. It'd be good if front ends did not display them in verses when they weren't associated with a word. But since they don't, I wouldn't put them at the beginning. But add them in the order that they occurred in the original Greek.

> Thank you, this is helpful. What about adding omitted Strong's numbers to the nearest word in the destination text, so the user can view those Strong's numbers?

That's what I meant by adding them in the original Greek order. I'm not sure whether adding them to the prior word/phrase or having them be empty would be better.

>> But it is best to mark the proper word.
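For readers who have not seen the OSIS convention discussed in this thread, the sketch below builds a verse fragment in which each Strong's number surrounds the word it belongs to, and an untranslated source word is carried as an empty element in its original position. The helper `w` and the word alignment are illustrative assumptions, not the output of any CrossWire tool. How a given frontend renders an empty `<w/>` is exactly the open question above.

```python
from xml.sax.saxutils import escape, quoteattr

def w(strongs, text=""):
    """Return an OSIS <w> element whose lemma attribute lists Strong's numbers."""
    lemma = " ".join(f"strong:{number}" for number in strongs)
    if not text:
        # No target word: keep the number searchable with an empty element.
        return f"<w lemma={quoteattr(lemma)}/>"
    return f"<w lemma={quoteattr(lemma)}>{escape(text)}</w>"

# Illustrative alignment only: source-language order, one entry per source word.
aligned = [
    (["G1722"], "In"),
    (["G746"], "beginning"),
    (["G2258"], "was"),
    (["G3588"], ""),       # source word left untranslated in the target text
    (["G3056"], "Word"),
]

verse = " ".join(w(strongs, text) for strongs, text in aligned)
print(verse)
```

The alternative raised in the thread, attaching the orphaned number to the prior word, would amount to passing both numbers to one call, for example `w(["G2258", "G3588"], "was")`.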
Re: [sword-devel] Normalization?
On 31/08/11 07:47, DM Smith wrote:
> Troy, Users typically input decomposed text for a search request. The module is typically composed text. When creating a Lucene index, is the text decomposed and then stripped? (I don't remember seeing that in the code.)

Yes, the strip filters are run during Lucene index creation. If the module has a decomposition strip filter added, then it will be run. This is the designed way to handle the issue.

For Greek, Hebrew, and Arabic we have special logic to strip accents and pointing:
http://crosswire.org/svn/sword/trunk/src/modules/swmodule.cpp (see ccent)
This is not ideal and should be moved to strip filter logic.

The example given in the thread I referenced in my last email, which is probably tiresome because I keep posting it, is a search using an unaccented search term (μακαρ) over Greek inscriptions containing critical annotation:
http://crosswire.org/study/wordsearchresults.jsp?searchTerm=%CE%BC%CE%B1%CE%BA%CE%B1%CF%81mod=PHI_CHR

Notice the search string, μακαρ, and the matches: μακάρ μ[ακαρ] Μακαρ μακαρ μ]ακαρ μακα[ρ] etc.

Also, the search term Μάκαρ yields the same 33 hits:
http://crosswire.org/study/wordsearchresults.jsp?searchTerm=%CE%9C%E1%BD%B1%CE%BA%CE%B1%CF%81

If anything, this is a module configuration issue and a frontend policy issue, if they do not all use the suggestion to process user search input before sending it to the engine. I have considered forcing this logic by placing it into the search method itself, but I worry that it might take away the option of some searches. I've leaned toward making it a recommended policy for frontends for now.

Troy
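The policy described here, running the user's search term through the same preparation as the indexed text, can be sketched as follows. This is a Python illustration of the idea, not SWORD code. `prepare` is a hypothetical stand-in for a module's strip-filter chain, and the set of annotation characters is an assumption based on the brackets visible in the hits quoted above.

```python
import unicodedata

ANNOTATION_MARKS = "[]"  # assumption: editorial brackets seen in the quoted hits

def prepare(text: str) -> str:
    """Hypothetical stand-in for a module's strip-filter chain."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = (ch for ch in decomposed
            if not unicodedata.combining(ch) and ch not in ANNOTATION_MARKS)
    return "".join(kept).casefold()

def build_index(words):
    """Map each prepared form to the original words that produced it."""
    index = {}
    for word in words:
        index.setdefault(prepare(word), []).append(word)
    return index

def search(index, term):
    """Prepare the user's term exactly like the indexed text, then look it up."""
    return index.get(prepare(term), [])

words = ["μακάρ", "μ[ακαρ]", "Μακαρ", "μακαρ", "μ]ακαρ", "μακα[ρ]"]
index = build_index(words)

unaccented = search(index, "μακαρ")
accented = search(index, "\u039c\u1f71\u03ba\u03b1\u03c1")  # Μάκαρ, as in the second URL
print(len(unaccented), unaccented == accented)  # 6 True
```

Because the index and the query share one `prepare` function, the accented and unaccented terms cannot drift apart. The trade-off mentioned above also shows up here: once preparation is forced inside `search`, an accent-sensitive search is no longer possible without a second code path.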
Re: [sword-devel] Normalization?
As I'd already posted to the wiki page before Troy joined the thread, please feel free to make suitable corrections to the section I added. The tech details are getting a tad beyond my comprehension.

http://crosswire.org/wiki/Encoding

David
Re: [sword-devel] The SIL Pathway project
This afternoon I spoke with my project leader at Wycliffe and asked the team about this project. They were, of course, familiar with it, since he is the manager for the general Wycliffe-on-Linux work and FieldWorks is one of those tasks. Apparently they were unaware that GoBible is a CrossWire project and seemed genuinely surprised that they were already working with CrossWire formats in their export. They would, however, be happy to receive help in the task of implementing a SWORD exporter.

Both FieldWorks (the application used in minority language research and cultural investigation) and the Pathway plugin are open source and leverage mainly C#. When I looked, it appeared that the GoBible exporter mainly dumps into HTML format. If so, then writing a SWORD exporter should be relatively straightforward if you wanted to run through imp+ThML formatting.

If anyone has a knowledge of C#, please feel free to contact the admin of the project, who by now should have been made aware of the nearness of SWORD as an export format. They were also interested in whether SWORD format export would gain access to mobile platforms, which I assured them it would. So if someone takes this up, you might want to mention that exporting to SWORD proper would allow the material to be used on the richer mobile devices through PocketSword and AndBible, complementing GoBible on JavaME devices.

I will also be contacting the PM for Pathway tomorrow or Friday to introduce myself and explain that SWORD format for his project would work very well in concert with the work I'm already doing with Wycliffe to bring large numbers of their works into SWORD. I highly encourage someone else to take up this mantle, as I have no experience with C# and am already committed to another project using SWORD within Wycliffe.

Even if you can't dedicate any time to the implementation of the export filter, if they have a subject matter expert from the SWORD side, the Wycliffe teams are very highly motivated and can gain a major boost to their work velocity from regular interactions with CrossWire people.

--Greg

On Tue, Aug 30, 2011 at 8:06 AM, David Haslam dfh...@googlemail.com wrote:
> The *SIL Pathway* project now has its own web page.
> http://pathway.sil.org/
> The table of output options includes Go Bible, but does not include SWORD. Although I'm glad about the inclusion of Go Bible, I'm sad that SWORD is not up there with all the rest.
> Those of you who have good personal contacts within SIL/Wycliffe - please see what you can do and say to rectify this omission. Some humble and gentle persuasion might be the order of the day.
> David
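As a rough idea of what "running through imp" could involve, the sketch below writes verse text in the imp import layout, assumed here to be a `$$$` key line followed by the entry text, which SWORD's import utilities read. It is written in Python for brevity, although a real Pathway exporter would be C#. The function name, the sample keys and the sample text are placeholders, and nothing here reflects Pathway's actual data model.

```python
def to_imp(verses):
    """Render (key, text) pairs as imp text: a '$$$' key line, then the entry."""
    lines = []
    for key, text in verses:
        lines.append(f"$$${key}")
        # Collapse internal whitespace so each entry stays on a single line.
        lines.append(" ".join(text.split()))
    return "\n".join(lines) + "\n"

sample = [
    ("Gen 1:1", "First verse text"),
    ("Gen 1:2", "Second   verse\ntext"),
]
print(to_imp(sample), end="")
```

The entry text could carry ThML (or OSIS) markup instead of plain text. The exporter's main job would then be mapping Pathway's styles onto that markup.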