Re: [sword-devel] testing for diacritics
Peter,

Are you also contemplating a new configuration item? e.g.

GlobalOptionFilter=UTF8ArabicHarraket

Might this be a useful enhancement to module AraNAV?

David

--
View this message in context: http://sword-dev.350566.n4.nabble.com/testing-for-diacritics-tp4655091p4655188.html
Sent from the SWORD Dev mailing list archive at Nabble.com.
___
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
Re: [sword-devel] testing for diacritics
On 2015-09-02, 14:43 GMT, David Haslam wrote:
> For an online utility see http://www.harakat.ae/
>
> فِي الْبَدْءِ خَلَقَ اللهُ السَّمَاوَاتِ وَالأَرْضَ،
> becomes
> في البدء خلق الله السماوات والأرض،

With a bit of web scraping, one could build a library that uses it as a web service, couldn't one?

Matěj

--
http://www.ceplovi.cz/matej/, Jabber: mc...@ceplovi.cz
GPG Finger: 89EF 4BC6 288A BF43 1BAB 25C3 E09F EF25 D964 84AC

I am a Roman Catholic, so that I do not expect `history' to be anything but a `long defeat' -- though it contains (and in a legend may contain more clearly and movingly) some samples or glimpses of final victory.
  -- J.R.R. Tolkien
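The before/after pair quoted above can be approximated locally, without a web service. The following is a minimal Python sketch, not the method used by harakat.ae or by the SWORD engine: the set of code points (U+064B to U+0652 plus U+0670) is an assumption chosen for illustration, and the function names are hypothetical.

```python
import re

# Assumed set of Arabic vowel marks (harakat): U+064B..U+0652
# (fathatan through sukun) plus U+0670 (superscript alef).
# This list is illustrative only; it is NOT taken from any SWORD filter.
_HARAKAT = re.compile("[\u064B-\u0652\u0670]")


def strip_harakat(text: str) -> str:
    """Return text with the assumed harakat code points removed."""
    return _HARAKAT.sub("", text)


def has_harakat(text: str) -> bool:
    """True if stripping the assumed marks changes the text."""
    return strip_harakat(text) != text


if __name__ == "__main__":
    # FEH + KASRA + YEH, i.e. the first word of the quoted verse.
    sample = "\u0641\u0650\u064A"
    print(has_harakat(sample))
    print(strip_harakat(sample) == "\u0641\u064A")
```

Because the mark list is hard-coded, a script like this would have to be updated whenever the engine's filter learns new marks, which is the maintenance problem raised elsewhere in this thread.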
Re: [sword-devel] testing for diacritics
Isn't that only a LocalStripFilter?

http://www.crosswire.org/wiki/DevTools:conf_Files#Strip_Filters

Or can any of these existing filters be used as a GlobalOptionFilter, becoming usable only when front-ends (and the relevant sword utilities) provide UI options to toggle them?

cf. Diatheke already has two Arabic-related options:

p (Arabic Vowels)
r (Arabic Shaping)

David
Re: [sword-devel] testing for diacritics
We have a filter which does that - UTF8ArabicPoints.

Peter

On Wed, 2015-09-02 at 09:08 -0700, David Haslam wrote:
> Peter,
>
> Are you also contemplating a new configuration item? e.g.
>
> GlobalOptionFilter=UTF8ArabicHarraket
>
> Might this be a useful enhancement to module AraNAV?
>
> David
Re: [sword-devel] testing for diacritics
On Wed, 2015-09-02 at 07:18 -0700, David Haslam wrote:
> Windows users may find it useful to know that BabelPad has a menu
> option to strip diacritics.

Again, the point of my request is not to remove diacritics per se, but to test for their presence (only those we have a stripping filter for) and, if they are present, to set the appropriate GlobalOptionFilter in the conf file.

While there are a whole bunch of ways of removing diacritics, including GUI tools and various regex approaches, using the sword library itself ensures that any GlobalOptionFilter set in a conf file corresponds to a real, existing ability of the engine to do something with the text. It also ensures that module-making scripts do not need to be kept separately in sync with whatever happens in the engine.

Peter
Re: [sword-devel] testing for diacritics
On Wed, 2015-09-02 at 10:21 -0700, David Haslam wrote:
> Isn't that only a LocalStripFilter?
>
> http://www.crosswire.org/wiki/DevTools:conf_Files#Strip_Filters
>
> Or can any of these existing filters be used as a GlobalOptionFilter,
> becoming usable only when front-ends (and the relevant sword
> utilities) provide UI options to toggle them?

Yes. It can be a GlobalOptionFilter. Not sure which frontends implement it. There is some weirdness around the whole lot still. Nevertheless.

Peter

> cf. Diatheke already has two Arabic related options:
>
> p (Arabic Vowels)
> r (Arabic Shaping)
>
> David
Re: [sword-devel] testing for diacritics
Windows users may find it useful to know that BabelPad has a menu option to strip diacritics.

Convert | Other | Strip diacritics

It certainly works well for Cyrillic & Latin scripts, as well as Hebrew & Greek. It may not work for Arabic/Persian scripts.

Can you provide some examples of such with real diacritic characters?

David
Re: [sword-devel] testing for diacritics
On Fri, 2015-08-28 at 14:13 -0400, Ryan wrote:
> On Thu, 2015-08-27 at 23:22 +0100, Peter von Kaehne wrote:
> > Is there a clever and reliable way one could test in a given OSIS
> > text to see whether it contains diacritically enhanced texts or not?
> > Perl, preferably.
> >
> > Specifically Hebrew, Arabic type alphabets and Greek - for all of
> > which we have a special GlobalOptionFilter.
>
> Given a variable with a copy of the text using the Unicode NFD
> normalization, I would think that all you would need to do is test
> for the presence of the specific diacritic marks themselves.

Thanks - but as I said in a previous email, I did not want to test for individual items in my proposed utility, as the filters (at least the Arabic one) will likely grow in future. Testing should be done using the engine.

The number of available Arabic diacritical marks is endless and not even remotely covered by our filter (which handles only standard Arabic and Persian). So any new item added to our filters would require an amendment to the script too.

I have now created a C++ example, which is in svn, for the kind of utility I meant. It works as I wanted it to - relying on the engine to do the lifting. I guess this is my own answer to my query.

sword/examples/cmdline/stripaccents.cpp

Peter
Re: [sword-devel] testing for diacritics
On Fri, 2015-08-28 at 01:27 +0200, Matěj Cepl wrote:
> iconv -f utf8 -t us-ascii//translit file.xml \
>   | diff -u - file.xml

Thanks Matej,

This would probably work on Latin scripts with diacritics, but not on the scripts I am interested in - Hebrew, Arabic derived and Greek.

Peter
Re: [sword-devel] testing for diacritics
On Fri, 2015-08-28 at 09:21 +0100, Peter von Kaehne wrote:
> On Fri, 2015-08-28 at 01:27 +0200, Matěj Cepl wrote:
> > iconv -f utf8 -t us-ascii//translit file.xml \
> >   | diff -u - file.xml
>
> Thanks Matej,
>
> This would probably work on Latin scripts with diacritics, but not on
> the scripts I am interested in - Hebrew, Arabic derived and Greek

But I think the basic idea is right. What I now think I need to do is to run the engine's strip filters on the text content and compare the output with the input - if there is a difference, the strip filter option needs setting.

Thanks Matej!

Peter
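The compare-output-with-input idea described above can be sketched as follows. This is a Python illustration only: the two strip functions are hypothetical regex stand-ins so that the sketch runs on its own, whereas the approach in the message relies on the SWORD engine's own filters. UTF8ArabicPoints is named in this thread; UTF8HebrewPoints is assumed here to be the matching Hebrew option name.

```python
import re
from typing import Callable, Dict, List


# Hypothetical stand-ins for the engine's strip filters. In the approach
# described in the thread, the real SWORD filters would do this work.
def _strip_hebrew_points(text: str) -> str:
    return re.sub("[\u05B0-\u05BB]", "", text)


def _strip_arabic_points(text: str) -> str:
    return re.sub("[\u064B-\u0652]", "", text)


# Option name -> strip function. The Hebrew option name is an assumption.
STAND_IN_FILTERS: Dict[str, Callable[[str], str]] = {
    "UTF8HebrewPoints": _strip_hebrew_points,
    "UTF8ArabicPoints": _strip_arabic_points,
}


def filters_needed(
    text: str, filters: Dict[str, Callable[[str], str]] = STAND_IN_FILTERS
) -> List[str]:
    """Return the names of filters whose output differs from the input."""
    return [name for name, strip in filters.items() if strip(text) != text]


def conf_lines(text: str) -> List[str]:
    """Build the GlobalOptionFilter lines a conf generator would emit."""
    return [f"GlobalOptionFilter={name}" for name in filters_needed(text)]
```

The detection logic never lists individual diacritics itself; it only asks whether a filter changed the text, so any growth in a filter's coverage is picked up automatically once the stand-ins are replaced by calls into the engine.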
Re: [sword-devel] testing for diacritics
That is an option, but I do not like it. The reason is that it requires continuous maintenance - adding new diacritic characters to the strip filters to expand their range is an ongoing effort. This would mean two places need constant attention. I am trying as much as possible to take the human factor out of module making.

Peter

Sent: Friday, 28 August 2015 at 15:42
From: David Troidl davidtro...@aol.com
To: sword-devel@crosswire.org
Subject: Re: [sword-devel] testing for diacritics

> How about regular expressions:
>
> Modern Greek Accented     [\u0370-\u0390\u03AA-\u03B0\u03CA-\u03D4]
> Polytonic Greek Accented  [\u1F00-\u1FFE]
> Hebrew Vowel Points       [\u05B0-\u05BB]
> Hebrew Cantillation       [\u0591-\u05AE]
>
> I don't know about Arabic.
>
> Peace,
>
> David
>
> On 8/28/2015 4:21 AM, Peter von Kaehne wrote:
> > On Fri, 2015-08-28 at 01:27 +0200, Matěj Cepl wrote:
> > > iconv -f utf8 -t us-ascii//translit file.xml \
> > >   | diff -u - file.xml
> >
> > Thanks Matej,
> >
> > This would probably work on Latin scripts with diacritics, but not
> > on the scripts I am interested in - Hebrew, Arabic derived and Greek.
> >
> > Peter
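For reference, the character classes suggested by David Troidl can be tried out with a short Python sketch like the one below. The ranges are taken from his message, written without spaces inside the brackets and with the Hebrew vowel-point range in ascending order (U+05B0 to U+05BB), since a regex range must run from low to high. The dictionary and function names are hypothetical.

```python
import re

# Character classes from the message above, one compiled pattern per label.
PATTERNS = {
    "Modern Greek accented": re.compile(
        "[\u0370-\u0390\u03AA-\u03B0\u03CA-\u03D4]"
    ),
    "Polytonic Greek accented": re.compile("[\u1F00-\u1FFE]"),
    "Hebrew vowel points": re.compile("[\u05B0-\u05BB]"),
    "Hebrew cantillation": re.compile("[\u0591-\u05AE]"),
}


def matching_classes(text: str) -> list:
    """Return the labels of all classes with at least one match in text."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(text)]
```

As noted in the reply, the drawback is that these ranges live outside the engine and would need to be maintained alongside the strip filters.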
Re: [sword-devel] testing for diacritics
On 2015-08-28, 08:21 GMT, Peter von Kaehne wrote:
> On Fri, 2015-08-28 at 01:27 +0200, Matěj Cepl wrote:
> > iconv -f utf8 -t us-ascii//translit file.xml \
> >   | diff -u - file.xml
>
> This would probably work on Latin scripts with diacritics, but not on
> the scripts I am interested in - Hebrew, Arabic derived and Greek.

Did you try? I know that iconv has quite an extensive number of transliteration rules.

Another option would be to use recode (https://packages.debian.org/sid/recode, https://admin.fedoraproject.org/pkgdb/package/recode/ or http://directory.fsf.org/wiki/Recode). It used to have a huge number of transliteration rules.

Best,

Matěj

--
http://www.ceplovi.cz/matej/, Jabber: mc...@ceplovi.cz
GPG Finger: 89EF 4BC6 288A BF43 1BAB 25C3 E09F EF25 D964 84AC

For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.
  -- R. P. Feynman's concluding sentence in his appendix to the Challenger Report
Re: [sword-devel] testing for diacritics
Sent: Friday, 28 August 2015 at 16:59
From: Matěj Cepl mc...@cepl.eu

> > This would probably work on Latin scripts with diacritics, but not
> > on the scripts I am interested in - Hebrew, Arabic derived and Greek.
>
> Did you try?

Yes :-)
Re: [sword-devel] testing for diacritics
On Thu, 2015-08-27 at 23:22 +0100, Peter von Kaehne wrote:
> Is there a clever and reliable way one could test in a given OSIS
> text to see whether it contains diacritically enhanced texts or not?
> Perl, preferably.
>
> Specifically Hebrew, Arabic type alphabets and Greek - for all of
> which we have a special GlobalOptionFilter.

Given a variable with a copy of the text using the Unicode NFD normalization, I would think that all you would need to do is test for the presence of the specific diacritic marks themselves.

It would be easy to do in Python. I would imagine it would be easy to do in Perl as well, for someone who knows how to write Perl.
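A minimal Python sketch of the NFD idea suggested above, using only the standard `unicodedata` module. The function names are hypothetical. One simplification to note: instead of checking for a hand-picked list of specific marks, the sketch reports every character with a non-zero canonical combining class.

```python
import unicodedata


def combining_marks(text: str) -> set:
    """Return the combining marks present after NFD normalization."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns the canonical combining class,
    # which is non-zero for combining marks such as accents and points.
    return {ch for ch in decomposed if unicodedata.combining(ch)}


def has_diacritics(text: str) -> bool:
    """True if the text contains at least one combining mark."""
    return bool(combining_marks(text))
```

A check this broad also fires on accented Latin text and is not tied to what the SWORD filters can actually strip, so in practice the returned marks would still have to be matched against the marks each filter handles.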
[sword-devel] testing for diacritics
Is there a clever and reliable way one could test in a given OSIS text to see whether it contains diacritically enhanced texts or not? Perl, preferably.

Specifically Hebrew, Arabic type alphabets and Greek - for all of which we have a special GlobalOptionFilter.

I create most of the conf files automatically by analysing the OSIS source files, as this is generally the best way to ensure that nothing gets forgotten. The recent HebDelitzsch module, however, would have benefited from my adding the relevant options to my scripts instead of adding them by hand to the conf files, which is what I have now done.

Many thanks for any suggestions!

Peter