Re: [sword-devel] testing for diacritics

2015-09-02 Thread David Haslam
Peter,

Are you also contemplating a new configuration item? e.g.

GlobalOptionFilter=UTF8ArabicHarraket

Might this be a useful enhancement to the AraNAV module?

David



--
View this message in context: 
http://sword-dev.350566.n4.nabble.com/testing-for-diacritics-tp4655091p4655188.html
Sent from the SWORD Dev mailing list archive at Nabble.com.

___
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page


Re: [sword-devel] testing for diacritics

2015-09-02 Thread Matěj Cepl
On 2015-09-02, 14:43 GMT, David Haslam wrote:
> For an online utility see http://www.harakat.ae/
>
> فِي الْبَدْءِ خَلَقَ اللهُ السَّمَاوَاتِ وَالأَرْضَ،
> becomes
> في البدء خلق الله السماوات والأرض، 

With a bit of web scraping, we could build a library that uses it
as a web service, couldn't we?
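For text like the example above, a web service is not strictly needed. The following is a minimal Python sketch of local harakat removal. It assumes the marks to strip are U+064B to U+0652 (fathatan through sukun) plus superscript alef U+0670; that set is an illustrative assumption, not the list used by harakat.ae or by SWORD's UTF8ArabicPoints filter.

```python
import re

# Assumed harakat set (illustrative only): fathatan through sukun,
# U+064B-U+0652, plus superscript alef U+0670. Letters such as
# alef with hamza above (U+0623) are precomposed and are left alone.
HARAKAT = re.compile("[\u064B-\u0652\u0670]")


def strip_harakat(text: str) -> str:
    """Return text with the assumed harakat removed."""
    return HARAKAT.sub("", text)


if __name__ == "__main__":
    vocalized = "\u0641\u0650\u064A"  # "fi" with a kasra on the fa
    print(strip_harakat(vocalized) == "\u0641\u064A")
```

The text is deliberately not NFD-normalized first: doing so would split hamza-bearing letters such as U+0623 into a bare alef plus a combining hamza.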

Matěj

-- 
http://www.ceplovi.cz/matej/, Jabber: mc...@ceplovi.cz
GPG Finger: 89EF 4BC6 288A BF43 1BAB  25C3 E09F EF25 D964 84AC
 
I am a Roman Catholic, so that I do not expect `history' to be
anything but a `long defeat' -- though it contains (and in
a legend may contain more clearly and movingly) some samples or
glimpses of final victory.
  -- J.R.R. Tolkien



Re: [sword-devel] testing for diacritics

2015-09-02 Thread David Haslam
Isn't that only a LocalStripFilter?

http://www.crosswire.org/wiki/DevTools:conf_Files#Strip_Filters

Or can any of these existing filters be used as a GlobalOptionFilter,
becoming usable only when front-ends (and the relevant sword utilities)
provide UI options to toggle them?

cf. Diatheke already has two Arabic related options:

p (Arabic Vowels)
r (Arabic Shaping)

David



--
View this message in context: 
http://sword-dev.350566.n4.nabble.com/testing-for-diacritics-tp4655091p4655192.html
Sent from the SWORD Dev mailing list archive at Nabble.com.



Re: [sword-devel] testing for diacritics

2015-09-02 Thread Peter von Kaehne
We have a filter which does that - UTF8ArabicPoints. 

Peter

On Wed, 2015-09-02 at 09:08 -0700, David Haslam wrote:
> Peter,
> 
> Are you also contemplating a new configuration item? e.g.
> 
> GlobalOptionFilter=UTF8ArabicHarraket
> 
> Might this be a useful enhancement to the AraNAV module?
> 
> David




Re: [sword-devel] testing for diacritics

2015-09-02 Thread Peter von Kaehne
On Wed, 2015-09-02 at 07:18 -0700, David Haslam wrote:
> Windows users may find it useful to know that BabelPad has a menu 
> option to
> strip diacritics.

Again, the point of my request is not to remove diacritics per se, but
to test for their presence (only the presence of those we have a
stripping filter for) and, if they are present, to engage the appropriate
GlobalOptionFilter in the conf file.

While there are a whole bunch of ways of removing diacritics, including
GUI tools and various regex-based approaches, using the SWORD
library itself ensures that any GlobalOptionFilter set in a conf file
actually corresponds to a real, existing ability of the engine to
do something with the text. It also ensures that module-making
scripts do not need to be kept separately in sync with whatever happens
in the engine.

Peter





Re: [sword-devel] testing for diacritics

2015-09-02 Thread Peter von Kaehne
On Wed, 2015-09-02 at 10:21 -0700, David Haslam wrote:
> Isn't that only a LocalStripFilter?
> 
> http://www.crosswire.org/wiki/DevTools:conf_Files#Strip_Filters
> 
> Or can any of these existing filters be used as a GlobalOptionFilter,
> becoming usable only when front-ends (and the relevant sword utilities)
> provide UI options to toggle them?

Yes. It can be a GlobalOptionFilter. Not sure which frontends implement
it. There is some weirdness around the whole lot still. Nevertheless.

Peter



> 
> cf. Diatheke already has two Arabic related options:
> 
> p (Arabic Vowels)
> r (Arabic Shaping)
> 
> David




Re: [sword-devel] testing for diacritics

2015-09-02 Thread David Haslam
Windows users may find it useful to know that BabelPad has a menu option to
strip diacritics.

Convert | Other | Strip diacritics

It certainly works well for Cyrillic & Latin scripts, as well as Hebrew &
Greek.

It may not work for Arabic/Persian scripts. 
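A generic "strip diacritics" command can be approximated in Python by decomposing to NFD, dropping every nonspacing mark (Unicode category Mn) and recomposing to NFC. This is a sketch of the common technique, not a description of how BabelPad itself works. It also suggests why Arabic may be a problem: alef with hamza above (U+0623) decomposes into alef plus a combining hamza, so a blanket strip changes the letter, not just its vowels.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose, drop nonspacing marks (category Mn), then recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    # Latin e-acute, Greek alpha with psili and oxia, Hebrew bet with qamats,
    # Arabic alef with hamza above.
    samples = ["\u00e9", "\u1f04", "\u05d1\u05b8", "\u0623"]
    for s in samples:
        print(repr(s), "->", repr(strip_diacritics(s)))
```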

Can you provide some examples of such with real diacritic characters?

David



--
View this message in context: 
http://sword-dev.350566.n4.nabble.com/testing-for-diacritics-tp4655091p4655178.html
Sent from the SWORD Dev mailing list archive at Nabble.com.



Re: [sword-devel] testing for diacritics

2015-09-01 Thread Peter von Kaehne
On Fri, 2015-08-28 at 14:13 -0400, Ryan wrote:
> On Thu, 2015-08-27 at 23:22 +0100, Peter von Kaehne wrote:
> > Is there a clever and reliable way one could test in a given OSIS text
> > to see whether it contains diacritically enhanced texts or not? Perl,
> > preferably.
> > 
> > Specifically Hebrew, Arabic type alphabets and Greek - for all of which
> > we have a special GlobalOptionFilter.
> 
> Given a variable with a copy of the text using the Unicode NFD
> normalization, I would think that all you would need to do is test for
> the presence of the specific diacritic marks themselves.

Thanks - but as I said in a previous email, I did not want my proposed
utility to test for individual characters, as the filters (at least the
Arabic one) will likely grow in future. Testing should be done using
the engine.

The number of Arabic diacritical marks is practically endless, and our
filter (which covers only standard Arabic and Persian) does not even
remotely cover them all. So any new item added to our filters would
require an amendment to the script too.

I have now committed a C++ example to svn of the kind of utility
I meant. It works as I wanted it to - relying on the engine to do the
heavy lifting. I guess this is my own answer to my query.

sword/examples/cmdline/stripaccents.cpp

Peter






Re: [sword-devel] testing for diacritics

2015-08-28 Thread Peter von Kaehne
On Fri, 2015-08-28 at 01:27 +0200, Matěj Cepl wrote:
 iconv -f utf8 -t us-ascii//translit file.xml \
 |diff -u - file.xml

Thanks Matej,

This would probably work on Latin scripts with diacritics, but not on
the scripts I am interested in - Hebrew, Arabic-derived and Greek.

Peter


Re: [sword-devel] testing for diacritics

2015-08-28 Thread Peter von Kaehne
On Fri, 2015-08-28 at 09:21 +0100, Peter von Kaehne wrote:
 On Fri, 2015-08-28 at 01:27 +0200, Matěj Cepl wrote:
  iconv -f utf8 -t us-ascii//translit file.xml \
  |diff -u - file.xml
 
 Thanks Matej,
 
 This would probably work on Latin scripts with diacritics, but not on
 the scripts I am interested in - Hebrew, Arabic-derived and Greek

But I think the basic idea is right. 

What I now think I need to do is to use the engine strip filters on the
text content and compare the output with the input - if there is a
difference the strip filter option needs setting. 
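The compare-output-with-input test can be sketched as follows. In the plan above the stripping would be done by the SWORD engine's own filters; here, purely for illustration, a filter is any Python function from text to text, and `hebrew_points_stand_in` is a hypothetical stand-in, not a SWORD API. The filter names are used only as labels. Both sides are NFC-normalized before comparing, so a filter that merely changes normalization form is not mistaken for one that removed something.

```python
import unicodedata
from typing import Callable, Dict, List

StripFilter = Callable[[str], str]

# Hypothetical stand-in for an engine strip filter (not a SWORD API):
# removes Hebrew points U+05B0-U+05BC, shin/sin dots and qamats qatan.
HEBREW_POINTS = {chr(c) for c in range(0x05B0, 0x05BD)} | {"\u05C1", "\u05C2", "\u05C7"}


def hebrew_points_stand_in(text: str) -> str:
    return "".join(ch for ch in text if ch not in HEBREW_POINTS)


def options_to_set(text: str, filters: Dict[str, StripFilter]) -> List[str]:
    """Return the names of filters that change the text, i.e. options worth setting."""
    normalized = unicodedata.normalize("NFC", text)
    needed = []
    for name, strip in filters.items():
        stripped = unicodedata.normalize("NFC", strip(normalized))
        if stripped != normalized:
            needed.append(name)
    return needed


if __name__ == "__main__":
    filters = {"UTF8HebrewPoints": hebrew_points_stand_in}
    print(options_to_set("\u05d1\u05b8\u05e8\u05b8\u05d0", filters))  # ['UTF8HebrewPoints']
    print(options_to_set("\u05d1\u05e8\u05d0", filters))  # []
```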

Thanks Matej!

Peter


Re: [sword-devel] testing for diacritics

2015-08-28 Thread Peter Von Kaehne
That is an option, but I do not like it. The reason is that it requires continuous 
maintenance - adding new diacritic characters to the strip filters to expand 
their range is an ongoing effort. This would mean two places need constant 
attention. I am trying as much as possible to take the human factor out of 
module making.

Peter

 

 Sent: Friday, 28 August 2015 at 15:42
 From: David Troidl davidtro...@aol.com
 To: sword-devel@crosswire.org
 Subject: Re: [sword-devel] testing for diacritics

 How about regular expressions:
 
 Modern Greek Accented
 [\u0370-\u0390\u03AA-\u03B0\u03CA-\u03D4]
 
 Polytonic Greek Accented
 [\u1F00-\u1FFE]
 
 Hebrew Vowel Points
 [\u05B0-\u05BB]
 
 Hebrew Cantillation
 [\u0591-\u05AE]
 
 I don't know about Arabic.
 
 Peace,
 
 David
 
 On 8/28/2015 4:21 AM, Peter von Kaehne wrote:
  On Fri, 2015-08-28 at 01:27 +0200, Matěj Cepl wrote:
  iconv -f utf8 -t us-ascii//translit file.xml \
   |diff -u - file.xml
  Thanks Matej,
 
  This would probably work on Latin scripts with diacritics, but not on
  the scripts I am interested in - Hebrew, Arabic-derived and Greek.
 
  Peter
 
 
 

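David Troidl's character ranges quoted above can be tried out directly as regular expressions. Below is a minimal Python sketch with two adjustments that are assumptions made here, not part of the original message: the Hebrew vowel-point range is written in ascending order (U+05B0 to U+05BB), because a reversed range such as U+05BB-U+05B0 is rejected as a bad character range, and the spaces are dropped from the Modern Greek class, where they would otherwise also match the space character. The ranges themselves are approximate and have not been checked against SWORD's filters. Because the Greek ranges cover precomposed letters, the text is NFC-normalized first.

```python
import re
import unicodedata
from typing import List

# Ranges as proposed on the list (approximate). Hebrew vowel range in
# ascending order; no spaces inside the Greek character class.
PATTERNS = {
    "Modern Greek accented": re.compile("[\u0370-\u0390\u03AA-\u03B0\u03CA-\u03D4]"),
    "Polytonic Greek accented": re.compile("[\u1F00-\u1FFE]"),
    "Hebrew vowel points": re.compile("[\u05B0-\u05BB]"),
    "Hebrew cantillation": re.compile("[\u0591-\u05AE]"),
}


def detect(text: str) -> List[str]:
    """Return the names of all patterns found in text, in sorted order."""
    composed = unicodedata.normalize("NFC", text)  # Greek ranges expect precomposed letters
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(composed))


if __name__ == "__main__":
    print(detect("\u1f10\u03bd \u05d1\u05b0"))  # ['Hebrew vowel points', 'Polytonic Greek accented']
```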

Re: [sword-devel] testing for diacritics

2015-08-28 Thread Matěj Cepl
On 2015-08-28, 08:21 GMT, Peter von Kaehne wrote:
 On Fri, 2015-08-28 at 01:27 +0200, Matěj Cepl wrote:
 iconv -f utf8 -t us-ascii//translit file.xml \
 |diff -u - file.xml

 This would probably work on Latin scripts with diacritics, but not on
 the scripts I am interested in - Hebrew, Arabic-derived and Greek.

Did you try? I know that iconv has quite an extensive set of 
transliteration rules. Another option would be to use recode 
(https://packages.debian.org/sid/recode, 
https://admin.fedoraproject.org/pkgdb/package/recode/ or 
http://directory.fsf.org/wiki/Recode). It used to have a huge 
number of transliteration rules.

Best,

Matěj

-- 
http://www.ceplovi.cz/matej/, Jabber: mc...@ceplovi.cz
GPG Finger: 89EF 4BC6 288A BF43 1BAB  25C3 E09F EF25 D964 84AC
 
For a successful technology, reality must take precedence over
public relations, for nature cannot be fooled.
-- R. P. Feynman's concluding sentence
   in his appendix to the Challenger Report



Re: [sword-devel] testing for diacritics

2015-08-28 Thread Peter Von Kaehne


 Sent: Friday, 28 August 2015 at 16:59
 From: Matěj Cepl mc...@cepl.eu

  This would probably work on Latin scripts with diacritics, but not on
  the scripts I am interested in - Hebrew, Arabic-derived and Greek.
 
 Did you try? 

Yes :-)


Re: [sword-devel] testing for diacritics

2015-08-28 Thread Ryan
On Thu, 2015-08-27 at 23:22 +0100, Peter von Kaehne wrote:
 Is there a clever and reliable way one could test in a given OSIS text
 to see whether it contains diacritically enhanced texts or not? Perl,
 preferably. 
 
 Specifically Hebrew, Arabic type alphabets and Greek - for all of which
 we have a special GlobalOptionFilter.

Given a variable with a copy of the text using the Unicode NFD
normalization, I would think that all you would need to do is test for
the presence of the specific diacritic marks themselves. It would be easy
to do in Python. I would imagine it would be easy to do in Perl as well,
for someone who knows how to write Perl.
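A rough Python sketch of this idea, assuming a hypothetical set of Greek marks: normalize to NFD so that precomposed letters expose their combining marks, then check whether any of the specific marks occur. A real mark set would have to mirror whatever the SWORD strip filters remove, which is the maintenance concern raised elsewhere in this thread.

```python
import unicodedata
from typing import Iterable, Set

# Hypothetical example set of Greek combining marks: grave, acute,
# diaeresis, psili, dasia, perispomeni, ypogegrammeni.
GREEK_MARKS = {"\u0300", "\u0301", "\u0308", "\u0313", "\u0314", "\u0342", "\u0345"}


def combining_marks(text: str) -> Set[str]:
    """Return the nonspacing marks (category Mn) present in the NFD form of text."""
    nfd = unicodedata.normalize("NFD", text)
    return {ch for ch in nfd if unicodedata.category(ch) == "Mn"}


def contains_any(text: str, marks: Iterable[str]) -> bool:
    """True if text contains at least one of the given marks after NFD."""
    return not combining_marks(text).isdisjoint(marks)


if __name__ == "__main__":
    print(contains_any("\u1f10\u03bd \u1f00\u03c1\u03c7\u1fc7", GREEK_MARKS))  # True
    print(contains_any("\u03b5\u03bd \u03b1\u03c1\u03c7\u03b7", GREEK_MARKS))  # False
```

A catch-all test for any Mn character would be too broad: unvocalized Arabic containing U+0623, for instance, yields a combining hamza under NFD.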





[sword-devel] testing for diacritics

2015-08-27 Thread Peter von Kaehne
Is there a clever and reliable way one could test in a given OSIS text
to see whether it contains diacritically enhanced texts or not? Perl,
preferably. 

Specifically Hebrew, Arabic type alphabets and Greek - for all of which
we have a special GlobalOptionFilter.

I create most of the conf files automatically by analysing the OSIS
source files, as this is generally the best way to ensure that nothing
gets forgotten. The recent HebDelitzsch module would have benefited
from my adding the relevant options to my scripts, instead of adding them by
hand to the conf files, which I have now done.
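The analysis step of such a script might look like the Python sketch below. It collects only the character data of the OSIS file, so attribute values such as lemma or morph strings are ignored, and emits a GlobalOptionFilter line for each filter whose marks occur. The names are SWORD filter names, but the character set assigned to each is an assumption for illustration; note that the Greek set would also fire on accented Latin letters, so a real check would need to look at the base character as well.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET
from typing import List

# Assumed mapping from SWORD GlobalOptionFilter values to the marks that
# would justify them. The character sets are illustrative, not SWORD's own.
FILTER_MARKS = {
    "UTF8GreekAccents": re.compile("[\u0300\u0301\u0308\u0313\u0314\u0342\u0345]"),
    "UTF8HebrewPoints": re.compile("[\u05B0-\u05BC\u05C1\u05C2\u05C7]"),
    "UTF8Cantillation": re.compile("[\u0591-\u05AF]"),
    "UTF8ArabicPoints": re.compile("[\u064B-\u0652\u0670]"),
}


def osis_text(xml_source: str) -> str:
    """Concatenate the character data of an OSIS document, skipping tags and attributes."""
    root = ET.fromstring(xml_source)
    return "".join(root.itertext())


def conf_lines(xml_source: str) -> List[str]:
    """Return GlobalOptionFilter lines for every filter whose marks occur in the text."""
    text = unicodedata.normalize("NFD", osis_text(xml_source))
    return [f"GlobalOptionFilter={name}"
            for name, pattern in FILTER_MARKS.items()
            if pattern.search(text)]


if __name__ == "__main__":
    sample = ('<osis xmlns="http://www.bibletechnologies.net/2003/OSIS/namespace">'
              '<verse osisID="Gen.1.1">\u05d1\u05b0\u05bc\u05e8\u05b5\u05d0\u05e9\u05b4\u05c1\u05d9\u05ea</verse>'
              '</osis>')
    print("\n".join(conf_lines(sample)))
```

For a file on disk, `ET.parse(path).getroot()` can take the place of `ET.fromstring`.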

Many thanks for any suggestions!

Peter 

 
