Re: [sword-devel] diatheke search type regex and the dot ?

2017-05-21 Thread Troy A. Griffitts
So, I did a little experimenting this weekend and found that the ICU
RegEx engine is actually really capable.

o  It's fast.

o  It counts {n,m} repetitions in characters instead of bytes.

o  It even works (though a little slowly) with lookaheads and lookbehinds,
e.g., for words in any order: (?=.*God)(?=.*world)(?=.*love)

whereas that fails to compile or simply doesn't work in our other
regex engine options.
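
For anyone who wants to try the "words in any order" idea outside SWORD, here is a minimal sketch using Python's re module (not ICU, so the performance will not be comparable). The helper name all_words_pattern and the sample sentences are only illustrative assumptions.

```python
import re

def all_words_pattern(words):
    # One lookahead per word. Each lookahead scans forward from the same
    # starting position, so the words may occur in any order.
    return re.compile("".join(f"(?=.*{re.escape(w)})" for w in words))

rx = all_words_pattern(["God", "world", "love"])

in_order = "For God so loved the world"
reordered = "love for the world comes from God"
missing_one = "God is love"

print(rx.search(in_order) is not None)
print(rx.search(reordered) is not None)
print(rx.search(missing_one) is not None)
```

The pattern built here is exactly the one quoted above; note that by default the dot does not cross newlines in Python, so this sketch assumes one verse per line.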

So, I've added it as an option --with-icuregex  and actually made it the
default in usrinst.sh

You can check it out from trunk or else wait for the next RC.

Planning to look at the issues Peter mentioned and then push out another RC.

Troy



___
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-24 Thread Jaak Ristioja
Another possibility is to use Boost.Xpressive [1], which I think
supports the Perl regular expressions at runtime, and also static
regular expressions using C++ syntax:

using namespace boost::xpressive;
// sregex rex = sregex::compile( "(\\w+) (\\w+)!" );
sregex rex = (s1= +_w) >> ' ' >> (s2= +_w) >> '!';

But I suppose you don't want to introduce Boost as a dependency.

J


[1]: http://www.boost.org/doc/libs/1_63_0/doc/html/xpressive.html





Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-07 Thread David Haslam
Thanks, Karl,

Xiphos 4.0.4 in Windows 7 x64 gave this:

S:\>xiphos\diatheke -b KJV -s regex -k Abed...nego
Verses containing "Abed...nego"-- Daniel 1:7 ; Daniel 2:49 ; Daniel 3:12 ;
Daniel 3:13 ; Daniel 3:14 ; Daniel 3:16 ; Daniel 3:19 ; Daniel 3:20 ; Daniel
3:22 ; Daniel 3:23 ; Daniel 3:26 ; Daniel 3:28 ; Daniel 3:29 ; Daniel 3:30
-- 14 matches total (KJV)

It's evident that in Windows it behaves like it did in Linux after you
recompiled without cxx11regex.

Question: Does *regex* mean the same to diatheke search as it does for
Xiphos advanced search?

Best regards,

David

PS. I'm sure we can all forgive Greg for the mistaken "off by 2" claim. 




--
View this message in context: 
http://sword-dev.350566.n4.nabble.com/diatheke-search-type-regex-and-the-dot-tp4656879p4656920.html
Sent from the SWORD Dev mailing list archive at Nabble.com.



Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread Karl Kleinpaste
On 03/06/2017 09:06 PM, DM Smith wrote:
> Does setting CLANG (or whatever it is) in the env help? In unix you
> have to tell the program what charset you are using. 

They already come along for the ride for free as a result of logging in,
per the default specification when the system was installed.

$ env|grep -i utf
LC_ALL=en_US.utf8
LANG=en_US.utf8


Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread DM Smith
Does setting CLANG (or whatever it is) in the env help? In unix you have to 
tell the program what charset you are using. 

Cent from my fone so theer mite be tipos. ;)


Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread Troy A. Griffitts

Yeah, so this page shows that C++11 regex is still mostly unsupported in gcc:

http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.tr1

(see section 7)

And the old school GNU regex we use otherwise doesn't, I think, know
anything about wide chars.  It simply compares bytes and doesn't have a
clue whether some should be considered part of the same character.  I
suspect that's because nowhere do we tell it that we're giving it UTF-8.


Ultimately my hope is that gcc will improve eventually and solve our 
problem for us.  We could use


We could add an option to use ICU RegexMatcher, but I'm still holding 
out for our compiler.


Troy





Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread Karl Kleinpaste
On 03/06/2017 05:25 PM, Greg Hellings wrote:
> being off by 2 would seem strange to me
I don't understand this question at all.

0xE2 = 226 = 0342
0x80 = 128 = 0200
0x93 = 147 = 0223

There's no off-by error at all.

"od" is the "octal dump" tool; given -c, it tries to dump characters,
but outside 7-bit ASCII, it still dumps octal.

For those familiar with dc(1), this will make sense
$ dc
8o
226p
342
128p
200
147p
223
16i
0XE2p
342
0X80p
200
0X93p
223
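
The same arithmetic can be checked without dc. This is just a small Python sketch that encodes the en dash (U+2013) as UTF-8 and prints each byte in hex, decimal and octal; the variable names are my own.

```python
en_dash = "\u2013"               # the dash used in "Abed-nego" in KJV
utf8 = en_dash.encode("utf-8")   # three bytes: E2 80 93

# One row per byte: hex = decimal = octal (leading 0 marks octal, as in od).
rows = [f"0x{b:02X} = {b} = 0{b:o}" for b in utf8]

for row in rows:
    print(row)
```

The three rows come out as 0xE2 = 226 = 0342, 0x80 = 128 = 0200 and 0x93 = 147 = 0223, matching what od printed.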

The interesting questions are why C++11 regex can't find /en dash/, and
why non-C++11 regex doesn't understand multibyte.

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread Greg Hellings
On Mon, Mar 6, 2017 at 4:15 PM, David Haslam  wrote:

> Are we sure it's an "off by 2" error and not just an email typo?
>

I'm not sure of that at all. It was my first guess, but being off by 2
would seem strange to me, as I would expect a "fat finger" error to produce
an off-by-1 or a spurious extra digit added. But Karl would need to verify
that.


>
> I wasn't expecting decimal, I just didn't parse it as octal.
>

In the context of Octal, the values make the most sense as a typo on one
side or the other, to me.

--Greg


>
> David

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread David Haslam
Are we sure it's an "off by 2" error and not just an email typo?

I wasn't expecting decimal, I just didn't parse it as octal.

David





Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread Greg Hellings
147 = 0223 (octal)
128 = 0200 (octal)
226 = 0340 (octal)

So it's off by 2 in the top order byte. Not sure why, but it seems you're
expecting decimal but the tool is obviously giving out octal.

--Greg


Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread David Haslam
Thanks Karl,

All the "hyphenated" names in the KJV OT use the *en dash* character U+2013
which has 3 UTF-8 bytes E2 80 93.

In decimal, these are 226 128 147 so we might well wonder how your tool gave
342 200 223 ?

Best regards,

David





Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-06 Thread Karl Kleinpaste
On 03/03/2017 09:16 PM, Troy A. Griffitts wrote:
> SWORD supports compiling with a variety of regex engines

I have an interesting result. My previous build of sword used
--with-cxx11regex, and that failed to find Abednego in any circumstance.
Reconfiguring without that option and rebuilding, I now get this result:

$ diatheke -b KJV -s regex -k Abednego
Entries containing "Abednego"-- none (KJV)
$ diatheke -b KJV -s regex -k Abed...nego
Entries containing "Abed...nego"-- Daniel 1:7Daniel 2:49 ; Daniel 3:12 ;
Daniel 3:13 ; Daniel 3:14 ; Daniel 3:16 ; Daniel 3:19 ; Daniel 3:20 ;
Daniel 3:22 ; Daniel 3:23 ; Daniel 3:26 ; Daniel 3:28 ; Daniel 3:29 ;
Daniel 3:30 ;  -- 14 matches total (KJV)
$ diatheke -b KJV -s regex -k Abed..nego
Entries containing "Abed..nego"-- none (KJV)
$ diatheke -b KJV -s regex -k Abed.nego
Entries containing "Abed.nego"-- none (KJV)

What's important here is that the dash in the middle of "Abed-nego" in
KJV appears as (from Dan.3.30, passed through "od -c"):
360   d   A   b   e   d 342 200 223   n   e   g   o   <   /   w

So diatheke with C++11 regex fails entirely, and diatheke without C++11
regex finds it only when the 3 component bytes of the dash character are
specified individually, which is to say, unaware of multibyte encoding
at all.
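
The difference between byte-wise and character-wise matching can be reproduced in a few lines. This is only a sketch using Python's re module as a stand-in (it is not the GNU or C++11 engine SWORD links against): a str pattern treats the dash as one character, while a bytes pattern sees its three UTF-8 bytes.

```python
import re

name = "Abed\u2013nego"        # en dash, U+2013, as in the KJV module
raw = name.encode("utf-8")     # what a byte-oriented engine is handed

# Character-aware matching: a single dot covers the whole en dash.
char_hit = re.search(r"Abed.nego", name) is not None

# Byte-oriented matching: try 0, 1, 2 and 3 dots against the raw bytes.
byte_hits = {
    n: re.search(b"Abed" + b"." * n + b"nego", raw) is not None
    for n in (0, 1, 2, 3)
}

print(char_hit)
print(byte_hits)
```

Only the three-dot byte pattern matches, which mirrors the diatheke results above.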


Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-04 Thread David Haslam
Corrigendum:  "everything outside ASCII"





Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-04 Thread David Haslam
Thanks Troy,

The precise /flavour/ of *regex* supported by diatheke search really needs
to be properly documented.

Expecting the *dot* to be a byte when we're handling Unicode is just not on
at all.

I'm struggling more because I'm on Windows, where the UTF-16 versus UTF-8
disparity affects everything outside ANSI, but even the friends using
diatheke in Linux are having no success with the dot.

Inside the character class *[.,;:]*, the dot is treated as just a full-stop
punctuation mark. By contrast, I'm so used to having to escape the full stop
in most other contexts (e.g. Notepad++ search, TextPipe replace filters, etc.).
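
To make the three roles of the dot concrete, here is a short sketch in Python's re module (an assumption on my part that it behaves like most regex flavours here; it is not the engine diatheke uses). The sample text is made up for illustration.

```python
import re

text = "Abed\u2013nego, Shadrach. Meshach; end:"

# Inside a character class the dot is literal punctuation.
punct = re.findall(r"[.,;:]", text)

# Outside a class it must be escaped to mean a literal full stop.
escaped = re.findall(r"\.", text)

# Unescaped, it is the metacharacter: any single character except newline.
bare = re.findall(r".", text)

print(punct)
print(escaped)
print(len(bare), len(text))
```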

If *regex* is to be of any real use, we shouldn't leave users to resort to
trial and error to see what works.

David







Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-03 Thread Troy A. Griffitts
SWORD supports compiling with a variety of regex engines, typically GNU 
regex on most Linux systems.  We include an 'internal regex' copy of this 
as well.  We will also compile against the C++ standard regex engine 
included in the language spec.  Each handles Unicode characters differently.


. is certainly recognized, but I would guess that in whatever regex 
library you are compiling against, it represents a byte and not a 
whole character.  Try .{1,6}








Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-03 Thread David Haslam
Created http://tracker.crosswire.org/browse/MODTOOLS-101

David





Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-03 Thread David Haslam
So what flavour of regex does diatheke actually use under Linux?

Why is it that the *dot metacharacter* is not recognized?

David





Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-02 Thread Karl Kleinpaste
On 03/02/2017 02:14 PM, Greg Hellings wrote:
> I also get no results.

On the other hand...

$ mod2imp KJV | grep -B1 -i abed.nego | fgrep '$$'
$$$Daniel 1:7
$$$Daniel 2:49
$$$Daniel 3:12
$$$Daniel 3:13
$$$Daniel 3:14
$$$Daniel 3:16
$$$Daniel 3:19
$$$Daniel 3:20
$$$Daniel 3:22
$$$Daniel 3:23
$$$Daniel 3:26
$$$Daniel 3:28
$$$Daniel 3:29
$$$Daniel 3:30

Plain old regular expression search ("grep" originates from g/re/p, the
ancient syntax in UNIX's original line editor for "global regular
expression print") finds them. grep is locale-sensitive, and I have
LC_ALL=en_US.utf8.
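
For those without grep to hand (e.g. on Windows), the pipeline above can be roughly sketched in Python. It assumes mod2imp output has been saved as lines where a "$$$" line carries the key and the following lines carry the entry text. The function name keys_matching and the sample lines are hypothetical stand-ins, not real module content, and unlike grep -B1 this reports the key of the entry rather than literally the preceding line.

```python
import re

def keys_matching(imp_lines, pattern):
    """Return the $$$ keys whose entry text matches pattern (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    current_key = None
    for line in imp_lines:
        if line.startswith("$$$"):
            current_key = line[3:]          # remember the key for this entry
        elif current_key is not None and rx.search(line):
            if not hits or hits[-1] != current_key:
                hits.append(current_key)    # report each key once
    return hits

# Illustrative stand-in lines, not actual KJV markup.
sample = [
    "$$$Daniel 1:6",
    "placeholder text naming Daniel and Azariah",
    "$$$Daniel 1:7",
    "placeholder text naming Abed\u2013nego",
]

print(keys_matching(sample, "abed.nego"))
```

Because the lines are Python str values, the dot matches the en dash as one character, which is the behaviour grep shows in a UTF-8 locale.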


Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-02 Thread David Haslam
Typo was only in the message, sorry!

The actual test in Windows shell with the -k there didn't give any matches.

David





Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-02 Thread Greg Hellings
$ diatheke -b KJV -s regex -k Abed.nego
Verses containing "Abed.nego"-- none (KJV)

Once I correct the command to include the -k parameter, I also get no
results.

--Greg

On Thu, Mar 2, 2017 at 12:58 PM, David Haslam  wrote:

> I was under the impression that the metacharacter *dot* in a regex means
> "any single character".
>
> It would seem that for diatheke with *-s regex* this is not the case at all.
>
> Example:
>
> diatheke -b KJV -s regex Abed.nego
>
> In Windows command shell, that command line does not find the 15 instances
> of the name *Abed–nego* where the *en dash* (U+2013) is the punctuation mark
> in all such names.
>
> What happens in Linux?
>
> David

Re: [sword-devel] diatheke search type regex and the dot ?

2017-03-02 Thread David Haslam
I suspect this may be a further symptom of what Greg suggested as the
explanation in my other thread.

i.e. That SWORD expects to search in UTF-8 encoded text, whereas Windows
uses UTF-16 internally.

Still can't quite make out why the dot isn't treated how regular expressions
use it.

David


