Re: [sword-devel] In Windows command shell, diatheke search is restricted to ASCII for the query key!
Wide characters (wchar) in Windows are UTF-16.
https://msdn.microsoft.com/en-us/library/windows/desktop/ff381407(v=vs.85).aspx

Manfred

> On 02.03.2017 at 19:51, Greg Hellings wrote:
>
> My only thought is that Windows doesn't use UTF-8 internally (it uses
> UTF-16), while Sword assumes and demands UTF-8. Perhaps diatheke just blindly
> consumes its input as UTF-8, and goes along its merry way?
>
> --Greg
>
> On Thu, Mar 2, 2017 at 11:48 AM, David Haslam wrote:
> Greg,
>
> It was worth a test inside cygwin and the result was also a fail:
>
> $ xiphos/diatheke -b KJV -s regex -k Æneas
> Verses containing "ãneas"-- none (KJV)
>
> I tried it with this too, and that fared no better:
>
> $ utils/diatheke -b KJV -s phrase -k Æneas
> Verses containing "ãneas"-- none (KJV)
>
> That's my link to where the utils from our ftpmirror had been downloaded.
>
> Is it even worth installing PowerShell?
>
> I'm beginning to think that diatheke.exe was never designed to cope with
> non-ASCII searches.
>
> David

___
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
Re: [sword-devel] diatheke search type regex and the dot ?
On 03/02/2017 02:14 PM, Greg Hellings wrote:
> I also get no results.

On the other hand...

$ mod2imp KJV | grep -B1 -i abed.nego | fgrep '$$'
$$$Daniel 1:7
$$$Daniel 2:49
$$$Daniel 3:12
$$$Daniel 3:13
$$$Daniel 3:14
$$$Daniel 3:16
$$$Daniel 3:19
$$$Daniel 3:20
$$$Daniel 3:22
$$$Daniel 3:23
$$$Daniel 3:26
$$$Daniel 3:28
$$$Daniel 3:29
$$$Daniel 3:30

Plain old regular expression search finds them. ("grep" comes from g/re/p, the ancient syntax in UNIX's original line editor for "global regular expression print".) grep is locale-sensitive, and I have LC_ALL=en_US.utf8.
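One possible explanation for the difference between grep and diatheke, offered only as a hypothesis: if a regex engine matches byte by byte instead of character by character, a dot consumes one byte, while the en dash (U+2013) occupies three bytes in UTF-8. The Python sketch below illustrates that effect with Python's own re module. It says nothing about which regex library SWORD actually uses, and the sample text is made up for illustration.

```python
import re

# Illustrative sample text containing an en dash (U+2013); this string is
# an assumption for the demo, not text extracted from the KJV module.
text = "Shadrach, Meshach, and Abed\u2013nego"

# Character-aware matching: "." matches the en dash as a single character.
char_match = re.search(r"Abed.nego", text)

# Byte-wise matching on the UTF-8 encoding: the en dash is three bytes
# (E2 80 93), so a single "." cannot span it.
data = text.encode("utf-8")
byte_match = re.search(rb"Abed.nego", data)

# Three dots, one per byte, do match in byte mode.
byte_match_three_dots = re.search(rb"Abed...nego", data)

print(char_match is not None)             # True
print(byte_match is not None)             # False
print(byte_match_three_dots is not None)  # True
```

If diatheke's regex search behaved like the byte-wise case, a query such as Abed...nego would be a quick way to test the idea; that experiment was not run in this thread.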
[sword-devel] Uncategorised pages in our developers' wiki
Here they are - ten pages without a category, two of which we can ignore.

https://crosswire.org/wiki/Special:UncategorizedPages

David

--
View this message in context: http://sword-dev.350566.n4.nabble.com/Uncategorised-pages-in-our-developers-wiki-tp4656885.html
Sent from the SWORD Dev mailing list archive at Nabble.com.
Re: [sword-devel] diatheke search type regex and the dot ?
The typo was only in the message, sorry!

The actual test in the Windows shell, with -k included, didn't give any matches.

David

--
View this message in context: http://sword-dev.350566.n4.nabble.com/diatheke-search-type-regex-and-the-dot-tp4656879p4656884.html
[sword-devel] Complete Lexicon Functionality
Nearly eight years ago, this wiki page was made by Peter:

https://crosswire.org/wiki/Complete_Lexicon_Functionality

It looks like either everyone lost interest, or nobody else took any interest.

Worth revisiting?

David

--
View this message in context: http://sword-dev.350566.n4.nabble.com/Complete-Lexicon-Functionality-tp4656883.html
Re: [sword-devel] diatheke search type regex and the dot ?
$ diatheke -b KJV -s regex -k Abed.nego
Verses containing "Abed.nego"-- none (KJV)

Once I correct the command to include the -k parameter, I also get no results.

--Greg

On Thu, Mar 2, 2017 at 12:58 PM, David Haslam wrote:
> I was under the impression that the metacharacter *dot* in a regex means
> "any single character".
>
> It would seem that for diatheke with *-s regex* this is not the case at
> all.
>
> Example:
>
> diatheke -b KJV -s regex Abed.nego
>
> In Windows command shell, that command line does not find the 15 instances
> of the name *Abed–nego* where the *en dash* (U+2013) is the punctuation
> mark in all such names.
>
> What happens in Linux?
>
> David
Re: [sword-devel] diatheke search type regex and the dot ?
I suspect this may be a further symptom of what Greg suggested as the explanation in my other thread.

That is, SWORD expects to search UTF-8 encoded text, whereas Windows uses UTF-16 internally.

I still can't quite make out why the dot isn't treated the way regular expressions normally treat it.

David

--
View this message in context: http://sword-dev.350566.n4.nabble.com/diatheke-search-type-regex-and-the-dot-tp4656879p4656881.html
Re: [sword-devel] In Windows command shell, diatheke search is restricted to ASCII for the query key!
Thanks, Greg.

That's the best explanation that I've seen so far.

David

--
View this message in context: http://sword-dev.350566.n4.nabble.com/In-Windows-command-shell-diatheke-search-is-restricted-to-ASCII-for-the-query-key-tp4656866p4656880.html
[sword-devel] diatheke search type regex and the dot ?
I was under the impression that the metacharacter *dot* in a regex means "any single character".

It would seem that for diatheke with *-s regex* this is not the case at all.

Example:

diatheke -b KJV -s regex Abed.nego

In Windows command shell, that command line does not find the 15 instances of the name *Abed–nego* where the *en dash* (U+2013) is the punctuation mark in all such names.

What happens in Linux?

David

--
View this message in context: http://sword-dev.350566.n4.nabble.com/diatheke-search-type-regex-and-the-dot-tp4656879.html
Re: [sword-devel] In Windows command shell, diatheke search is restricted to ASCII for the query key!
My only thought is that Windows doesn't use UTF-8 internally (it uses UTF-16), while Sword assumes and demands UTF-8. Perhaps diatheke just blindly consumes its input as UTF-8, and goes along its merry way?

--Greg

On Thu, Mar 2, 2017 at 11:48 AM, David Haslam wrote:
> Greg,
>
> It was worth a test inside cygwin and the result was also a fail:
>
> $ xiphos/diatheke -b KJV -s regex -k Æneas
> Verses containing "ãneas"-- none (KJV)
>
> I tried it with this too, and that fared no better:
>
> $ utils/diatheke -b KJV -s phrase -k Æneas
> Verses containing "ãneas"-- none (KJV)
>
> That's my link to where the utils from our ftpmirror had been downloaded.
>
> Is it even worth installing PowerShell?
>
> I'm beginning to think that diatheke.exe was never designed to cope with
> non-ASCII searches.
>
> David
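To make the UTF-8 versus UTF-16 point concrete, here is a small Python sketch showing how the same character, Æ (U+00C6), is laid out in the two encodings, and what happens if UTF-16 bytes are read as though they were UTF-8. It only illustrates the encodings themselves; whether diatheke ever receives UTF-16 bytes on its command line is not established in this thread.

```python
# U+00C6 LATIN CAPITAL LETTER AE, the first character of the search key.
ae = "\u00c6"

utf8_bytes = ae.encode("utf-8")       # two bytes: C3 86
utf16_bytes = ae.encode("utf-16-le")  # one 16-bit code unit: C6 00

print(utf8_bytes.hex())   # c386
print(utf16_bytes.hex())  # c600

# Interpreting the UTF-16 bytes as UTF-8 is not valid: 0xC6 announces a
# two-byte sequence, but 0x00 is not a continuation byte.
try:
    utf16_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(decodes_as_utf8)  # False
```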
Re: [sword-devel] In Windows command shell, diatheke search is restricted to ASCII for the query key!
On Thu, Mar 2, 2017 at 11:50 AM, Karl Kleinpaste wrote:
> On 03/02/2017 12:17 PM, David Haslam wrote:
> > I am assuming that when Karl bundles these with Xiphos, he just uses what's
> > available and most recent in our SVN.
>
> I don't build/bundle them. They're whatever Greg built into the MinGW
> Sword RPM. The Windows build script just includes them in the installer.
> They were built 04 Feb 2016.

And I build the latest released versions. In this case, the Sword 1.7.4 release.

--Greg
Re: [sword-devel] In Windows command shell, diatheke search is restricted to ASCII for the query key!
Exactly how they get compiled is beyond my pay grade.

Even the jargon in your reply is outside my ken. ;>)

David

--
View this message in context: http://sword-dev.350566.n4.nabble.com/In-Windows-command-shell-diatheke-search-is-restricted-to-ASCII-for-the-query-key-tp4656866p4656877.html
Re: [sword-devel] In Windows command shell, diatheke search is restricted to ASCII for the query key!
Greg,

It was worth a test inside cygwin and the result was also a fail:

$ xiphos/diatheke -b KJV -s regex -k Æneas
Verses containing "ãneas"-- none (KJV)

I tried it with this too, and that fared no better:

$ utils/diatheke -b KJV -s phrase -k Æneas
Verses containing "ãneas"-- none (KJV)

That's my link to where the utils from our ftpmirror had been downloaded.

Is it even worth installing PowerShell?

I'm beginning to think that diatheke.exe was never designed to cope with non-ASCII searches.

David

--
View this message in context: http://sword-dev.350566.n4.nabble.com/In-Windows-command-shell-diatheke-search-is-restricted-to-ASCII-for-the-query-key-tp4656866p4656876.html
Re: [sword-devel] In Windows command shell, diatheke search is restricted to ASCII for the query key!
On 03/02/2017 12:17 PM, David Haslam wrote:
> I am assuming that when Karl bundles these with Xiphos, he just uses what's
> available and most recent in our SVN.

I don't build/bundle them. They're whatever Greg built into the MinGW Sword RPM. The Windows build script just includes them in the installer. They were built 04 Feb 2016.
Re: [sword-devel] In Windows command shell, diatheke search is restricted to ASCII for the query key!
The only reason I'm using the Sword utilities bundled with Xiphos is that they happen to be a more recent version than what I can find in the ftpmirror on our server.

The former has diatheke version 4.7 and the latter diatheke version 4.6.

I am assuming that when Karl bundles these with Xiphos, he just uses what's available and most recent in our SVN.

It's not as if there's been a regular workflow to ensure that every update to the utilities is also compiled for Win32 and then uploaded to the ftpmirror.

Best regards,

David

--
View this message in context: http://sword-dev.350566.n4.nabble.com/In-Windows-command-shell-diatheke-search-is-restricted-to-ASCII-for-the-query-key-tp4656866p4656873.html
Re: [sword-devel] In Windows command shell, diatheke search is restricted to ASCII for the query key!
Thanks Karl.

I suppose Linux also succeeds when the *en dash* is properly used with any of the "hyphenated" names such as:

Abel–beth–maachah

Under Windows CMD, diatheke changes each en dash to U+00FB LATIN SMALL LETTER U WITH CIRCUMFLEX.

S:\>xiphos\diatheke -b KJV -s phrase -k Abel–beth–maachah
Verses containing "Abelûbethûmaachah"-- none (KJV)

All this is very unsatisfactory!

btw. Is the diatheke search type regex supposed to treat a dot as "any character", or does that only work in PCRE search patterns?

Best regards,

David

--
View this message in context: http://sword-dev.350566.n4.nabble.com/In-Windows-command-shell-diatheke-search-is-restricted-to-ASCII-for-the-query-key-tp4656866p4656872.html
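The substitutions reported above (Æ becoming ã, en dash becoming û) are consistent with a code page mismatch, which the following Python sketch reproduces. It rests on an assumption not stated anywhere in this thread: that the query key reaches diatheke as Windows-1252 bytes and is echoed in a console using code page 850. The function name console_echo and both default code pages are hypothetical choices for illustration.

```python
def console_echo(text, arg_codepage="cp1252", display_codepage="cp850"):
    """Encode text in one code page, then decode the bytes in another.

    Both code pages are assumptions (common Western European Windows
    defaults); the thread does not say which ones were in effect.
    """
    return text.encode(arg_codepage).decode(display_codepage)


# U+00C6 is byte 0xC6 in cp1252, which cp850 displays as U+00E3 (ã).
aeneas = console_echo("\u00c6neas")
print(aeneas)  # ãneas

# U+2013 is byte 0x96 in cp1252, which cp850 displays as U+00FB (û).
abel = console_echo("Abel\u2013beth\u2013maachah")
print(abel)  # Abelûbethûmaachah
```

If this assumption holds, the bytes handed to diatheke are neither UTF-8 nor UTF-16, so a search against UTF-8 module text would find nothing for any key containing these characters.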
Re: [sword-devel] In Windows command shell, diatheke search is restricted to ASCII for the query key!
On Thu, Mar 2, 2017 at 10:27 AM, David Haslam wrote:
> Hi Greg,
>
> Windows 7 x64 using ordinary cmd.exe as the command shell.
>
> Do you think I'd get better results if I called diatheke.exe from inside a
> cygwin shell?

I think that I don't like to think about UTF-8's vagaries across operating systems and prefer to work in programming languages where this is already a solved problem. :) But it's worth a test from Cygwin.

> btw. I've never used Windows PowerShell.
> I even had to look it up in https://en.wikipedia.org/wiki/PowerShell

It's supposed to be the replacement for CMD, but every once in a while Microsoft relents and updates CMD. Like everything Microsoft does, it's liable to stick around in both forms for over a decade. You could also test in this environment in addition to cygwin. That would give us more datapoints to see if the error is in our parsing of command line options, or if the shell is corrupting them for us.

> How come this only came to light in 2017 and there's nothing in our
> developers' wiki about this problem?

I don't think that using those command line tools that are bundled with Xiphos is a particularly supported workflow. Even less so among people using characters outside of the 7-bit ASCII set.

--Greg

> Regards,
>
> David
Re: [sword-devel] In Windows command shell, diatheke search is restricted to ASCII for the query key!
On Thu, Mar 2, 2017 at 10:29 AM, Karl Kleinpaste wrote:
> On 03/02/2017 11:14 AM, David Haslam wrote:
> > Such a diatheke command works OK in Linux, or so I'm told.
>
> $ diatheke -b KJV -s regex -k Æneas
> Entries containing "Æneas"-- none (KJV)
> $ diatheke -b KJV -s lucene -k Æneas
> Entries containing "Æneas"-- Acts 9:34Acts 9:33 ; -- 2 matches total (KJV)
> $ diatheke -b KJV -s multiword -k Æneas
> Entries containing "Æneas"-- Acts 9:33Acts 9:34 ; -- 2 matches total (KJV)
>
> Fedora 24.
>
> The output is buggy, not having put even a space between result elements.
> Interesting, that regex search didn't find a literal string.

Most regex libraries I've encountered have a special mode that needs to be engaged in order to be UTF-8 aware. I wonder if our library has the same need (although I thought that the capital ash character was in ASCII, but I suppose I was wrong).

--Greg
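On the parenthetical about the capital ash: a short Python check, given here only as an illustration, confirms that Æ lies outside 7-bit ASCII. It sits at code point U+00C6, which is a single byte in Latin-1 and cannot be encoded as ASCII at all.

```python
ae = "\u00c6"  # LATIN CAPITAL LETTER AE

code_point = ord(ae)
in_ascii_range = code_point < 0x80   # ASCII covers 0x00 through 0x7F only
latin1_bytes = ae.encode("latin-1")  # a single byte, 0xC6

try:
    ae.encode("ascii")
    ascii_encodable = True
except UnicodeEncodeError:
    ascii_encodable = False

print(hex(code_point))   # 0xc6
print(in_ascii_range)    # False
print(latin1_bytes)      # b'\xc6'
print(ascii_encodable)   # False
```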
Re: [sword-devel] In Windows command shell, diatheke search is restricted to ASCII for the query key!
Hi Greg,

Windows 7 x64 using ordinary cmd.exe as the command shell.

Do you think I'd get better results if I called diatheke.exe from inside a cygwin shell?

btw. I've never used Windows PowerShell.
I even had to look it up in https://en.wikipedia.org/wiki/PowerShell

How come this only came to light in 2017 and there's nothing in our developers' wiki about this problem?

Regards,

David

--
View this message in context: http://sword-dev.350566.n4.nabble.com/In-Windows-command-shell-diatheke-search-is-restricted-to-ASCII-for-the-query-key-tp4656866p4656869.html
Re: [sword-devel] In Windows command shell, diatheke search is restricted to ASCII for the query key!
On 03/02/2017 11:14 AM, David Haslam wrote:
> Such a diatheke command works OK in Linux, or so I'm told.

$ diatheke -b KJV -s regex -k Æneas
Entries containing "Æneas"-- none (KJV)
$ diatheke -b KJV -s lucene -k Æneas
Entries containing "Æneas"-- Acts 9:34Acts 9:33 ; -- 2 matches total (KJV)
$ diatheke -b KJV -s multiword -k Æneas
Entries containing "Æneas"-- Acts 9:33Acts 9:34 ; -- 2 matches total (KJV)

Fedora 24.

The output is buggy, not having put even a space between result elements.
Interesting, that regex search didn't find a literal string.
Re: [sword-devel] In Windows command shell, diatheke search is restricted to ASCII for the query key!
Don't use Windows?

But in all seriousness, is this in CMD or PowerShell? What version of Windows is this? You very possibly could be running into a limitation of the operating system.

--Greg

On Thu, Mar 2, 2017 at 10:14 AM, David Haslam wrote:
> This simply doesn't work inside the Windows command line shell.
>
> S:\>xiphos\diatheke -b KJV -s regex -k Æneas
> Verses containing "ãneas"-- none (KJV)
>
> It changes the non-ASCII characters to something else entirely!
>
> Such a diatheke command works OK in Linux, or so I'm told.
>
> Is there a solution?
>
> David
[sword-devel] In Windows command shell, diatheke search is restricted to ASCII for the query key!
This simply doesn't work inside the Windows command line shell.

S:\>xiphos\diatheke -b KJV -s regex -k Æneas
Verses containing "ãneas"-- none (KJV)

It changes the non-ASCII characters to something else entirely!

Such a diatheke command works OK in Linux, or so I'm told.

Is there a solution?

David

--
View this message in context: http://sword-dev.350566.n4.nabble.com/In-Windows-command-shell-diatheke-search-is-restricted-to-ASCII-for-the-query-key-tp4656866.html
Re: [sword-devel] How does the "diatheke" front end have search abilities? Is it using CLucene?
> Sent: Thursday, 02 March 2017 at 08:47
> From: "David Haslam"
> AFAIK, diatheke itself cannot generate a lucene index for a module.

Correct. You can use mkfastmod as a command-line tool or indeed use Xiphos' indices.

Peter
Re: [sword-devel] How does the "diatheke" front end have search abilities? Is it using CLucene?
AFAIK, diatheke itself cannot generate a lucene index for a module.

You can specify the search type as *-s lucene* to ensure that it will make use of any existing index that the user may have already generated from another front-end such as Xiphos.

Is my understanding correct?

btw. For modules with lots of markup such as the CrossWire KJV, diatheke searches are very slow.

I rarely use diatheke for this purpose. More often to just output a specified passage or verse, or ?

Best regards,

David

--
View this message in context: http://sword-dev.350566.n4.nabble.com/How-does-the-diatheke-front-end-have-search-abilities-Is-it-using-CLucene-tp4656862p4656864.html