Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-29 Thread porphyry5
On Monday, November 24, 2014 4:05:08 AM UTC-8, Erik Christiansen wrote:

> I am reminded of two quotes:
> 
> There are two ways of constructing a software design. One way is to make
> it so simple that there are obviously no deficiencies. And the other way
> is to make it so complicated that there are no obvious deficiencies.
>    -C.A.R. Hoare
> 
> ... with proper design, the features come cheaply. This approach is
> arduous, but continues to succeed.
>    -Dennis Ritchie
> 
> I agree that associative arrays do not seem arduous enough for the
> benefits they can bring to the right problem, but then, someone else has
> done all the work for us.
> 
> I'd normally test with a known 100% good list, and with a list with a
> known number of errors. Those lists do not need to be very long - a few
> dozen words ought to suffice. The third test - grinding through a pile
> of words, you have already done.
> 
> The exercise is a reminder to us all that the more lines of code we
> write, the more bugs creep in undetected.
> 
> Erik
> 
> -- 
> Sometimes you have to outsmart this stuff, it works for Murphy you know.
>  - Gene Heskett, on emc-users ML

Mr. Hoare has a fine sense of irony!

Meanwhile, the awk program passed its "good data" and "known errors" tests, and 
I am grinding on through the "real data", so all seems satisfactory, at least 
as far as I want to take this.  So, once again, I thank you for all your help 
on this project; it is much appreciated.

-- 
-- 
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

--- 
You received this message because you are subscribed to the Google Groups 
"vim_use" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to vim_use+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-24 Thread Erik Christiansen
On 23.11.14 12:01, porphyry5 wrote:
>   However I smell a rat, it's so astonishingly fast, less than 2 seconds
>   vs ~20 seconds for the previous version, and reports only 1050 total
>   errors, against more than 3000 total before, though that did include
>   good words reported as errors. I can't see that it should make any
>   difference, but I note you did recommend working from the back end
>   and isolating the suffix first.  That's very easily done, so I think
>   I'll try it as an easy check on the reliability of the result.  If it
>   produces exactly the same errors that would be most encouraging.

>   Oh, happy day, it produces exactly the same output ...

Such testing, both coming and going, with identical results, has to
improve confidence in the new implementation.  :-)

>   Now just so long as my enthusiasm for quick and easy answers isn't
>   blinding me to some lurking gotcha...

I am reminded of two quotes:

There are two ways of constructing a software design. One way is to make
it so simple that there are obviously no deficiencies. And the other way
is to make it so complicated that there are no obvious deficiencies.
   -C.A.R. Hoare

... with proper design, the features come cheaply. This approach is
arduous, but continues to succeed.
   -Dennis Ritchie

I agree that associative arrays do not seem arduous enough for the
benefits they can bring to the right problem, but then, someone else has
done all the work for us.

I'd normally test with a known 100% good list, and with a list with a
known number of errors. Those lists do not need to be very long - a few
dozen words ought to suffice. The third test - grinding through a pile
of words, you have already done.
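
A minimal sketch of the first two tests, assuming a checker that prints every
input word missing from the word list. The names flag_unknown and run_checks
and the planted words zzqx1..zzqx3 are illustrative only (the planted words are
assumed to be absent from the list); this is not the program discussed in the
thread.

```sh
# flag_unknown WORDLIST: print each stdin word that is not in WORDLIST.
flag_unknown() {
    awk -v file="$1" '
    BEGIN { while ((getline < file) > 0) good[$1] = 1 }
    NF && !($1 in good) { print $1 }'
}

# run_checks WORDLIST: succeed only if both known-answer tests pass.
run_checks() {
    [ -r "$1" ] || return 2
    # Test 1: a 100% good list (the list itself) must flag nothing.
    n=$(flag_unknown "$1" < "$1" | wc -l | tr -d ' ')
    [ "$n" -eq 0 ] || return 1
    # Test 2: the same list plus three planted errors must flag exactly three.
    n=$( { cat "$1"; printf '%s\n' zzqx1 zzqx2 zzqx3; } |
         flag_unknown "$1" | wc -l | tr -d ' ')
    [ "$n" -eq 3 ] || return 1
}

# Example run against a throwaway three-word list.
list=$(mktemp)
printf '%s\n' alpha beta gamma > "$list"
run_checks "$list" && echo "both checks passed"
rm -f "$list"
```

Because the expected counts are known in advance, a regression shows up as a
failed check rather than as a suspicious total in the real data.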

The exercise is a reminder to us all that the more lines of code we
write, the more bugs creep in undetected.

Erik

-- 
Sometimes you have to outsmart this stuff, it works for Murphy you know.
 - Gene Heskett, on emc-users ML



Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-23 Thread porphyry5
On Friday, November 21, 2014 1:31:20 AM UTC-8, Erik Christiansen wrote:
> On 20.11.14 11:54, porphyry5 wrote:
> > Annoyingly, I cannot see any clear way to use an associative array in
> > this case, because of that pesky word suffix list.  I believe 'if
> > (word in wd-list)' must either return "no match" or "exact match".  In
> > the binary search I check for exact matches, on failure immediately
> > followed by a partial match test, 'if (word ~ wd-list[index])' to
> > indicate the need to append suffixes to the partial matching wd-list
> > entry.
> 
> Ah ... perhaps the easiest way is to detect recognisable suffixes on
> input words, and strip them for the initial match attempt, i.e. only the
> part of the word you want is hashed. A flag, or non-null
> "found_this_suffix" string variable, retained from the partial-match
> generating pre-stripping, then guides any additional actions, if
> required.
> 
> To cover the case where a word with a recognisable suffix is in the list
> with suffix, rather than without, a check-on-match-failure for the
> unstripped word could be performed. It would only occur when a suffix is
> detected, and would also be faster than a search.
> 
> That essentially reverses the order of match vs suffix handling.
> Speed-wise, I'd expect the pre-strip to be quite a bit faster than the
> partial match, since the suffix list is unlikely to number 200,000
> entries.
> 
> Erik
> 
> -- 
> Britain had first obtained a commercial Enigma machine back in 1927, by simply
> purchasing one in the open in Germany. The machine was analysed and a diagnostic
> report written on how it worked.
> - http://www.bbc.co.uk/news/magazine-17486464

This is becoming a most interesting project.  Checking further in my file of 
2000-odd errors, I began discovering certain words that the binary search 
should have found, but didn't.  The cause was that the search did not 
necessarily hit the root word if it was followed by variants of that root.
This redefined the problem to: you have to find the root of the word of 
interest if you can't find the word itself.  Ensuring that meant using an 
associative array for the word list, which also allowed an associative array 
for the suffixes.  That cuts out many lines of code; the entire identification 
process is now just 

# Uses globals: v (word under test), b (word list), s (suffix list).
function bs() {
   r = ""
   if (v in b) return r          # exact match
   n = length(v)
   j = 2                         # the root must be at least one character
   while (j <= n) {
      # Split v into a root (before j) and a suffix (from j onwards).
      if (substr(v, j) in s && substr(v, 1, j-1) in b) return r
      j++
   }
   r = "@@"                      # no match: tag the word as an error
   return r
}
Now just so long as my enthusiasm for quick and easy answers isn't 
blinding me to some lurking gotcha...



Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-21 Thread Erik Christiansen
On 20.11.14 11:54, porphyry5 wrote:
> Annoyingly, I cannot see any clear way to use an associative array in
> this case, because of that pesky word suffix list.  I believe 'if
> (word in wd-list)' must either return "no match" or "exact match".  In
> the binary search I check for exact matches, on failure immediately
> followed by a partial match test, 'if (word ~ wd-list[index])' to
> indicate the need to append suffixes to the partial matching wd-list
> entry.

Ah ... perhaps the easiest way is to detect recognisable suffixes on
input words, and strip them for the initial match attempt, i.e. only the
part of the word you want is hashed. A flag, or non-null
"found_this_suffix" string variable, retained from the partial-match
generating pre-stripping, then guides any additional actions, if
required.

To cover the case where a word with a recognisable suffix is in the list
with suffix, rather than without, a check-on-match-failure for the
unstripped word could be performed. It would only occur when a suffix is
detected, and would also be faster than a search.

That essentially reverses the order of match vs suffix handling.
Speed-wise, I'd expect the pre-strip to be quite a bit faster than the
partial match, since the suffix list is unlikely to number 200,000
entries.
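
One way to sketch the strip-first idea in awk is shown below. It assumes one
word per line in all three inputs. The function name check_suffixed and the
array names root and suffix are made up for illustration; found_this_suffix is
the flag variable suggested above. It is not the code from the original
program.

```sh
# check_suffixed WORDLIST SUFFIXLIST: print stdin words that are neither in
# WORDLIST as they stand nor a listed root followed by a listed suffix.
check_suffixed() {
    awk -v words="$1" -v sufs="$2" '
    BEGIN {
        while ((getline line < words) > 0) root[line] = 1
        while ((getline line < sufs) > 0) suffix[line] = 1
    }
    NF {
        word = $1
        found_this_suffix = ""
        # Pre-strip: try each possible tail, longest first, as a suffix.
        for (len = length(word) - 1; len >= 1; len--) {
            tail = substr(word, length(word) - len + 1)
            stem = substr(word, 1, length(word) - len)
            if ((tail in suffix) && (stem in root)) {
                found_this_suffix = tail
                break
            }
        }
        # Check on match failure: the unstripped word may be listed itself.
        if (found_this_suffix == "" && !(word in root))
            print word
    }'
}

# Example run with tiny throwaway lists.
w=$(mktemp); s=$(mktemp)
printf '%s\n' walk jump running > "$w"
printf '%s\n' s ed ing > "$s"
printf '%s\n' walked jumps running runs walk flying | check_suffixed "$w" "$s"
rm -f "$w" "$s"
```

With these lists the run prints "runs" and "flying": both end in a listed
suffix, but neither "run" nor "fly" is a listed root. Every test is a hash
lookup, so the cost per word grows with the word's length, not with the size
of the word list.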

Erik

-- 
Britain had first obtained a commercial Enigma machine back in 1927, by simply
purchasing one in the open in Germany. The machine was analysed and a diagnostic
report written on how it worked.
- http://www.bbc.co.uk/news/magazine-17486464



Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-20 Thread porphyry5
On Wednesday, November 19, 2014 8:04:59 PM UTC-8, Erik Christiansen wrote:
> On 19.11.14 12:21, porphyry5 wrote:
> > So if I read this hash table stuff correctly any data item that one
> > wants to look up in an associative array generates its own address in
> > memory, by using (a constant number of bytes of?) its value as a
> > single binary number and modifying that with some arithmetic algorithm
> > to produce an address in the available range.  That is a really sweet
> > concept, but it must be rather wasteful of memory, lots of unused
> > spaces within its range.
> 
> In awk, at least (and probably most later adopters, such as perl), the
> memory is malloc-ed, i.e. builds as needed, and array entries can be of
> any size within available memory. The fact that an array need not be
> dimensioned, that array entries are created when referenced, and can be
> deleted, means that the implementation is not actually a simple array.
> 
> Hash buckets, in other areas I've encountered, have usually been
> implemented as linked lists, but that is not cast in stone. If there is
> only one element at the hashed address, as is needed in encryption
> hashes, then there's no list there to walk. I have not examined the code
> to check how the hash table is implemented.
> 
> > I use ultra-cheap, under-powered laptops (currently acer c720, new
> > laptop for < $200) which has only 2gb memory, but for 200,000 odd
> > words that's 10K available per word, lots of available waste space.
> 
> Associative arrays will certainly conserve memory. The cost of this is
> that hashing a word takes one iteration of a small loop per character,
> before using that to address the hash table (of pointers to the array
> elements). They are still more efficient than most other methods.
> 
> For examples on the use of them, googling for the "Effective AWK
> Programming" manual should allow you to download a copy. The book
> "The AWK Programming Language" by the eponymous authors of the language
> is published by Addison Wesley. Its concise presentation of illuminating
> examples, and permuted index, facilitate rapid extraction of the
> knowledge desired, without faffing around in overly elaborate verbosity.
> 
> Any good reference will warn that "if (word in array) ... " syntax is
> needed for testing membership, since "if (array[word] != 0) ..." will
> create the element before testing it. (Please don't ask me how I know
> that.)
> 
> > I really appreciate you hammering home this point about associative
> > arrays, I always assumed, their being so convenient to use, that they
> > had to be a very performance-costly luxury.  Who knew.
> 
> You are welcome. The language came out in 1977, and I've found it very
> useful and quick to code, in three decades in IT. Its C-like syntax
> makes it less of a write-only language than some others, I find.
> 
> It is, though, interpreted, not compiled, so manually looping over long
> lists is slow. Hashing should sort that out.
> 
> > But I will forthwith rewrite the awk program with associative arrays
> > and compare the timing for the two methods.  Thank you very much for
> > the trouble you have gone to on this.
> 
> You are most welcome. All I did was point out the fruit hanging on the
> tree. Seeing someone new productively making jam with it is thanks enough.
> 
> Erik
> 
> -- 
> Unix isn't hard, it's just a lot.  (Ascribed to one of its originators)

Annoyingly, I cannot see any clear way to use an associative array in this 
case, because of that pesky word suffix list.  I believe 'if (word in wd-list)' 
must either return "no match" or "exact match".  In the binary search I check 
for exact matches, on failure immediately followed by a partial match test, 'if 
(word ~ wd-list[index])' to indicate the need to append suffixes to the partial 
matching wd-list entry.

Pity really, but what I have can be lived with, it takes about 20 seconds to 
process a 700K text.



Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-19 Thread Erik Christiansen
On 19.11.14 12:21, porphyry5 wrote:
> So if I read this hash table stuff correctly any data item that one
> wants to look up in an associative array generates its own address in
> memory, by using (a constant number of bytes of?) its value as a
> single binary number and modifying that with some arithmetic algorithm
> to produce an address in the available range.  That is a really sweet
> concept, but it must be rather wasteful of memory, lots of unused
> spaces within its range.

In awk, at least (and probably most later adopters, such as perl), the
memory is malloc-ed, i.e. builds as needed, and array entries can be of
any size within available memory. The fact that an array need not be
dimensioned, that array entries are created when referenced, and can be
deleted, means that the implementation is not actually a simple array.

Hash buckets, in other areas I've encountered, have usually been
implemented as linked lists, but that is not cast in stone. If there is
only one element at the hashed address, as is needed in encryption
hashes, then there's no list there to walk. I have not examined the code
to check how the hash table is implemented.

> I use ultra-cheap, under-powered laptops (currently acer c720, new
> laptop for < $200) which has only 2gb memory, but for 200,000 odd
> words that's 10K available per word, lots of available waste space.

Associative arrays will certainly conserve memory. The cost of this is
that hashing a word takes one iteration of a small loop per character,
before using that to address the hash table (of pointers to the array
elements). They are still more efficient than most other methods.
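
To make the "one loop iteration per character" cost concrete, here is a toy
string hash written in awk itself. It is purely illustrative: the multiplier
31, the function name bucket_of and the ASCII-only code table are assumptions
for the sketch, and nothing here reflects the hash function any particular awk
implementation uses internally.

```sh
# bucket_of WORD NBUCKETS: print the bucket index a toy hash assigns to WORD.
bucket_of() {
    awk -v word="$1" -v nb="$2" '
    BEGIN {
        # awk has no ord(); build a character-to-code table for ASCII.
        for (i = 1; i < 128; i++) ord[sprintf("%c", i)] = i
        h = 0
        # One small loop iteration per character of the word.
        for (i = 1; i <= length(word); i++)
            h = (h * 31 + ord[substr(word, i, 1)]) % nb
        print h
    }'
}

# The same word always lands in the same bucket; similar words usually do not.
bucket_of apple 1024
bucket_of apply 1024
```

The work depends only on the length of the word, not on how many words the
table holds, which is where the "mostly constant time" lookups come from.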

For examples on the use of them, googling for the "Effective AWK
Programming" manual should allow you to download a copy. The book
"The AWK Programming Language" by the eponymous authors of the language
is published by Addison Wesley. Its concise presentation of illuminating
examples, and permuted index, facilitate rapid extraction of the
knowledge desired, without faffing around in overly elaborate verbosity.

Any good reference will warn that "if (word in array) ... " syntax is
needed for testing membership, since "if (array[word] != 0) ..." will
create the element before testing it. (Please don't ask me how I know
that.)
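
A small demonstration of that side effect, with illustrative names (list,
count_keys, count_after_lookup). The array is counted before and after a
lookup that references the element directly.

```sh
# Print the number of elements before and after a direct reference to a
# missing key.
count_after_lookup() {
    awk '
    function count_keys(arr,   k, n) { n = 0; for (k in arr) n++; return n }
    BEGIN {
        list["apple"] = 1
        if ("pear" in list) found = 1        # membership test: no side effect
        before = count_keys(list)
        if (list["pear"] != 0) found = 1     # reference: creates list["pear"]
        after = count_keys(list)
        print before, after
    }'
}

count_after_lookup    # prints "1 2"
```

The second form leaves an empty element behind for every word that was looked
up and not found, so a later "in" test on the same word would wrongly succeed.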

> I really appreciate you hammering home this point about associative
> arrays, I always assumed, their being so convenient to use, that they
> had to be a very performance-costly luxury.  Who knew.

You are welcome. The language came out in 1977, and I've found it very
useful and quick to code, in three decades in IT. Its C-like syntax
makes it less of a write-only language than some others, I find.

It is, though, interpreted, not compiled, so manually looping over long
lists is slow. Hashing should sort that out.

> But I will forthwith rewrite the awk program with associative arrays
> and compare the timing for the two methods.  Thank you very much for
> the trouble you have gone to on this.

You are most welcome. All I did was point out the fruit hanging on the
tree. Seeing someone new productively making jam with it is thanks enough.

Erik

-- 
Unix isn't hard, it's just a lot.  (Ascribed to one of its originators)



Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-19 Thread porphyry5
On Tuesday, November 18, 2014 11:38:19 PM UTC-8, John Little wrote:
> Not much to the point, but I couldn't let this pass:
> porphyry5 said:
> >... I used a plain numeric index as I figured it must use an address array 
> >to reference the words array, and with a numeric index I could use a binary 
> >search pattern to locate the word.  I think an associative array must use a 
> >linear search... 
> Associative array implementations usually use some kind of hashing, with 
> mostly constant time look ups. You can tell if you iterate over the keys and 
> they come in some weird order. 
> 
> Regards, John Little

Thank you also for the advice to use associative arrays.  I've wondered for 
years what hashing and hash tables actually entailed, but was always too lazy 
to check them out.



Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-19 Thread porphyry5
On Wednesday, November 19, 2014 12:10:08 AM UTC-8, Erik Christiansen wrote:
> On 18.11.14 15:26, porphyry5 wrote:
> > I'm not sure how awk organizes arrays internally, but I used a plain
> > numeric index as I figured it must use an address array to reference
> > the words array, and with a numeric index I could use a binary search
> > pattern to locate the word.
> 
> That is all unnecessary - location is efficiently done for us, _without_
> searching, in an associative array:
> 
> http://en.wikipedia.org/wiki/Associative_array
> 
> > I think an associative array must use a linear search pattern because
> > awk has no way of knowing if the array is actually in sequence.
> 
> Erroneous assumption. The wikipedia page has links to "hash table" and
> "hash function". A quick glance at them indicates that they should
> suffice to explain. An associative array uses a hash to directly vector
> to an array element indexed by an arbitrary string. As a consequence,
> iteration over the array does not result in an easily anticipated
> sequence. Does that matter when the array exists primarily for testing
> set membership?
> 
> With the associative array, no iteration over the list is required. The
> hashing algorithm rapidly generates the address of a "bucket",
> containing from one to just a handful (if the hashing algorithm is poor,
> and the array elements exceedingly numerous) of strings with the same
> hash. I.e. we go straight to the match. It can be seen as a form of
> "content addressable memory", perhaps.
> 
> That is why I previously wrote:
> > > and membership tested with an "if (word in list) ..." in an
> > > unconditional action handling the input stream".
> 
> If words arrive one per line, then:
> 
>    { if ($1 in my_word_list) ... ; else ... }
> 
> immediately tests membership, without any search loop - explicit or
> implicit. To fill the associative array, this should suffice:
> 
> BEGIN {
>    file = "path/to/my_words"
>    while ((x = getline < file) > 0) # It does need the extra parentheses.
>    {  my_word_list[$1]++ }  # If entry > 1, duplicate.
>    if (x < 0)
>    {  print "\n\t my_scriptname: " file ": File not found.\n" ; exit }
> }
> 
> AND, if an alphabetical sort of the word list is ever required for some
> reason, then either presort "path/to/my_words", or pipe it to sort,
> either in the shell or within awk, when needed. It is just not
> economical to iterate (even in a binary search) over 200,000 words, for
> _every_ input word, just to have a sorted word list, maybe once or twice
> a year, if ever.
> 
> > And of course, I have a second array of word suffixes to reference if
> > the word of interest is not the root form.
> 
> Now I'm curious - how are the suffixes matched to those of the 200,000
> words which may take them? I have a Danish word list which simply
> includes each word variant explicitly. If an explicit list takes you to
> a million words, then you just need a good hashing algorithm and a
> million buckets, _if_ a short search through two or three words is to be
> avoided. But a short search in a small bucket would have to be faster
> than figuring out which suffixes go with what. Such matching would need
> each list word to be identified as noun, adjective, or verb, at a
> minimum, to permit correct attachment of adverb suffixes, wouldn't it?
> 
> You seem to have a quite interesting task to deal with.
> 
> Erik
> 
> -- 
> mfox: You can't have infinite growth in a finite world, over population will
> doom us all in the end because capitalism depends on infinite growth. We are
> just rearranging deck chairs on the titanic.
> Maxx: Totally agree mfox, I think someone put Norman Lindsay's "The Magic
> Pudding" in the non-fiction section.
> - http://www.abc.net.au/news/2014-10-16/kohler-when-a-central-banker-talks-like-this-pay-attention/5815392

So if I read this hash table stuff correctly any data item that one wants to 
look up in an associative array generates its own address in memory, by using 
(a constant number of bytes of?) its value as a single binary number and 
modifying that with some arithmetic algorithm to produce an address in the 
available range.  That is a really sweet concept, but it must be rather 
wasteful of memory, lots of unused spaces within its range.  I use ultra-cheap, 
under-powered laptops (currently acer c720, new laptop for < $200) which has 
only 2gb memory, but for 200,000 odd words that's 10K available per word, lots 
of available waste space.

I really appreciate you hammering home this point about associative arrays, I 
always assumed, their being so convenient to use, that they had to be a very 
performance-costly luxury.  Who knew.

But I will forthwith rewrite the awk program with associative arrays and 
compare the timing for the two methods.  Thank you very much for the trouble 
you have gone to on this.


Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-19 Thread Erik Christiansen
On 18.11.14 15:26, porphyry5 wrote:
> I'm not sure how awk organizes arrays internally, but I used a plain
> numeric index as I figured it must use an address array to reference
> the words array, and with a numeric index I could use a binary search
> pattern to locate the word.

That is all unnecessary - location is efficiently done for us, _without_
searching, in an associative array:

http://en.wikipedia.org/wiki/Associative_array

> I think an associative array must use a linear search pattern because
> awk has no way of knowing if the array is actually in sequence.

Erroneous assumption. The wikipedia page has links to "hash table" and
"hash function". A quick glance at them indicates that they should
suffice to explain. An associative array uses a hash to directly vector
to an array element indexed by an arbitrary string. As a consequence,
iteration over the array does not result in an easily anticipated
sequence. Does that matter when the array exists primarily for testing
set membership?

With the associative array, no iteration over the list is required. The
hashing algorithm rapidly generates the address of a "bucket",
containing from one to just a handful (if the hashing algorithm is poor,
and the array elements exceedingly numerous) of strings with the same
hash. I.e. we go straight to the match. It can be seen as a form of
"content addressable memory", perhaps.

That is why I previously wrote:
> > and membership tested with an "if (word in list) ..." in an
> > unconditional action handling the input stream".

If words arrive one per line, then:

   { if ($1 in my_word_list) ... ; else ... }

immediately tests membership, without any search loop - explicit or
implicit. To fill the associative array, this should suffice:

BEGIN {
   file = "path/to/my_words"
   while ((x = getline < file) > 0) # It does need the extra parentheses.
   {  my_word_list[$1]++ }  # If entry > 1, duplicate.
   if (x < 0)
   {  print "\n\t my_scriptname: " file ": File not found.\n" ; exit }
}
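
Combined into one runnable piece, the two fragments above might look like the
following sketch. The wrapper name check_words is an assumption, and the list
path is passed in as an argument in place of the fixed "path/to/my_words".

```sh
# check_words WORDLIST: read one word per line on stdin and print those that
# are not in WORDLIST.
check_words() {
    awk -v file="$1" '
    BEGIN {
        # Load the good words; a count above 1 marks a duplicate entry.
        while ((x = (getline < file)) > 0)
            my_word_list[$1]++
        if (x < 0) {
            print "check_words: " file ": cannot read" > "/dev/stderr"
            exit 1
        }
    }
    # "in" tests membership by hash lookup, with no search loop.
    NF && !($1 in my_word_list) { print $1 }'
}

# Example run with a throwaway three-word list.
tmp=$(mktemp)
printf '%s\n' apple banana cherry > "$tmp"
printf '%s\n' apple bananna cherry durian | check_words "$tmp"
rm -f "$tmp"
```

The example prints "bananna" and "durian", the two input words missing from
the list.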

AND, if an alphabetical sort of the word list is ever required for some
reason, then either presort "path/to/my_words", or pipe it to sort,
either in the shell or within awk, when needed. It is just not
economical to iterate (even in a binary search) over 200,000 words, for
_every_ input word, just to have a sorted word list, maybe once or twice
a year, if ever.
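
For the occasional sorted listing, a sketch of the "pipe it to sort within awk"
option follows; the function name sorted_keys and the array name seen are
illustrative. The order in which "for (w in seen)" visits the keys is
unspecified, so the sorting is left to sort(1).

```sh
# sorted_keys: collect the distinct words from stdin, then print them sorted.
sorted_keys() {
    awk '
    { seen[$1]++ }
    END {
        for (w in seen) print w | "sort"
        close("sort")
    }'
}

printf '%s\n' pear apple fig apple | sorted_keys
```

This prints apple, fig and pear, one per line. The sort cost is paid only when
a sorted list is actually wanted, not on every lookup.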

> And of course, I have a second array of word suffixes to reference if
> the word of interest is not the root form.

Now I'm curious - how are the suffixes matched to those of the 200,000
words which may take them? I have a Danish word list which simply
includes each word variant explicitly. If an explicit list takes you to
a million words, then you just need a good hashing algorithm and a
million buckets, _if_ a short search through two or three words is to be
avoided. But a short search in a small bucket would have to be faster
than figuring out which suffixes go with what. Such matching would need
each list word to be identified as noun, adjective, or verb, at a
minimum, to permit correct attachment of adverb suffixes, wouldn't it?

You seem to have a quite interesting task to deal with.

Erik

-- 
mfox: You can't have infinite growth in a finite world, over population will
doom us all in the end because capitalism depends on infinite growth. We are
just rearranging deck chairs on the titanic.
Maxx: Totally agree mfox, I think someone put Norman Lindsay's "The Magic
Pudding" in the non-fiction section.
- http://www.abc.net.au/news/2014-10-16/kohler-when-a-central-banker-talks-like-this-pay-attention/5815392



Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-18 Thread John Little
Not much to the point, but I couldn't let this pass:
porphyry5 said:
>... I used a plain numeric index as I figured it must use an address array to 
>reference the words array, and with a numeric index I could use a binary 
>search pattern to locate the word.  I think an associative array must use a 
>linear search... 
Associative array implementations usually use some kind of hashing, with mostly 
constant time look ups. You can tell if you iterate over the keys and they come 
in some weird order. 

Regards, John Little 



Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-18 Thread porphyry5
On Tuesday, November 18, 2014 2:37:22 AM UTC-8, Erik Christiansen wrote:
> On 17.11.14 10:57, Graham Lawrence wrote:
> > For my test file the awk program tagged some 3500 words, with 1960 of them
> > unique, so this vim script must run within a loop to avoid the tedium and
> > 4000 odd keystrokes required to invoke it individually for each unique
> > error,
> 
> Er, what script loop, and what "4000 odd keystrokes [per] error", if one
> may be so bold?

A while loop to enclose the mapping as you saw it; I never add such details 
until I have the rest of the code working satisfactorily.  Not 4000 keystrokes 
per error, but ~4000 for all 1960 unique errors, with a 2-keystroke code to 
invoke the mapping.

All of which is redundant now, as I realized I could cut it to one keystroke 
per error by splitting it into 2 mappings, the 2nd of which ends by reinvoking 
the first.  That eliminated the need for user input entirely.

> If the list of good words is read into an associative
> array (let's call it "list") in the BEGIN action, and membership tested
> with an "if (word in list) ..." in an unconditional action handling the
> input stream, _and_ the unrecognised words (sans @@) are printed to
> another file, then it is only necessary to open that file in vim, and
> (with one word per line) hit ":.w >> /path/goodfile" for each word
> which we accept as good. With that aliased to a key of choice, only one
> keystroke is required to qualify each word. Both awk and vim are run
> once per session, handling thousands of words each time, if you have
> them. Four thousand keystrokes would handle 4000 errors.

In practice, it is not that straightforward.  I'm not sure how awk organizes 
arrays internally, but I used a plain numeric index as I figured it must use an 
address array to reference the words array, and with a numeric index I could 
use a binary search pattern to locate the word.  I think an associative array 
must use a linear search pattern because awk has no way of knowing if the array 
is actually in sequence.

And of course, I have a second array of word suffixes to reference if the word 
of interest is not the root form.

> 
> If these are e.g. ordinary English words, is it acceptable to read in
> e.g. /usr/share/dict/british-english into "list", to start with 98,000
> or more good words in the BEGIN action, before reading in your list of
> special words?

Project Gutenberg provides Webster's Dictionary from about 1913.  I extracted 
all the words from the html, and it reduced to about 200,000 unique words.  I 
use arch and they don't include such refinements as dictionaries in their 
distro.

> 
> Erik
> (Who is doubtless glossing over some undeclared additional requirement. :)
> 
> -- 
> Melbourne Water Use:
>    "More water is lost to stormwater each year than we use. On average we use
> about 40 billion litres of water each year, and each year about 500 billion
> litres runs into our drains." Leonie Duncan, Environment Victoria healthy river
> campaigner, quoted on p7 of Journal 21.10.08.



Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-18 Thread Erik Christiansen
On 17.11.14 10:57, Graham Lawrence wrote:
> For my test file the awk program tagged some 3500 words, with 1960 of them
> unique, so this vim script must run within a loop to avoid the tedium and
> 4000 odd keystrokes required to invoke it individually for each unique
> error,

Er, what script loop, and what "4000 odd keystrokes [per] error", if one
may be so bold? If the list of good words is read into an associative
array (let's call it "list") in the BEGIN action, and membership tested
with an "if (word in list) ..." in an unconditional action handling the
input stream, _and_ the unrecognised words (sans @@) are printed to
another file, then it is only necessary to open that file in vim (one
word per line), and hit ":.w >> /path/goodfile" on each word
which we accept as good. With that aliased to a key of choice, only one
keystroke is required to qualify each word. Both awk and vim are run
once per session, handling thousands of words each time, if you have
them. Four thousand keystrokes would handle 4000 errors.
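
The outline above could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the thread: the function name list_unknown, the file names, and the crude punctuation stripping are hypothetical, and matching is case-sensitive.

```shell
#!/bin/sh
# Sketch only: list_unknown and the file names below are hypothetical.
# Usage: list_unknown GOODFILE TEXTFILE  -> unrecognised words, one per line
list_unknown() {
    awk -v goodfile="$1" '
        BEGIN {
            # Read the good words into an associative array keyed by word.
            while ((getline w < goodfile) > 0)
                list[w] = 1
            close(goodfile)
        }
        {
            # Unconditional action: test every field of every input line.
            for (i = 1; i <= NF; i++) {
                word = $i
                gsub(/[^A-Za-z]/, "", word)   # crude punctuation strip (assumption)
                if (word != "" && !(word in list) && !(word in seen)) {
                    seen[word] = 1            # print each unknown word only once
                    print word
                }
            }
        }
    ' "$2"
}

# Small demonstration with throwaway files.
demo=$(mktemp -d)
printf '%s\n' the cat sat > "$demo/goodfile"
printf '%s\n' 'the cat sat on the mat,' 'the czt sat on the mat' > "$demo/text"
list_unknown "$demo/goodfile" "$demo/text" > "$demo/unknown"
cat "$demo/unknown"
rm -rf "$demo"
```

The file of unknown words can then be opened in vim, and each accepted line appended to the good list with the ":.w >> /path/goodfile" command mentioned above.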

If these are e.g. ordinary English words, is it acceptable to read in
e.g. /usr/share/dict/british-english into "list", to start with 98,000
or more good words in the BEGIN action, before reading in your list of
special words?

Erik
(Who is doubtless glossing over some undeclared additional requirement. :)

-- 
Melbourne Water Use:
   "More water is lost to stormwater each year than we use. On average we use
about 40 billion litres of water each year, and each year about 500 billion
litres runs into our drains." Leonie Duncan, Environment Victoria healthy river
campaigner, quoted on p7 of Journal 21.10.08.



Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-17 Thread Ben Fritz
On Monday, November 17, 2014 12:57:31 PM UTC-6, porphyry5 wrote:
> 
> Is there a method of getting the screen to constantly show the cursor as a 
> .vim script progresses?
> 
>

It should be doing it anyway, but the :redraw or :redraw! command can often 
force it.



Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-17 Thread Graham Lawrence
I thank you both, Tim and Ben, for your help.  Let me explain the situation
more fully.  I'm testing the feasibility of semi-automated repair of texts
that have been ocr-ed from less than ideal sources, most notably from
books.google.com.  To that end I've written two scripts; the first in sed
to detect and impose formatting structure on the text from cues within the
text itself; the second in awk to check every word in the text against a
word-list, and to prefix every word not in the list with @@.
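
The tagging step described above might look something like the following sketch. It is not the script used in this thread: the function name tag_unknown, its file arguments, the punctuation handling and the case-sensitive comparison are all assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: tag_unknown and its file arguments are hypothetical.
# Usage: tag_unknown WORDFILE TEXTFILE  -> text with unknown words prefixed by @@
tag_unknown() {
    awk -v wordfile="$1" '
        BEGIN {
            # Load the word-list as array keys.
            while ((getline w < wordfile) > 0) list[w] = 1
        }
        {
            for (i = 1; i <= NF; i++) {
                bare = $i
                gsub(/[^A-Za-z]/, "", bare)       # compare without punctuation
                if (bare != "" && !(bare in list))
                    $i = "@@" $i                  # rebuilds $0, collapsing runs of blanks
            }
            print
        }
    ' "$2"
}

# Small demonstration with throwaway files.
demo=$(mktemp -d)
printf '%s\n' the cat sat on mat > "$demo/words"
printf '%s\n' 'the czt sat on the mzt.' > "$demo/text"
tag_unknown "$demo/words" "$demo/text"
rm -rf "$demo"
```

For the sample line this prints "the @@czt sat on the @@mzt.". Assigning to a field makes awk rejoin the line with single spaces, so a real script that must preserve the original spacing would need a different substitution strategy.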

At this point the text must be inspected visually to decide if each @@word
is an error or not.  If not, the @@ is removed and the actual word is saved
to be added to the awk script's word list.  If an error, the @@ is changed
to qq (for later manual correction), so that errors that have already been
processed will not be reprocessed by this vim script.

For my test file the awk program tagged some 3500 words, with 1960 of them
unique, so this vim script must run within a loop to avoid the tedium and
4000 odd keystrokes required to invoke it individually for each unique
error.  I assume that invalidates the method I'm trying here, Tim, because
then the :s...gce command cannot be forced to be last in the script; it
must be followed by 'endw'.  That is not a problem, as the same effect can
be achieved using the input() function, and I have written it two ways,
as a key-mapping and as a .vim script.  Both of these will do the job, but
neither is actually usable, for visual reasons.

It is essential that the user be able to read the text about the current
@@word, to determine whether or not it is an error.  But when the
key-mapping stops at the input() prompt vim lists above it all preceding
commands in the while loop, which often has the effect of pushing the
sentence of interest off the top of the screen.  How can I prevent that
behavior?

The .vim approach is even worse as with that the screen does not update at
all as the script works its way through the text.  I cannot find any
function in :h function-list that will actually cause the screen to show
the text surrounding the current cursor position, and including say 'norm
zz' immediately before input() has no effect at all.  Is there a method of
getting the screen to constantly show the cursor as a .vim script
progresses?


On Mon, Nov 17, 2014 at 9:01 AM, Ben Fritz wrote:

> On Monday, November 17, 2014 5:27:47 AM UTC-6, Tim Chase wrote:
> > On 2014-11-15 13:59, porphyry5 wrote:
> > > On Friday, November 14, 2014 4:02:55 PM UTC-8, porphyry5 wrote:
> > > > In a key mapping I use the command ':%s//\=@o/gce'.
> > > >
> > > > The command executes as expected except that it behaves as if the
> > > > c flag were not set.  Is this flag unavailable in a key mapping,
> > > > or is there some other option that needs to be set for it to
> > > > work?  It works as expected at the command line.
> > >
> > > This is the mapping concerned:
> > > "map ,, /@@"myWcwqqh"oywxx"nywma:let
> > > @/=@m:%s//\=@n/ge:let @/=@n:%s//\=@o/gce`ay2h`a:if
> > > @" != 'qq':norm "Zyw:en
> >
> > Ah, I believe the problem is triggered because the atoms after the
> > ":%s//\=@o/gce" are interpreted as answers to the y/n/a/q/l/^E/^Y
> > prompt.  The back-tick is ignored and the "a" (the subsequent atom)
> > is interpreted as "a"ll the remaining matches.
> >
> > For this to work (actually prompting the user), the
> > ":%s//\=@o/gce" has to be the last item in your mapping, leaving
> > the :s command in the user-prompting state.
> >
>
> If this is the cause, it's probably cleaner to wrap everything in a
> function with one command per line, and call the function from the mapping.
> Then there are fewer ways for it to go awry.




-- 
Graham Lawrence



Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-17 Thread Ben Fritz
On Monday, November 17, 2014 5:27:47 AM UTC-6, Tim Chase wrote:
> On 2014-11-15 13:59, porphyry5 wrote:
> > On Friday, November 14, 2014 4:02:55 PM UTC-8, porphyry5 wrote:
> > > In a key mapping I use the command ':%s//\=@o/gce'.
> > > 
> > > The command executes as expected except that it behaves as if the
> > > c flag were not set.  Is this flag unavailable in a key mapping,
> > > or is there some other option that needs to be set for it to
> > > work?  It works as expected at the command line.
> > 
> > This is the mapping concerned:
> > "map ,, /@@"myWcwqqh"oywxx"nywma:let
> > @/=@m:%s//\=@n/ge:let @/=@n:%s//\=@o/gce`ay2h`a:if
> > @" != 'qq':norm "Zyw:en
> 
> Ah, I believe the problem is triggered because the atoms after the
> ":%s//\=@o/gce" are interpreted as answers to the y/n/a/q/l/^E/^Y
> prompt.  The back-tick is ignored and the "a" (the subsequent atom)
> is interpreted as "a"ll the remaining matches.
> 
> For this to work (actually prompting the user), the
> ":%s//\=@o/gce" has to be the last item in your mapping, leaving
> the :s command in the user-prompting state.
> 

If this is the cause, it's probably cleaner to wrap everything in a function 
with one command per line, and call the function from the mapping. Then there 
are fewer ways for it to go awry.



Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-17 Thread Tim Chase
On 2014-11-15 13:59, porphyry5 wrote:
> On Friday, November 14, 2014 4:02:55 PM UTC-8, porphyry5 wrote:
> > In a key mapping I use the command ':%s//\=@o/gce'.
> > 
> > The command executes as expected except that it behaves as if the
> > c flag were not set.  Is this flag unavailable in a key mapping,
> > or is there some other option that needs to be set for it to
> > work?  It works as expected at the command line.
> 
> This is the mapping concerned:
> "map ,, /@@"myWcwqqh"oywxx"nywma:let
> @/=@m:%s//\=@n/ge:let @/=@n:%s//\=@o/gce`ay2h`a:if
> @" != 'qq':norm "Zyw:en

Ah, I believe the problem is triggered because the atoms after the
":%s//\=@o/gce" are interpreted as answers to the y/n/a/q/l/^E/^Y
prompt.  The back-tick is ignored and the "a" (the subsequent atom)
is interpreted as "a"ll the remaining matches.

For this to work (actually prompting the user), the
":%s//\=@o/gce" has to be the last item in your mapping, leaving
the :s command in the user-prompting state.

> The input file it processes has certain words flagged with a
> leading '@@' to indicate a possible error that can only be resolved
> by inspection.  The mapping strips the leading @@ from all
> occurrences of the current word with the first :%s, then runs the
> second :%s with the c flag to allow the user to respond either 'a'
> or 'q' depending on whether the word is actually an error, or
> should be added to the reference word list.

I'm not sure I completely follow your process.  You have words
flagged with "@@" that you need to ask the user about, potentially
adding them to a reference word list (which it looks like you're
storing in the Z register).  Do the "@@" remain at the end of the
process?  I see them getting replaced by "qq" but didn't see them
getting returned to "@@" at any point.  With a sample "before" and
"after" document, along with what/how you want your reference
word-list and the y/n answers you gave, it should be possible to
rewrite this mapping so that it gives you the functionality that you
want.

-tim





Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-15 Thread porphyry5
On Friday, November 14, 2014 4:02:55 PM UTC-8, porphyry5 wrote:
> In a key mapping I use the command ':%s//\=@o/gce'.
> 
> The command executes as expected except that it behaves as if the c flag were 
> not set.  Is this flag unavailable in a key mapping, or is there some other 
> option that needs to be set for it to work?  It works as expected at the 
> command line.
> 
> 
> 
> -- 
> 
> Graham Lawrence

This is the mapping concerned:
"map ,, /@@"myWcwqqh"oywxx"nywma:let @/=@m:%s//\=@n/ge:let 
@/=@n:%s//\=@o/gce`ay2h`a:if @" != 'qq':norm "Zyw:en

The input file it processes has certain words flagged with a leading '@@' to 
indicate a possible error that can only be resolved by inspection.  The mapping 
strips the leading @@ from all occurrences of the current word with the first 
:%s, then runs the second :%s with the c flag to allow the user to respond 
either 'a' or 'q' depending on whether the word is actually an error, or should 
be added to the reference word list.



Re: :%s//\=@o/gce ignores c flag in key mapping

2014-11-14 Thread Tim Chase
On 2014-11-14 16:02, Graham Lawrence wrote:
> In a key mapping I use the command ':%s//\=@o/gce'.
> 
> The command executes as expected except that it behaves as if the c
> flag were not set.  Is this flag unavailable in a key mapping, or
> is there some other option that needs to be set for it to work?  It
> works as expected at the command line.

Could you detail the exact mapping you're using?  I tried to
replicate this using

  :nnoremap Q :%s//\=@o/gce<CR>
  :let @o='a'
  /the

which primed my search with "the" and my "o" register with the letter
"a" which should have the effect of issuing

  :%s/the/a/gce

and indeed, when I hit "Q" to execute the mapping, it does prompt me
for each instance of "the", allowing me to say yes/no regarding its
replacement with the value of my "o" register.

All that to say:  it's working how you describe it should (and how I
expect it to) and I'm not seeing your "behaves as if the c flag were
not set" symptom.

-tim



 
