Re: :%s//\=@o/gce ignores c flag in key mapping
On Monday, November 24, 2014 4:05:08 AM UTC-8, Erik Christiansen wrote: > I am reminded of two quotes: > > There are two ways of constructing a software design. One way is to make > it so simple that there are obviously no deficiencies. And the other way > is to make it so complicated that there are no obvious deficiencies. >-C.A.R. Hoare > > ... with proper design, the features come cheaply. This approach is > arduous, but continues to succeed.-Dennis Ritchie > > I agree that associative arrays do not seem arduous enough for the > benefits they can bring to the right problem, but then, someone else has > done all the work for us. > > I'd normally test with a known 100% good list, and with a list with a > known number of errors. Those lists do not need to be very long - a few > dozen words ought to suffice. The third test - grinding through a pile > of words, you have already done. > > The exercise is a reminder to us all that the more lines of code we > write, the more bugs creep in undetected. > > Erik > > -- > Sometimes you have to outsmart this stuff, it works for Murphy you know. > - Gene Heskett, on emc-users ML Mr. Hoare has a fine sense of irony! Meanwhile, the awk program passed its "good data" and "known errors" test, and I am grinding on through the "real data", so all seems satisfactory, at least as far as I want to take this. So, once again, I thank you for all your help on this project, much appreciated. -- -- You received this message from the "vim_use" maillist. Do not top-post! Type your reply below the text you are replying to. For more information, visit http://www.vim.org/maillist.php --- You received this message because you are subscribed to the Google Groups "vim_use" group. To unsubscribe from this group and stop receiving emails from it, send an email to vim_use+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: :%s//\=@o/gce ignores c flag in key mapping
On 23.11.14 12:01, porphyry5 wrote:
> However I smell a rat, it's so astonishingly fast, less than 2 seconds
> vs ~20 seconds for the previous version, and reports only 1050 total
> errors, against more than 3000 total before, though that did include
> good words reported as errors. I can't see that it should make any
> difference, but I note you did recommend working from the back end
> and isolating the suffix first. That's very easily done, so I think
> I'll try it as an easy check on the reliability of the result. If it
> produces exactly the same errors that would be most encouraging.
>
> Oh, happy day, it produces exactly the same output ...

Such testing, both coming and going, with identical results, has to
improve confidence in the new implementation. :-)

> Now just so long as my enthusiasm for quick and easy answers isn't
> blinding me to some lurking gotcha...

I am reminded of two quotes:

   There are two ways of constructing a software design. One way is to
   make it so simple that there are obviously no deficiencies. And the
   other way is to make it so complicated that there are no obvious
   deficiencies.
   -C.A.R. Hoare

   ... with proper design, the features come cheaply. This approach is
   arduous, but continues to succeed.
   -Dennis Ritchie

I agree that associative arrays do not seem arduous enough for the
benefits they can bring to the right problem, but then, someone else has
done all the work for us.

I'd normally test with a known 100% good list, and with a list with a
known number of errors. Those lists do not need to be very long - a few
dozen words ought to suffice. The third test - grinding through a pile
of words - you have already done.

The exercise is a reminder to us all that the more lines of code we
write, the more bugs creep in undetected.

Erik

--
Sometimes you have to outsmart this stuff, it works for Murphy you know.
- Gene Heskett, on emc-users ML
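The two small tests Erik describes (a known 100% good list, and a list with a known number of errors) could be sketched roughly as below. This is only an illustration: the function name count_flagged, the temporary file names and the sample words are all made up, and the checker here is a bare membership test rather than porphyry5's actual program.

```sh
#!/bin/sh
# count_flagged WORDLIST INPUT
# Hypothetical helper: prints how many words in INPUT are absent from WORDLIST.
count_flagged() {
    awk 'NR == FNR { good[$1] = 1; next }   # first file: load the word list
         !($1 in good) { bad++ }            # second file: count unknown words
         END { print bad + 0 }' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' alpha beta gamma      > "$tmp/dict"
printf '%s\n' alpha gamma           > "$tmp/all_good"     # expect 0 flagged
printf '%s\n' alpha betta gama beta > "$tmp/two_errors"   # expect 2 flagged

count_flagged "$tmp/dict" "$tmp/all_good"
count_flagged "$tmp/dict" "$tmp/two_errors"
rm -r "$tmp"
```

If either count differs from the number of errors planted in the list, the checker (or the list) is suspect before any real data is run through it.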
Re: :%s//\=@o/gce ignores c flag in key mapping
On Friday, November 21, 2014 1:31:20 AM UTC-8, Erik Christiansen wrote: > On 20.11.14 11:54, porphyry5 wrote: > > Annoyingly, I cannot see any clear way to use an associative array in > > this case, because of that pesky word suffix list. I believe 'if > > (word in wd-list)' must either return "no match" or "exact match". In > > the binary search I check for exact matches, on failure immediately > > followed by a partial match test, 'if (word ~ wd-list[index])' to > > indicate the need to append suffixes to the partial matching wd-list > > entry. > > Ah ... perhaps the easiest way is to detect recognisable suffixes on > input words, and strip them for the initial match attempt, i.e. only the > part of the word you want is hashed. A flag, or non-null > "found_this_suffix" string variable, retained from the partial-match > generating pre-stripping, then guides any additional actions, if > required. > > To cover the case where a word with a recognisable suffix is in the list > with suffix, rather than without, a check-on-match-failure for the > unstripped word could be performed. It would only occur when a suffix is > detected, and would in also be faster than a search. > > That essentially reverses the order of match vs suffix handling. > Speed-wise, I'd expect the pre-strip to be quite a bit faster than the > partial match, since the suffix list is unlikely to number 200,000 > entries. > > Erik > > -- > Britain had first obtained a commercial Enigma machine back in 1927, by simply > purchasing one in the open in Germany. The machine was analysed and a > diagnostic > report written on how it worked.- > http://www.bbc.co.uk/news/magazine-17486464 This is becoming a most interesting project. Checking further in my 2000 odd error file I began discovering certain words that the binary search should have found, but didn't. The cause being that the search did not necessarily hit the root word if it was followed by variants of that root. 
This redefined the problem to: if you can't find the word itself, you have to find its root. Ensuring that meant using an associative array, which also allowed an associative array for the suffixes. It cuts out many lines of code; the entire identification process is now just

    # v = word under test, b = root-word array, s = suffix array (globals).
    # Returns "" if v is recognised, "@@" if not.
    function bs() {
        r = ""
        if (v in b) return r
        n = length(v)
        # The archived copy lost this loop's comparison operator; starting
        # at the last character and working backwards is an assumption.
        j = n
        while (j > 1) {
            if ((substr(v, j) in s) && (substr(v, 1, j-1) in b)) return r
            j--
        }
        r = "@@"
        return r
    }

Now just so long as my enthusiasm for quick and easy answers isn't blinding me to some lurking gotcha...
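A function of this shape could be exercised on a handful of words as sketched below. The loop bounds are an assumption (the archived listing lost its comparison operator), the wrapper name tag_words is hypothetical, and the tiny root and suffix lists are invented for the demonstration; the real program loads some 200,000 words.

```sh
#!/bin/sh
# tag_words: reads one word per line, appends "@@" to unrecognised words.
tag_words() {
    awk '
    # v = word under test, b = roots, s = suffixes (globals, as in the post).
    # j and n are locals here; the loop direction is an assumption.
    function bs(    j, n) {
        if (v in b) return ""
        n = length(v)
        for (j = n; j > 1; j--)
            if ((substr(v, j) in s) && (substr(v, 1, j - 1) in b))
                return ""
        return "@@"
    }
    BEGIN {
        # Made-up sample data standing in for the real lists.
        b["walk"] = 1; b["happy"] = 1
        s["ing"] = 1;  s["ed"] = 1;  s["s"] = 1
    }
    { v = $1; tag = bs(); print v tag }'
}

printf '%s\n' walk walking walks blorf | tag_words
```

With these sample lists only "blorf" comes back tagged, since every other word is either a root or a root plus a listed suffix.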
Re: :%s//\=@o/gce ignores c flag in key mapping
On 20.11.14 11:54, porphyry5 wrote:
> Annoyingly, I cannot see any clear way to use an associative array in
> this case, because of that pesky word suffix list. I believe 'if
> (word in wd-list)' must either return "no match" or "exact match". In
> the binary search I check for exact matches, on failure immediately
> followed by a partial match test, 'if (word ~ wd-list[index])' to
> indicate the need to append suffixes to the partial matching wd-list
> entry.

Ah ... perhaps the easiest way is to detect recognisable suffixes on
input words, and strip them for the initial match attempt, i.e. only the
part of the word you want is hashed. A flag, or non-null
"found_this_suffix" string variable, retained from the pre-stripping
that produces the partial match, then guides any additional actions, if
required.

To cover the case where a word with a recognisable suffix is in the list
with its suffix, rather than without, a check-on-match-failure for the
unstripped word could be performed. It would only occur when a suffix is
detected, and would also be faster than a search.

That essentially reverses the order of match vs suffix handling.
Speed-wise, I'd expect the pre-strip to be quite a bit faster than the
partial match, since the suffix list is unlikely to number 200,000
entries.

Erik

--
Britain had first obtained a commercial Enigma machine back in 1927, by
simply purchasing one in the open in Germany. The machine was analysed
and a diagnostic report written on how it worked.
- http://www.bbc.co.uk/news/magazine-17486464
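The pre-strip idea above might look something like the following sketch. Only the variable name found_this_suffix comes from the message; the wrapper name check_words, the array names and the sample words and suffixes are assumptions made for illustration.

```sh
#!/bin/sh
# check_words: reads one word per line, prints "word ok" or "word unknown".
check_words() {
    awk '
    BEGIN {
        # Made-up sample data; a real run would read both lists from files.
        n = split("walk jump bus", w, " ")
        for (i = 1; i <= n; i++) words[w[i]] = 1
        nsuf = split("ing ed s", suffix, " ")
    }
    {
        word = $1; stem = word; found_this_suffix = ""
        # Detect a recognisable suffix and strip it before the first lookup.
        for (i = 1; i <= nsuf; i++) {
            cut = length(word) - length(suffix[i])
            if (cut > 0 && substr(word, cut + 1) == suffix[i]) {
                found_this_suffix = suffix[i]
                stem = substr(word, 1, cut)
                break
            }
        }
        if (stem in words)
            status = "ok"
        else if (found_this_suffix != "" && (word in words))
            status = "ok"        # listed with its suffix, e.g. "bus"
        else
            status = "unknown"
        print word, status
    }'
}

printf '%s\n' walking bus jumped xyzzy | check_words
```

The fallback lookup of the unstripped word only runs when a suffix was detected and the stem was not found, which is the check-on-match-failure case described in the message.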
Re: :%s//\=@o/gce ignores c flag in key mapping
On Wednesday, November 19, 2014 8:04:59 PM UTC-8, Erik Christiansen wrote: > On 19.11.14 12:21, porphyry5 wrote: > > So if I read this hash table stuff correctly any data item that one > > wants to look up in an associative array generates its own address in > > memory, by using (a constant number of bytes of?) its value as a > > single binary number and modifying that with some arithmetic algorithm > > to produce an address in the available range. That is a really sweet > > concept, but it must be rather wasteful of memory, lots of unused > > spaces within its range. > > In awk, at least (and probably most later adopters, such as perl), the > memory is malloc-ed, i.e. builds as needed, and array entries can be of > any size within available memory. The fact that an array need not be > dimensioned, that array entries are created when referenced, and can be > deleted, means that the implementation is not actually a simple array. > > Hash buckets, in other areas I've encountered, have usually been > implemented as linked lists, but that is not cast in stone. If there is > only one element at the hashed address, as is needed in encryption > hashes, then there's no list there to walk. I have not examined the code > to check how the hash table is implemented. > > > I use ultra-cheap, under-powered laptops (currently acer c720, new > > laptop for < $200) which has only 2gb memory, but for 200,000 odd > > words that's 10K available per word, lots of available waste space. > > Associative arrays will certainly conserve memory. The cost of this is > that hashing a word takes one iteration of a small loop per character, > before using that to address the hash table. (of pointers to the array > elements) They are still more efficient than most other methods. > > For examples on the use of them, googling for the "Effective AWK > Programming" manual should allow you to download a copy. 
The book > The AWK Programming Language" by the eponymous authors of the language > is published by Addison Wesley. Its concise presentation of illuminating > examples, and permuted index, facilitate rapid extraction of the > knowledge desired, without faffing around in overly elaborate verbosity. > > Any good reference will warn that "if (word in array) ... " syntax is > needed for testing membership, since "if (array[word] != 0) ..." will > create the element before testing it. (Please don't ask me how I know > that.) > > > I really appreciate you hammering home this point about associative > > arrays, I always assumed, their being so convenient to use, that they > > had to be a very performance-costly luxury. Who knew. > > You are welcome. The language came out in 1977, and I've found it very > useful and quick to code, in three decades in IT. Its C-like syntax > makes it less of a write-only language than some others, I find. > > It is though, interpreted, not compiled, so manually looping over long > lists is slow. Hashing should sort that out. > > > But I will forthwith rewrite the awk program with associative arrays > > and compare the timing for the two methods. Thank you very much for > > the trouble you have gone to on this. > > You are most welcome. All I did was point out the fruit hanging on the > tree. Seeing someone new productively making jam with it is thanks enough. > > Erik > > -- > Unix isn't hard, it's just a lot. (Ascribed to one of its originators) Annoyingly, I cannot see any clear way to use an associative array in this case, because of that pesky word suffix list. I believe 'if (word in wd-list)' must either return "no match" or "exact match". In the binary search I check for exact matches, on failure immediately followed by a partial match test, 'if (word ~ wd-list[index])' to indicate the need to append suffixes to the partial matching wd-list entry. 
Pity really, but what I have can be lived with, it takes about 20 seconds to process a 700K text.
Re: :%s//\=@o/gce ignores c flag in key mapping
On 19.11.14 12:21, porphyry5 wrote:
> So if I read this hash table stuff correctly any data item that one
> wants to look up in an associative array generates its own address in
> memory, by using (a constant number of bytes of?) its value as a
> single binary number and modifying that with some arithmetic algorithm
> to produce an address in the available range. That is a really sweet
> concept, but it must be rather wasteful of memory, lots of unused
> spaces within its range.

In awk, at least (and probably most later adopters, such as perl), the
memory is malloc-ed, i.e. builds as needed, and array entries can be of
any size within available memory. The fact that an array need not be
dimensioned, that array entries are created when referenced, and can be
deleted, means that the implementation is not actually a simple array.

Hash buckets, in other areas I've encountered, have usually been
implemented as linked lists, but that is not cast in stone. If there is
only one element at the hashed address, as is needed in encryption
hashes, then there's no list there to walk. I have not examined the code
to check how the hash table is implemented.

> I use ultra-cheap, under-powered laptops (currently acer c720, new
> laptop for < $200) which has only 2gb memory, but for 200,000 odd
> words that's 10K available per word, lots of available waste space.

Associative arrays will certainly conserve memory. The cost of this is
that hashing a word takes one iteration of a small loop per character,
before using that to address the hash table (of pointers to the array
elements). They are still more efficient than most other methods.

For examples on the use of them, googling for the "Effective AWK
Programming" manual should allow you to download a copy. The book
"The AWK Programming Language" by the eponymous authors of the language
is published by Addison Wesley. Its concise presentation of illuminating
examples, and permuted index, facilitate rapid extraction of the
knowledge desired, without faffing around in overly elaborate verbosity.

Any good reference will warn that "if (word in array) ... " syntax is
needed for testing membership, since "if (array[word] != 0) ..." will
create the element before testing it. (Please don't ask me how I know
that.)

> I really appreciate you hammering home this point about associative
> arrays, I always assumed, their being so convenient to use, that they
> had to be a very performance-costly luxury. Who knew.

You are welcome. The language came out in 1977, and I've found it very
useful and quick to code, in three decades in IT. Its C-like syntax
makes it less of a write-only language than some others, I find.

It is, though, interpreted, not compiled, so manually looping over long
lists is slow. Hashing should sort that out.

> But I will forthwith rewrite the awk program with associative arrays
> and compare the timing for the two methods. Thank you very much for
> the trouble you have gone to on this.

You are most welcome. All I did was point out the fruit hanging on the
tree. Seeing someone new productively making jam with it is thanks enough.

Erik

--
Unix isn't hard, it's just a lot. (Ascribed to one of its originators)
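The membership-test warning above can be seen in a few lines. This is a minimal sketch; the function name membership_demo, the array name seen and the sample words are made up for the demonstration.

```sh
#!/bin/sh
# membership_demo: counts array elements after each style of test.
membership_demo() {
    awk 'BEGIN {
        seen["apple"] = 1

        # Safe test: the "in" operator never creates an element.
        if ("pear" in seen) print "pear found"
        n = 0; for (k in seen) n++
        print "in: " n

        # Unsafe test: merely referencing seen["pear"] creates it,
        # with an empty value, even though the comparison is false.
        if (seen["pear"] != 0) print "pear found"
        n = 0; for (k in seen) n++
        print "subscript: " n
    }'
}

membership_demo
```

The array holds one element after the "in" test and two after the subscript test, although "pear" was never stored deliberately. In a spell checker that would quietly turn every misspelt word into a list entry.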
Re: :%s//\=@o/gce ignores c flag in key mapping
On Tuesday, November 18, 2014 11:38:19 PM UTC-8, John Little wrote:
> Not much to the point, but I couldn't let this pass:
> porphyry5 said:
> >... I used a plain numeric index as I figured it must use an address
> >array to reference the words array, and with a numeric index I could
> >use a binary search pattern to locate the word. I think an
> >associative array must use a linear search...
> Associative array implementations usually use some kind of hashing,
> with mostly constant time look ups. You can tell if you iterate over
> the keys and they come in some weird order.
>
> Regards, John Little

Thank you also for the advice to use associative arrays. I've wondered
for years what hashing and hash tables actually entailed, but was always
too lazy to check them out.
Re: :%s//\=@o/gce ignores c flag in key mapping
On Wednesday, November 19, 2014 12:10:08 AM UTC-8, Erik Christiansen wrote: > On 18.11.14 15:26, porphyry5 wrote: > > I'm not sure how awk organizes arrays internally, but I used a plain > > numeric index as I figured it must use an address array to reference > > the words array, and with a numeric index I could use a binary search > > pattern to locate the word. > > That is all unnecessary - location is efficiently done for us, _without_ > searching, in an associative array: > > http://en.wikipedia.org/wiki/Associative_array > > > I think an associative array must use a linear search pattern because > > awk has no way of knowing if the array is actually in sequence. > > Erroneous assumption. The wikipedia page has links to "hash table" and > "hash function". A quick glance at them indicates that they should > suffice to explain. An associative array uses a hash to directly vector > to an array element indexed by an arbitrary string. As a consequence, > iteration over the array does not result in an easily anticipated > sequence. Does that matter when the array exists primarily for testing > set membership? > > With the associative array, no iteration over the list is required. The > hashing algorithm rapidly generates the address of a "bucket", > containing from one to just a handful (if the hashing algorithm is poor, > and the array elements exceedingly numerous) of strings with the same > hash. I.e. we go straight to the match. It can be seen as a form of > "content addressable memory", perhaps. > > That is why I previously wrote: > > > and membership tested with an "if (word in list) ..." in an > > > unconditional action handling the input stream". > > If words arrive one per line, then: > >{ if ($1 in my_word_list) ... ; else ... } > > immediately tests membership, without any search loop - explicit or > implicit. 
To fill the associative array, this should suffice: > > BEGIN { >file = "path/to/my_words" >while ((x = getline < file) > 0) # It does need the extra braces. >{ my_word_list[$1]++ } # If entry > 1, duplicate. >if (x < 0) >{ print "\n\t my_scriptname: " file ": File not found.\n" ; exit } > } > > AND, if an alphabetical sort of the word list is ever required for some > reason, then either presort "path/to/my_words", or pipe it to sort, > either in the shell or within awk, when needed. It is just not > economical to iterate (even in a binary search) over 200,000 words, for > _every_ input word, just to have a sorted word list, maybe once or twice > a year, if ever. > > > And of course, I have a second array of word suffixes to reference if > > the word of interest is not the root form. > > Now I'm curious - how are the suffixes matched to those of the 200,000 > words which may take them? I have a Danish word list which simply > includes each word variant explicitly. If an explicit list takes you to > a million words, then you just need a good hashing algorithm and a > million buckets, _if_ a short search through two or three words is to be > avoided. But a short search in a small bucket would have to be faster > than figuring out which suffixes go with what. Such matching would need > each list word to be identified as noun, adjective, or verb, at a > minimum, to permit correct attachment of adverb suffixes, wouldn't it? > > You seem to have a quite interesting task to deal with. > > Erik > > -- > mfox: You can't have infinite growth in a finite world, over population will > doom > us all in the end because capitalism depends on infinite growth. We are just > rearranging deck chairs on the titanic. > Maxx: Totally agree mfox, I think someone put Norman Lindsay's "The Magic > Pudding" in the non-fiction section. 
> -
> http://www.abc.net.au/news/2014-10-16/kohler-when-a-central-banker-talks-like-this-pay-attention/5815392

So if I read this hash table stuff correctly any data item that one
wants to look up in an associative array generates its own address in
memory, by using (a constant number of bytes of?) its value as a
single binary number and modifying that with some arithmetic algorithm
to produce an address in the available range. That is a really sweet
concept, but it must be rather wasteful of memory, lots of unused
spaces within its range.

I use ultra-cheap, under-powered laptops (currently acer c720, new
laptop for < $200) which has only 2gb memory, but for 200,000 odd
words that's 10K available per word, lots of available waste space.

I really appreciate you hammering home this point about associative
arrays, I always assumed, their being so convenient to use, that they
had to be a very performance-costly luxury. Who knew.

But I will forthwith rewrite the awk program with associative arrays
and compare the timing for the two methods. Thank you very much for
the trouble you have gone to on this.
Re: :%s//\=@o/gce ignores c flag in key mapping
On 18.11.14 15:26, porphyry5 wrote:
> I'm not sure how awk organizes arrays internally, but I used a plain
> numeric index as I figured it must use an address array to reference
> the words array, and with a numeric index I could use a binary search
> pattern to locate the word.

That is all unnecessary - location is efficiently done for us, _without_
searching, in an associative array:

http://en.wikipedia.org/wiki/Associative_array

> I think an associative array must use a linear search pattern because
> awk has no way of knowing if the array is actually in sequence.

Erroneous assumption. The wikipedia page has links to "hash table" and
"hash function". A quick glance at them indicates that they should
suffice to explain. An associative array uses a hash to directly vector
to an array element indexed by an arbitrary string. As a consequence,
iteration over the array does not result in an easily anticipated
sequence. Does that matter when the array exists primarily for testing
set membership?

With the associative array, no iteration over the list is required. The
hashing algorithm rapidly generates the address of a "bucket",
containing from one to just a handful (if the hashing algorithm is poor,
and the array elements exceedingly numerous) of strings with the same
hash. I.e. we go straight to the match. It can be seen as a form of
"content addressable memory", perhaps.

That is why I previously wrote:

> > and membership tested with an "if (word in list) ..." in an
> > unconditional action handling the input stream".

If words arrive one per line, then:

   { if ($1 in my_word_list) ... ; else ... }

immediately tests membership, without any search loop - explicit or
implicit. To fill the associative array, this should suffice:

   BEGIN {
      file = "path/to/my_words"
      while ((x = getline < file) > 0)   # It does need the extra parentheses.
         { my_word_list[$1]++ }          # If entry > 1, duplicate.
      if (x < 0)
         { print "\n\t my_scriptname: " file ": File not found.\n" ; exit }
   }

AND, if an alphabetical sort of the word list is ever required for some
reason, then either presort "path/to/my_words", or pipe it to sort,
either in the shell or within awk, when needed. It is just not
economical to iterate (even in a binary search) over 200,000 words, for
_every_ input word, just to have a sorted word list, maybe once or twice
a year, if ever.

> And of course, I have a second array of word suffixes to reference if
> the word of interest is not the root form.

Now I'm curious - how are the suffixes matched to those of the 200,000
words which may take them? I have a Danish word list which simply
includes each word variant explicitly. If an explicit list takes you to
a million words, then you just need a good hashing algorithm and a
million buckets, _if_ a short search through two or three words is to be
avoided. But a short search in a small bucket would have to be faster
than figuring out which suffixes go with what. Such matching would need
each list word to be identified as noun, adjective, or verb, at a
minimum, to permit correct attachment of adverb suffixes, wouldn't it?

You seem to have a quite interesting task to deal with.

Erik

--
mfox: You can't have infinite growth in a finite world, over population
will doom us all in the end because capitalism depends on infinite
growth. We are just rearranging deck chairs on the titanic.
Maxx: Totally agree mfox, I think someone put Norman Lindsay's "The
Magic Pudding" in the non-fiction section.
- http://www.abc.net.au/news/2014-10-16/kohler-when-a-central-banker-talks-like-this-pay-attention/5815392
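Since iterating over an associative array gives no dependable order, a sorted listing can be had by piping the keys to sort from within awk, as the message suggests. The sketch below assumes a sort command is on the PATH; the function name list_sorted, the array name set and the three sample words are made up.

```sh
#!/bin/sh
# list_sorted: prints the keys of a small associative array in sorted order.
list_sorted() {
    awk 'BEGIN {
        n = split("pear apple fig", w, " ")
        for (i = 1; i <= n; i++) set[w[i]] = 1

        # "for (k in set)" visits keys in an unspecified order,
        # so the ordering is left entirely to sort.
        for (k in set) print k | "sort"
        close("sort")
    }'
}

list_sorted
```

The array itself stays unsorted and lookups stay fast; the cost of sorting is paid only on the rare occasion an ordered list is wanted.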
Re: :%s//\=@o/gce ignores c flag in key mapping
Not much to the point, but I couldn't let this pass:

porphyry5 said:
>... I used a plain numeric index as I figured it must use an address
>array to reference the words array, and with a numeric index I could
>use a binary search pattern to locate the word. I think an associative
>array must use a linear search...

Associative array implementations usually use some kind of hashing, with
mostly constant time look ups. You can tell if you iterate over the keys
and they come in some weird order.

Regards, John Little
Re: :%s//\=@o/gce ignores c flag in key mapping
On Tuesday, November 18, 2014 2:37:22 AM UTC-8, Erik Christiansen wrote:
> On 17.11.14 10:57, Graham Lawrence wrote:
> > For my test file the awk program tagged some 3500 words, with 1960
> > of them unique, so this vim script must run within a loop to avoid
> > the tedium and 4000 odd keystrokes required to invoke it
> > individually for each unique error,
>
> Er, what script loop, and what "4000 odd keystrokes [per] error", if
> one may be so bold?

A while loop to enclose the mapping as you saw it; I never add such
details until I have the rest of the code working satisfactorily. Not
4000 keystrokes per error, ~4000 for all 1960 unique errors with a 2
keystroke code to invoke the mapping. All of which is redundant now, as
I realized I could cut it to one keystroke per error by splitting it
into 2 mappings, in which the 2nd mapping ends by reinvoking the first.
That eliminated the need for user input entirely.

> If the list of good words is read into an associative
> array (lets call it "list") in the BEGIN action, and membership tested
> with an "if (word in list) ..." in an unconditional action handling the
> input stream, _and_ the unrecognised words (sans @@) are printed to
> another file, then it is only necessary to open that file in vim, and
> for each word (one per line), hit ":.w >> /path/goodfile" for each word
> which we accept as good. With that aliased to a key of choice, only one
> keystroke is required to qualify each word. Both awk and vim are run
> once per session, handling thousands of words each time, if you have
> them. Four thousand keystrokes would handle 4000 errors.

In practice, it is not that straightforward. I'm not sure how awk
organizes arrays internally, but I used a plain numeric index as I
figured it must use an address array to reference the words array, and
with a numeric index I could use a binary search pattern to locate the
word. I think an associative array must use a linear search pattern
because awk has no way of knowing if the array is actually in sequence.
And of course, I have a second array of word suffixes to reference if
the word of interest is not the root form.

> If these are e.g. ordinary English words, is it acceptable to read in
> e.g. /usr/share/dict/british-english into "list", to start with 98,000
> or more good words in the BEGIN action, before reading in your list of
> special words,

Project Gutenberg provides Webster's Dictionary from about 1913. I
extracted all the words from the html, and it reduced to about 200,000
unique words. I use arch and they don't include such refinements as
dictionaries in their distro.

> Erik
> (Who is doubtless glossing over some undeclared additional requirement. :)
>
> --
> Melbourne Water Use:
> "More water is lost to stormwater each year than we use. On average we
> use about 40 billion litres of water each year, and each year about 500
> billion litres runs into our drains." Leonie Duncan, Environment
> Victoria healthy river campaigner, quoted on p7 of Journal 21.10.08.
Re: :%s//\=@o/gce ignores c flag in key mapping
On 17.11.14 10:57, Graham Lawrence wrote:
> For my test file the awk program tagged some 3500 words, with 1960 of them
> unique, so this vim script must run within a loop to avoid the tedium and
> 4000 odd keystrokes required to invoke it individually for each unique
> error,

Er, what script loop, and what "4000 odd keystrokes [per] error", if one may be so bold?

If the list of good words is read into an associative array (let's call it "list") in the BEGIN action, and membership tested with an "if (word in list) ..." in an unconditional action handling the input stream, _and_ the unrecognised words (sans @@) are printed to another file, then it is only necessary to open that file in vim, and for each word (one per line), hit ":.w >> /path/goodfile" for each word which we accept as good. With that aliased to a key of choice, only one keystroke is required to qualify each word. Both awk and vim are run once per session, handling thousands of words each time, if you have them. Four thousand keystrokes would handle 4000 errors.

If these are e.g. ordinary English words, is it acceptable to read in e.g. /usr/share/dict/british-english into "list", to start with 98,000 or more good words in the BEGIN action, before reading in your list of special words?

Erik
(Who is doubtless glossing over some undeclared additional requirement. :)

--
Melbourne Water Use:
"More water is lost to stormwater each year than we use. On average we use about 40 billion litres of water each year, and each year about 500 billion litres runs into our drains." Leonie Duncan, Environment Victoria healthy river campaigner, quoted on p7 of Journal 21.10.08.
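The Vim half of this workflow can be reduced to a single key, as suggested. A minimal sketch, assuming <F5> is a free key and that /path/goodfile (the placeholder from the message, not a real path) is replaced with the actual word-list file:

```vim
" <F5> is an arbitrary key choice; /path/goodfile is the placeholder path
" used in the message and stands for your own word-list file.
" :.w >> appends only the current line to that file, then j moves on to
" the next candidate word.
nnoremap <F5> :.w >> /path/goodfile<CR>j
```

Note that a plain :w >> refuses to write when the target file does not exist yet; :.w! >> forces the write and creates it.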
Re: :%s//\=@o/gce ignores c flag in key mapping
On Monday, November 17, 2014 12:57:31 PM UTC-6, porphyry5 wrote:
>
> Is there a method of getting the screen to constantly show the cursor as a
> .vim script progresses?
>

It should be doing it anyway, but the :redraw or :redraw! command can often force it.
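As an illustration of where :redraw might go, here is a minimal sketch of a review loop that centres each tagged word and redraws before asking. The function name ReviewTagged() and the y/n convention are assumptions made for this example; only the @@ tag and the qq marker come from the thread.

```vim
" Hypothetical helper, not the script discussed in the thread.
function! ReviewTagged() abort
  " 'c' accepts a match under the cursor, 'W' stops at the end of the
  " file instead of wrapping around.
  while search('@@', 'cW') > 0
    " Centre the line, then force the screen to update before prompting.
    normal! zz
    redraw
    let answer = input('Good word? (y/n, anything else stops) ')
    if answer ==# 'y'
      " Good word: delete the two tag characters.
      normal! 2x
    elseif answer ==# 'n'
      " Error: overwrite the tag with qq for later manual correction.
      normal! 2rq
    else
      " Leave the tag in place and stop, so the review can be resumed.
      break
    endif
  endwhile
endfunction
```

Run it with :call ReviewTagged(). Each answer has to be confirmed with Enter, since input() reads a whole line.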
Re: :%s//\=@o/gce ignores c flag in key mapping
I thank you both, Tim and Ben, for your help. Let me explain the situation more fully.

I'm testing the feasibility of semi-automated repair of texts that have been OCR-ed from less than ideal sources, most notably from books.google.com. To that end I've written two scripts; the first in sed to detect and impose formatting structure on the text from cues within the text itself; the second in awk to refer every word in the text to a word-list, and to prepend every word not in the list with @@.

At this point the text must be inspected visually to decide if each @@word is an error or not. If not, the @@ is removed and the actual word is saved to be added to the awk script's word list. If an error, the @@ is changed to qq (for later manual correction), so that errors that have already been processed will not be reprocessed by this vim script.

For my test file the awk program tagged some 3500 words, with 1960 of them unique, so this vim script must run within a loop to avoid the tedium and 4000 odd keystrokes required to invoke it individually for each unique error, which I assume invalidates the method I'm trying here, Tim, because then the :s...gce command cannot be forced to be last in the script; it must be followed by 'endw'.

That is not a problem as the same effect can be achieved using the input() function, and I have written it two ways, as a key-mapping and as a .vim script. Both of these will do the job, but neither is actually useful for visual reasons. It is essential that the user be able to read the text about the current @@word, to determine whether or not it is an error. But when the key-mapping stops at the input() prompt vim lists above it all preceding commands in the while loop, which often has the effect of pushing the sentence of interest off the top of the screen. How can I prevent that behavior?

The .vim approach is even worse, as with that the screen does not update at all as the script works its way through the text. I cannot find any function in :h function-list that will actually cause the screen to show the text surrounding the current cursor position, and including say 'norm zz' immediately before input() has no effect at all. Is there a method of getting the screen to constantly show the cursor as a .vim script progresses?

On Mon, Nov 17, 2014 at 9:01 AM, Ben Fritz wrote:
> On Monday, November 17, 2014 5:27:47 AM UTC-6, Tim Chase wrote:
> > On 2014-11-15 13:59, porphyry5 wrote:
> > > On Friday, November 14, 2014 4:02:55 PM UTC-8, porphyry5 wrote:
> > > > In a key mapping I use the command ':%s//\=@o/gce'.
> > > >
> > > > The command executes as expected except that it behaves as if the
> > > > c flag were not set. Is this flag unavailable in a key mapping,
> > > > or is there some other option that needs to be set for it to
> > > > work. It works as expected at the command line.
> > >
> > > This is the mapping concerned:
> > > "map ,, /@@"myWcwqqh"oywxx"nywma:let
> > > @/=@m:%s//\=@n/ge:let @/=@n:%s//\=@o/gce`ay2h`a:if
> > > @" != 'qq':norm "Zyw:en
> >
> > Ah, I believe the problem is triggered because the atoms after the
> > ":%s//\=@o/gce" are interpreted as answers to the y/n/a/q/l/^E/^Y
> > prompt. The back-tick is ignored and the "a" (the subsequent atom)
> > is interpreted as "a"ll the remaining matches.
> >
> > For this to work (actually prompting the user), the
> > ":%s//\=@o/gce" has to be the last item in your mapping, leaving
> > the :s command in the user-prompting state.
> >
>
> If this is the cause, it's probably cleaner to wrap everything in a
> function with one command per line, and call the function from the mapping.
> Then there are fewer ways for it to go awry.

--
Graham Lawrence
Re: :%s//\=@o/gce ignores c flag in key mapping
On Monday, November 17, 2014 5:27:47 AM UTC-6, Tim Chase wrote:
> On 2014-11-15 13:59, porphyry5 wrote:
> > On Friday, November 14, 2014 4:02:55 PM UTC-8, porphyry5 wrote:
> > > In a key mapping I use the command ':%s//\=@o/gce'.
> > >
> > > The command executes as expected except that it behaves as if the
> > > c flag were not set. Is this flag unavailable in a key mapping,
> > > or is there some other option that needs to be set for it to
> > > work. It works as expected at the command line.
> >
> > This is the mapping concerned:
> > "map ,, /@@"myWcwqqh"oywxx"nywma:let
> > @/=@m:%s//\=@n/ge:let @/=@n:%s//\=@o/gce`ay2h`a:if
> > @" != 'qq':norm "Zyw:en
>
> Ah, I believe the problem is triggered because the atoms after the
> ":%s//\=@o/gce" are interpreted as answers to the y/n/a/q/l/^E/^Y
> prompt. The back-tick is ignored and the "a" (the subsequent atom)
> is interpreted as "a"ll the remaining matches.
>
> For this to work (actually prompting the user), the
> ":%s//\=@o/gce" has to be the last item in your mapping, leaving
> the :s command in the user-prompting state.
>

If this is the cause, it's probably cleaner to wrap everything in a function with one command per line, and call the function from the mapping. Then there are fewer ways for it to go awry.
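A minimal sketch of the function approach, under stated assumptions: ConfirmTagged() is a made-up name, the pattern @@ stands in for whatever is being searched, and register o is assumed to hold the replacement text as in the original mapping.

```vim
" Hypothetical function name; register o must be filled beforehand,
" e.g. with :let @o = 'qq'.
function! ConfirmTagged() abort
  " One Ex command per line. The lines after :s are script lines, not
  " typed keys, so they are not consumed as answers to the confirm prompt.
  %s/@@/\=@o/gce
  echo 'confirm pass finished'
endfunction

" The mapping only calls the function; nothing follows the <CR>.
nnoremap ,, :call ConfirmTagged()<CR>
```

The e flag keeps the function from stopping with an error when no tag is left in the buffer.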
Re: :%s//\=@o/gce ignores c flag in key mapping
On 2014-11-15 13:59, porphyry5 wrote:
> On Friday, November 14, 2014 4:02:55 PM UTC-8, porphyry5 wrote:
> > In a key mapping I use the command ':%s//\=@o/gce'.
> >
> > The command executes as expected except that it behaves as if the
> > c flag were not set. Is this flag unavailable in a key mapping,
> > or is there some other option that needs to be set for it to
> > work. It works as expected at the command line.
>
> This is the mapping concerned:
> "map ,, /@@"myWcwqqh"oywxx"nywma:let
> @/=@m:%s//\=@n/ge:let @/=@n:%s//\=@o/gce`ay2h`a:if
> @" != 'qq':norm "Zyw:en

Ah, I believe the problem is triggered because the atoms after the ":%s//\=@o/gce" are interpreted as answers to the y/n/a/q/l/^E/^Y prompt. The back-tick is ignored and the "a" (the subsequent atom) is interpreted as "a"ll the remaining matches.

For this to work (actually prompting the user), the ":%s//\=@o/gce" has to be the last item in your mapping, leaving the :s command in the user-prompting state.

> The input file it processes has certain words flagged with a
> leading '@@' to indicate a possible error that can only be resolved
> by inspection. The mapping strips the leading @@ from all
> occurrences of the current word with the first :%s, then runs the
> second :%s with the c flag to allow the user to respond either 'a'
> or 'q' depending on whether the word is actually an error, or
> should be added to the reference word list.

I'm not sure I completely follow your process. You have words flagged with "@@" that you need to ask the user about, potentially adding them to a reference word list (which it looks like you're storing in the Z register). Do the "@@" remain at the end of the process? I see them getting replaced by "qq" but didn't see them getting returned to "@@" at any point.

With a sample "before" and "after" document, along with what/how you want your reference word-list and the y/n answers you gave, it should be possible to rewrite this mapping so that it gives you the functionality that you want.

-tim
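To make the point about trailing keys concrete, here is a small sketch based on the explanation above. The keys ,x and ,y and the @@ to qq substitution are illustrative assumptions, not the original mapping.

```vim
" Keys that follow the <CR> of a confirming :s are still part of the
" mapping, so Vim reads them as replies to the prompt. In this
" commented-out form the trailing a would be taken as "all":
"   nnoremap ,x :%s/@@/qq/gce<CR>a

" With :s as the final item, nothing is left over to answer on the
" user's behalf, and the y/n/a/q/l prompt waits for real input:
nnoremap ,y :%s/@@/qq/gce<CR>
```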
Re: :%s//\=@o/gce ignores c flag in key mapping
On Friday, November 14, 2014 4:02:55 PM UTC-8, porphyry5 wrote:
> In a key mapping I use the command ':%s//\=@o/gce'.
>
> The command executes as expected except that it behaves as if the c flag were
> not set. Is this flag unavailable in a key mapping, or is there some other
> option that needs to be set for it to work. It works as expected at the
> command line.
>
> --
> Graham Lawrence

This is the mapping concerned:

"map ,, /@@"myWcwqqh"oywxx"nywma:let @/=@m:%s//\=@n/ge:let @/=@n:%s//\=@o/gce`ay2h`a:if @" != 'qq':norm "Zyw:en

The input file it processes has certain words flagged with a leading '@@' to indicate a possible error that can only be resolved by inspection. The mapping strips the leading @@ from all occurrences of the current word with the first :%s, then runs the second :%s with the c flag to allow the user to respond either 'a' or 'q' depending on whether the word is actually an error, or should be added to the reference word list.
Re: :%s//\=@o/gce ignores c flag in key mapping
On 2014-11-14 16:02, Graham Lawrence wrote:
> In a key mapping I use the command ':%s//\=@o/gce'.
>
> The command executes as expected except that it behaves as if the c
> flag were not set. Is this flag unavailable in a key mapping, or
> is there some other option that needs to be set for it to work. It
> works as expected at the command line.

Could you detail the exact mapping you're using? I tried to replicate this using

  :nnoremap Q :%s//\=@o/gce<CR>
  :let @o='a'
  /the

which primed my search with "the" and my "o" register with the letter "a", which should have the effect of issuing

  :%s/the/a/gce

and indeed, when I hit "Q" to execute the mapping, it does prompt me for each instance of "the", allowing me to say yes/no regarding its replacement with the value of my "o" register.

All that to say: it's working how you describe it should (and how I expect it to) and I'm not seeing your "behaves as if the c flag were not set" symptom.

-tim