Hmm, it should be case-insensitive already, due to the 'gi' flags (the `i` is the case-insensitive one).... Punctuation is a bit harder, and you'll probably need to expand the regex to capture the possible punctuation spots (after the preceding word, before and after the matched word, before the succeeding word) as separate groups, which then show up as separate parameters to the replacer.

So, I think the regex you want is something like this:

\b(\S+)         # word boundary, sequence of non-whitespace
([,./?;:'")]?   # punctuation, optional, single-char
    [\s\-]*     # whitespace or hyphens (should be expanded to emdashes, endashes etc etc etc but I'm too lazy)
    ['"(]?)     # punctuation, optional, single-char
\b(what)\b      # the word itself, with boundaries around it (probably not essential, but useful)
([,./?;:'")]?   # you know the drill
    [\s\-]*     # ditto
    ['"(]?)     # see above
(\S+)\b         # succeeding word, with boundary after


But, in string form, it'll be more like this:

matches.push(new RegExp("\\b(\\S+)([,./?;:'\")]?[\\s\\-]*['\"(]?)\\b(" + word + ")\\b([,./?;:'\")]?[\\s\\-]*['\"(]?)(\\S+)\\b", 'gi'));

Or you can break it down with constants or whatever (`const trailingPunct = "[,./?;:'\")]?"` etc).

(I used http://jslint.com/ to check the regex after creating it in Regex Coach, http://weitz.de/regex-coach/. This was a pretty complex regex, so I'd definitely recommend using both those tools for editing it.)

Since you're now capturing more matches, you'll need to fix up the `replacer` function to match, which should be straightforward.
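For what it's worth, here's a rough sketch of how the pieces might fit together: the pattern built from named parts, plus a `replacer` that takes the five captures. The `buildPattern` helper, the part names, and the `wordType = 0` placeholder are made up for illustration, and `words` is assumed to be in the array-valued form from earlier in the thread, so adapt as needed.

```javascript
// Hypothetical word list, in the array-valued form discussed in this thread.
var words = { "what": ["test", "test2"] };

// Builds the same pattern as the string form above, from named parts.
function buildPattern(word) {
    var trailing = "[,./?;:'\")]?";  // punctuation after a word
    var gap = "[\\s\\-]*";           // whitespace or hyphens
    var leading = "['\"(]?";         // punctuation before a word
    var sep = "(" + trailing + gap + leading + ")";
    return new RegExp("\\b(\\S+)" + sep + "\\b(" + word + ")\\b" + sep + "(\\S+)\\b", "gi");
}

// p1 = preceding word, p2 = separator before the match,
// p3 = matched word, p4 = separator after it, p5 = succeeding word.
function replacer(str, p1, p2, p3, p4, p5) {
    var wordType = 0; // placeholder: choose an index based on p1 / p5
    // With the 'i' flag p3 may be "What", so normalise before the lookup.
    return p1 + p2 + words[p3.toLowerCase()][wordType] + p4 + p5;
}

var out1 = "So, what now?".replace(buildPattern("what"), replacer);
var out2 = "Say What. Then go.".replace(buildPattern("what"), replacer);
// out1 is "So, test now?" and out2 is "Say test. Then go."
```

Because the separators are passed back through unchanged, the original punctuation and spacing survive the replacement. Note that this pattern still wants a word on both sides, so a match at the very start or end of a text node is skipped.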

On 2011-06-25 20:52, Harahune wrote:
I'm glad my examples were mostly correct, saves me some time. :) Yeah,
the last section of code was ripped right from another script.

Thanks a lot for your help! I've got it mostly working now, though I
noticed an issue where the matches are case sensitive, and being at
the end of a sentence (thus having a period right next to it) also
causes a mismatch. I assume it's in the regex but I'm not sure how to
have the script ignore case and punctuation.

On Jun 25, 8:21 pm, cc<[email protected]>  wrote:
On 2011-06-25 05:17, Harahune wrote:

After chopping up another Greasemonkey script, I've gotten this far.
var words = {
    "what" : "test",
}
Style note: leave off the comma if it's the last in the object
definition -- example:
      var words = {
          "what" : "test",
          "x2": "x3"
      }

var matches=new Array()
var replacements=new Array()
for(var word in words) {
            matches.push(new RegExp("\\b"+word+"\\b", 'gi'));
            replacements.push(words[word]);
}
var texts = document.evaluate(".//text()[normalize-space(.)!='']",document.body,null,6,null), text="";
Style note: I prefer to use built-in constants for clarity, so the `6`
here would become `XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE`,
per https://developer.mozilla.org/en/XPathResult#Constants

for(var i=0,l=texts.snapshotLength; (this_text=texts.snapshotItem(i));
i++) {
    if(text=this_text.textContent) {
Style note (this is probably from the script you chopped up, but I
wanted to mention it anyway): this `if` is confusing, since it's
actually setting `text`, then testing if it was set to something
non-null. It's also unnecessary, since the XPath expression already
ensures it's non-null. I'd just remove the set statement from the if,
and maybe move it into the for expression.

Uh, actually, nvm, this for statement is just too horribly messed up,
see later for my reconstruction of it.


            for(var x=0,l=matches.length; x<l; x++) {
                    text = text.replace(matches[x],replacements[x]);
                    this_text.textContent = text;
            }
    }
}
This will replace any instance of the word "what" with "test".
However, there's still a few problems. First, I need the variable
"what" in the "words" array to have multiple values, an array within
an array, something like this.
var words = {
    "what" : ["test", "test2", "test3"]
}
Then I need to be able to call it, rather than using
"replacements[x]", with something like words[x][1] to replace it with
"test" or words[x][2] to replace it with "test2". Lastly, I need to be
able to look at the word BEFORE and AFTER the matching word, with
different cases pulling different values of words[x], for one match to
pull words[x][1] and one to pull words[x][2].
Here's where JS has some features that can help you out a good bit.
First off, your redefinition of `var words` will in fact work,
syntax-perfect. (Technically, you're defining an array inside a
hashtable/object -- the curly braces define an object/hashtable, and the
square brackets define an array.) Your indexing is also almost correct,
just remember that JS arrays are 0-indexed, so use
`words["what"][0]` for the first replacement possibility ("test").
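To make the indexing concrete, here's a tiny sketch using the `words` definition from the question (the variable names `first`, `second` and `count` are just for illustration):

```javascript
// An object (hashtable) whose values are arrays.
var words = {
    "what": ["test", "test2", "test3"]
};

// Arrays are 0-indexed: [0] is the first entry, [1] the second.
var first = words["what"][0];     // "test"
var second = words["what"][1];    // "test2"
var count = words["what"].length; // 3

// Looking up a key that isn't in the object gives undefined.
var missing = words["nope"];
```

So where the question had `words[x][1]` for "test", the lookup is by word (the object key) first, then by 0-based position in the array.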

Adapting the regex to give you words before and after isn't too hard in
this case, since you don't need to worry about overlapping matches (it
would be pretty rare for the same word to show up twice in a row, or
with only one word in between). Probably use something like `new
RegExp("\\b(\\w+)\\s+\\b("+word+")\\b\\s+(\\w+)\\b", 'gi')` -- this
captures whatever word is on each side. Then use something like this:

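function replacer(str, p1, p2, p3, offset, s) {
      // Placeholder: determine which replacement to use however you like,
      // e.g. from p1 (the word before) or p3 (the word after).
      var wordType = 0;
      // With the 'i' flag p2 may be "What", so lowercase it for the lookup.
      return p1 + " " + words[p2.toLowerCase()][wordType] + " " + p3;
}

for(var i=0,l=texts.snapshotLength; i < l; i++) {
      var node = texts.snapshotItem(i);
      text = node.textContent;
      for(var x=0,ml=matches.length; x < ml; x++) {
          text = text.replace(matches[x], replacer);
      }
      // Write back once, after all the replacements have run.
      node.textContent = text;
}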

(Read https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/...
for more info -- this is an awesome function, let me tell you.)
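As one possible way to fill in the "which replacement" step, here's a small sketch where the choice depends on the preceding word. The rule itself (use the second replacement after "say") is invented purely for illustration, as is the two-entry `words` list.

```javascript
// Hypothetical word list; index 0 is treated as the default replacement.
var words = { "what": ["test", "test2"] };

// Three groups: word before, matched word, word after.
var pattern = new RegExp("\\b(\\w+)\\s+\\b(what)\\b\\s+(\\w+)\\b", "gi");

// Made-up rule: pick the second replacement when the preceding
// word is "say", otherwise fall back to the first.
function replacer(str, p1, p2, p3, offset, s) {
    var wordType = (p1.toLowerCase() === "say") ? 1 : 0;
    return p1 + " " + words[p2.toLowerCase()][wordType] + " " + p3;
}

var a = "Say what you mean".replace(pattern, replacer);
var b = "Know what you mean".replace(pattern, replacer);
// a is "Say test2 you mean" and b is "Know test you mean"
```

The same idea works with p3 (the following word), or with both together.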

--
cc | pseudonymous |<http://carlclark.mp/>


--
You received this message because you are subscribed to the Google Groups 
"greasemonkey-users" group.