Re: Quickie - Regexp for a string not at the beginning of the line

2012-10-26 Thread Ed Morton

On 10/25/2012 11:45 PM, Rivka Miller wrote:

Thanks everyone, esp this gentleman.

The solution that worked best for me is just to use a DOT before the
string as the one at the beginning of the line did not have any char
before it.


That's fine, but do you understand that that is not an RE that matches 
$hello$ not at the start of a line? It's an RE that matches any character 
followed by $hello$, anywhere in the line. There's a difference: if you use 
a tool that prints the text that matches an RE, then the output of the first 
RE, if it existed, would be $hello$, while the output of the second RE would 
be X$hello$ or Y$hello$ or


In some tools you can use /(.)$hello$/ or similar to ignore the first part of 
the RE (.) and just print the second part, $hello$, but that ability and its 
syntax are tool-specific. You still can't say "here's an RE that does this"; 
you've got to say "here's how to find this text using tool whatever".
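
A minimal sketch of that distinction, assuming Python's re module as the
tool (the thread names no tool at this point). The sample line and the
helper name matched_text are hypothetical, chosen only for illustration.

```python
import re

LINE = "$hello$ then X$hello$"  # hypothetical sample line


def matched_text(pattern, line=LINE):
    """Return the text of the first match of pattern in line, or None."""
    m = re.search(pattern, line)
    return m.group(0) if m else None


# A dot before the target: the reported match includes the extra character.
with_dot = matched_text(r".\$hello\$")

# Same idea with the target in a group: asking for group 1 rather than the
# whole match is the tool-specific step, not part of the RE itself.
m = re.search(r".(\$hello\$)", LINE)
target_only = m.group(1) if m else None

print(with_dot)     # X$hello$
print(target_only)  # $hello$
```

The pattern is the same in both cases; only the way the tool is asked for
its result changes what gets printed.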


   Ed.


I guess this requires the ability to ignore the caret as the beginning of the 
line.

I am a satisfied customer. No need for returns. :)

On Oct 25, 7:11 pm, Ben Bacarisse ben.use...@bsb.me.uk wrote:

Rivka Miller rivkaumil...@gmail.com writes:

On Oct 25, 2:27 pm, Danny dann90...@gmail.com wrote:

Why don't you just give us the string/input, say a line or two, and
what you want out of it, so we can tell better what to suggest



no one has really helped yet.


Really?  I was going to reply but then I saw Janis had given you the
answer.  If it's not the answer, you should just reply saying what it is
that's wrong with it.


I want to search and modify.


Ah.  That was missing from the original post.  You can't expect people
to help with questions that weren't asked!  To replace you will usually
have to capture the single preceding character.  E.g. in sed:

   sed -e 's/\(.\)\$hello\$/\1XXX/'

but some RE engines (Perl's, for example) allow you to specify zero-width
assertions.  You could, in Perl, write

   s/(?<=.)\$hello\$/XXX/

without having to capture whatever preceded the target string.  But
since Perl also has negative zero-width look-behind you can code your
request even more directly:

   s/(?<!^)\$hello\$/XXX/
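
For comparison, the capture and look-behind approaches can be sketched in
Python, whose re module also accepts look-behind assertions. This is only
an illustration under stated assumptions: the function names are
hypothetical, and each input is assumed to be a single line, so ^ means
the start of the string.

```python
import re

TARGET = r"\$hello\$"  # the literal text $hello$


def replace_with_capture(line, repl="XXX"):
    """Capture the preceding character and put it back (the sed approach)."""
    return re.sub(r"(.)" + TARGET, lambda m: m.group(1) + repl, line)


def replace_with_lookbehind(line, repl="XXX"):
    """Negative look-behind: nothing before the target is consumed."""
    return re.sub(r"(?<!^)" + TARGET, repl, line)


line = "$hello$ a$hello$ b$hello$"
print(replace_with_capture(line))     # $hello$ aXXX bXXX
print(replace_with_lookbehind(line))  # $hello$ aXXX bXXX
```

The two differ when targets are adjacent. For a$hello$$hello$ the capture
version consumes the a along with the first target, so the second target
has no unconsumed character left in front of it and is skipped; the
look-behind version replaces both.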


I dont wanna be tied to a specific language etc so I just want a
regexp and as many versions as possible. Maybe I should try in emacs
and so I am now posting to emacs groups also, although javascript has
rich set of regexp facilities.


You can't always have a universal solution because different RE
implementations have different syntax and semantics, but you should be
able to translate Janis's solution of matching *something* before your
target into every RE implementation around.


examples



$hello$ should not be selected but
not hello but all of the $hello$ and $hello$ ... $hello$ each one
selected


I have taken your $s to be literal.  That's not 100% obvious since $ is a
common (universal?) RE meta-character.

<snip>
--
Ben.




--
http://mail.python.org/mailman/listinfo/python-list


Re: Quickie - Regexp for a string not at the beginning of the line

2012-10-25 Thread Ed Morton

On 10/25/2012 8:08 PM, Rivka Miller wrote:

On Oct 25, 2:27 pm, Danny dann90...@gmail.com wrote:

Why don't you just give us the string/input, say a line or two, and what you 
want out of it, so we can tell better what to suggest


no one has really helped yet.


Because there is no solution - there IS no _RE_ that will match a string not at 
the beginning of a line.


Now if you want to know how to extract a string that matches an RE in awk, 
that'd be (just one way):


   awk 'match($0,/.[$]hello[$]/) { print substr($0,RSTART+1,RLENGTH-1) }'

and other tools would have their ways of producing the same output, but that's 
not the question you're asking.


Ed.


I want to search and modify.

I dont wanna be tied to a specific language etc so I just want a
regexp and as many versions as possible. Maybe I should try in emacs
and so I am now posting to emacs groups also, although javascript has
rich set of regexp facilities.

examples

$hello$ should not be selected but
not hello but all of the $hello$ and $hello$ ... $hello$ each one
selected

=
original post
=


Hello Programmers,

I am looking for a regexp for a string not at the beginning of the
line.

For example, I want to find $hello$ that does not occur at the
beginning of the string, ie all $hello$ that exclude ^$hello$.

In addition, if you have a more difficult problem along the same
lines, I would appreciate it. For a single character, eg  not at the
beginning of the line, it is easier, ie

^[^]+

but I cant use the same method for more than one character string as
permutation is present and probably for more than one occurrence,
greedy or non-greedy version of [^]+ would pick first or last but not
the middle ones, unless I break the line as I go and use the non-
greedy version of +. I do have the non-greedy version available, but
what if I didnt?

If you cannot solve the problem completely, just give me a quick
solution with the first non beginning of the line and I will go from
there as I need it in a hurry.

Thanks





Re: sed/awk/perl: How to replace all spaces each with an underscore that occur before a specific string ?

2009-08-22 Thread Ed Morton
On Aug 22, 1:11 pm, bolega gnuist...@gmail.com wrote:
 sed/awk/perl:

 How to replace all spaces each with an underscore that occur before a
 specific string ?

 I really prefer a sed one liner.

Why?

 Example
 Input :  This is my book. It is too  thick to read. The author gets
 little royalty but the publisher makes a lot.
 Output: This_is_my_book._It_is_too__thick_to read. The author gets
 little royalty but the publisher makes a lot.

 We replaced all the spaces with underscores before the first occurrence
 of the string "to ".

No, you replaced all ... the string "to " (note the space).

awk '{idx=index($0,"to "); tgt=substr($0,1,idx-1); gsub(/ /,"_",tgt);
print tgt substr($0,idx)}' file
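
The same split-and-replace idea can be sketched in Python. The function
name underscore_before and the choice to return the line unchanged when
the marker is missing are assumptions made for this illustration.

```python
def underscore_before(line, marker="to "):
    """Replace spaces with underscores in the part of line before marker."""
    idx = line.find(marker)  # index of the first occurrence, or -1
    if idx == -1:
        return line          # marker absent: leave the line alone
    return line[:idx].replace(" ", "_") + line[idx:]


print(underscore_before("It is too  thick to read. The author gets"))
# It_is_too__thick_to read. The author gets
```

Note that "too " does not contain the marker "to " (the trailing space
matters), so the split happens at the later "to read".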

   Ed.


Re: Comparing 2 similar strings?

2005-05-18 Thread Ed Morton


William Park wrote:

 How do you compare 2 strings, and determine how much they are close to
 each other?  Eg.
 aqwerty
 qwertyb
 are similar to each other, except for first/last char.  But, how do I
 quantify that?
 
 I guess you can say for the above 2 strings that
 - at max, 6 chars out of 7 are same sequence -- 85% max
 
 But, for
 qawerty
 qwerbty
 max correlation is
 - 3 chars out of 7 are the same sequence -- 42% max
 
 (Crossposted to 3 of my favourite newsgroups.)
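
One way to put numbers on the "same sequence" measure described above is
the longest common run of characters divided by the longer string's
length. The sketch below uses Python's difflib for this; the function
names and the choice of denominator are assumptions, not something the
poster specified.

```python
from difflib import SequenceMatcher


def longest_common_run(a, b):
    """Return the longest run of characters that appears in both strings."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


def run_similarity(a, b):
    """Length of the longest common run over the longer string's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return len(longest_common_run(a, b)) / longest


print(longest_common_run("aqwerty", "qwertyb"))  # qwerty
print(longest_common_run("qawerty", "qwerbty"))  # wer
```

For the two example pairs this gives 6/7 and 3/7, which matches the 6 and
3 character counts quoted above.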


However you like is probably the right answer, but one way might be to 
compare their soundex encoding 
(http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?soundex) and figure out 
percentage difference based on comparing the numeric part.

Ed.


Re: Comparing 2 similar strings?

2005-05-18 Thread Ed Morton


John Machin wrote:

 On Wed, 18 May 2005 20:03:53 -0500, Ed Morton [EMAIL PROTECTED]
 wrote:
<snip>
I assume you were actually being facetious and trying to make the point 
that names that don't look the same on paper can have the same soundex 
encoding. That's obviously countered by the fact that soundex is just a 
cheap and cheerful way to find names that probably sound similar, which 
can vary tremendously based on ethnicity or accent.
 
 
 *If* you want phonetic similarity, there are methods that are much better
 than soundex, in the sense of fewer false positives and fewer false
 negatives. Google for NYSIIS, dolby, metaphone, caverphone.

And I assume I'd find they all have pros and cons too, otherwise you'd 
be referring to THE best one rather than a selection. It seems a bit 
pointless to go browsing through the documentation on them when someone 
who presumably already has can't just state the best one for the job.

 Cheap? You get what you pay for.
 
 Cheerful? What's the relevance?

Cheap and cheerful is a colloquial expression meaning cost-effective.

 Someone who types Mousaferiadis into a customer search screen and
 gets back several lines of McPherson and MacPherson is unlikely to be
 cheerful -- even before we factor in the speed [soundex divides the
 universe into a relatively small number of buckets].
 
 Someone who's looking for Erin when they should be looking for Aaron
 (or vice versa) won't get much cheer out of soundex, either.

That goes back to accent. In [some parts at least of] the USA Erin 
sounds very much like Aaron whereas in the UK the 2 are very dissimilar. 
I assume since you apparently consider them similar that you live in 
the USA and so would consider soundex as providing a false negative by 
saying they don't match. Perhaps one of the other approaches you suggest 
would report that they do match but that wouldn't make it clearly a 
better choice to everyone.

 
It's a reasonable approach to consider given the very loose requirements 
presented.
 
 
 Soundex is *NEVER* a reasonable approach to consider. Phonetic
 variation is only one consideration. In any case, the OP didn't appear
 to be concerned with phonetic variations.

The OP didn't say what the application was at all, but you're right that 
from his example he does SEEM more interested in character matches than 
phonetic ones so he'd presumably quickly discard phonetic comparisons if 
that's really not what he wants.

Ed.