John Machin wrote:
On Feb 8, 1:37 am, MRAB <goo...@mrabarnett.plus.com> wrote:
LaundroMat wrote:
Hi,
I'm quite new to regular expressions, and I wonder if anyone here
could help me out.
I'm looking to split strings that ideally look like this: "Update: New
item (Household)" into a group.
This expression works ok: '^(Update:)?(.*)(\(.*\))$' - it returns
("Update", "New item", "(Household)")
Some strings, however, will look like this: "Update: New item (item)
(Household)". The expression above still does its job, as it returns
("Update", "New item (item)", "(Household)").

Not quite true; it actually returns
    ('Update:', ' New item (item) ', '(Household)')
However ignoring the difference in whitespace, the OP's intention is
clear. Yours returns
    ('Update:', ' New item ', '(item) (Household)')

The OP said it works OK, which I took to mean that the OP was OK with
the extra whitespace, which can be easily stripped off. Close enough!
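
For anyone following along, here is a small sketch that reproduces the two results quoted above. The variable names (greedy, lazy, text, stripped) are made up for illustration; the two patterns are the ones posted in this thread.

```python
import re

# The OP's original pattern: greedy middle group, mandatory last group.
greedy = re.compile(r'^(Update:)?(.*)(\(.*\))$')
# The suggested pattern: lazy middle group, optional last group.
lazy = re.compile(r'^(Update:)?(.*?)(?:(\(.*\)))?$')

text = "Update: New item (item) (Household)"

greedy_groups = greedy.match(text).groups()
lazy_groups = lazy.match(text).groups()

print(greedy_groups)  # the greedy middle group keeps "(item)"
print(lazy_groups)    # the lazy middle group stops at the first "("

# The extra whitespace is easy to strip afterwards; None is left alone.
stripped = tuple(g.strip() if g is not None else None for g in greedy_groups)
print(stripped)
```

The difference comes from where the middle group stops: the greedy version backs off only as far as the last "(", while the lazy version hands over at the first "(" that lets the rest of the pattern succeed.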

It does not work, however, when there is no text in parentheses (e.g.
"Update: new item"). How can I get the expression to return a tuple
such as ("Update:", "new item", None)?
You need to make the last group optional and also make the middle group
lazy: r'^(Update:)?(.*?)(?:(\(.*\)))?$'.
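
A quick sketch of what that change does for the string without parentheses. The names original, suggested and variant are hypothetical. The third pattern is NOT from this thread: it is one possible variant, assuming the parenthesised part never contains nested parentheses, that also keeps "(item)" in the middle group.

```python
import re

# Original pattern from the question: the last group is mandatory.
original = re.compile(r'^(Update:)?(.*)(\(.*\))$')
# Suggested pattern: lazy middle group, optional last group.
suggested = re.compile(r'^(Update:)?(.*?)(?:(\(.*\)))?$')
# Assumption, not from the thread: restrict the last group to a single
# pair of parentheses with no parentheses inside it.
variant = re.compile(r'^(Update:)?(.*?)(\([^()]*\))?$')

no_parens = "Update: new item"
two_parens = "Update: New item (item) (Household)"

# The original pattern cannot match at all without a "(...)" at the end.
original_result = original.match(no_parens)

# The suggested pattern matches and leaves the last group as None.
suggested_result = suggested.match(no_parens).groups()

# The variant gives the same answer here, and on the two-parentheses
# string it only captures the final "(Household)".
variant_no_parens = variant.match(no_parens).groups()
variant_two_parens = variant.match(two_parens).groups()

print(original_result)
print(suggested_result)
print(variant_no_parens)
print(variant_two_parens)
```

As before, the leading space in the middle group can be removed with strip() once the groups are extracted.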

Why do you perpetuate the redundant ^ anchor?

The OP didn't say whether search() or match() was being used. With the ^
it doesn't matter.
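
To illustrate the search()/match() point, here is a small sketch. The names anchored, unanchored, single_line and multi_line are invented for the example, and the multi-line string is a made-up input chosen to show a case where the unanchored pattern behaves differently under search().

```python
import re

anchored = re.compile(r'^(Update:)?(.*)(\(.*\))$')
unanchored = re.compile(r'(Update:)?(.*)(\(.*\))$')

single_line = "Update: New item (Household)"
# Hypothetical input: the "(...)" part is on a second line.
multi_line = "foo\nbar (x)"

# On ordinary single-line input all four calls agree.
single_results = [
    anchored.match(single_line).groups(),
    anchored.search(single_line).groups(),
    unanchored.match(single_line).groups(),
    unanchored.search(single_line).groups(),
]

# With the ^ anchor, search() and match() both fail on the multi-line
# input, because "." does not cross the newline and ^ only matches at
# the start of the string (no re.MULTILINE flag is used).
anchored_match = anchored.match(multi_line)
anchored_search = anchored.search(multi_line)

# Without the anchor, match() still fails, but search() is free to
# start at the second line.
unanchored_match = unanchored.match(multi_line)
unanchored_search = unanchored.search(multi_line).groups()

print(single_results)
print(anchored_match, anchored_search, unanchored_match)
print(unanchored_search)
```

So the anchor only makes a visible difference when search() could otherwise begin the match somewhere other than the start of the string.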

(?:...) is the non-capturing version of (...).

Why do you use
    (?:(subpattern))?
instead of just plain
    (subpattern)?
?

Oops, you're right. I was distracted by the \( and \)! :-)
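
A short check, as a sketch, that the two spellings behave the same. The names wrapped, plain, samples and results are hypothetical, and the sample strings are just the cases discussed in this thread plus an empty string.

```python
import re

# Optional group wrapped in a redundant non-capturing group.
wrapped = re.compile(r'^(Update:)?(.*?)(?:(\(.*\)))?$')
# Plain optional capturing group.
plain = re.compile(r'^(Update:)?(.*?)(\(.*\))?$')

samples = [
    "Update: New item (Household)",
    "Update: New item (item) (Household)",
    "Update: new item",
    "New item (Household)",
    "",
]

# Collect the groups from both patterns for each sample string.
results = [(wrapped.match(s).groups(), plain.match(s).groups()) for s in samples]

for s, (w, p) in zip(samples, results):
    print(repr(s), w, p)

# Both patterns define the same number of capturing groups.
print(wrapped.groups, plain.groups)
```

The non-capturing wrapper adds no group of its own, so the numbering and the contents of the three captured groups are identical.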
--
http://mail.python.org/mailman/listinfo/python-list
