READERS DIGEST CONDENSED QUESTION: How expensive is eval("123 + 456 == 975")
versus other ways?
The discussion started by Jach has spread away from discussing how python deals
with numbers starting with leading zeroes such as "03". I note there are many
ID numbers like social security that have a leading zero so if converted to an
int for storage reasons, ...
The TOPIC I want to discuss is the concerns about Jach wanting to use "eval()".
Besides security concerns, I want to know if it is particularly expensive.
It was suggested that you might solve a range of problems like puzzles asking
you to substitute unique digits for letters. Some of these puzzles are
normally solved with logic a step at a time but the techniques used may require
a specific order and may not be easy to program for a more general case so can
it be solved using what might be termed brute force. I mean try EVERYTHING that
might work, including some that might not.
A particular example was: SEND + MORE = MONEY.
Jach's suggestion was to take every possible combination of digits and make the
substitutions for the letters in that puzzle: then do something like this:
>>> eval ("9567+1085==10652")
True
>>> eval("123+456==975")
False
So, although better algorithms exist, no doubt, I considered what it would take
to do this without an eval. I came up with the following as an outline.
Accept the puzzle as a string with any number of items to the right or left of
an "=" as long as they only have whitespace and plus signs in addition to
alphabetic characters. So "AE + DF = CE + AB + AA" would be fine. This model
does not include doing subtraction or other things, just a simple model like
that, for now.
You can easily remove whitespace, force everything to one case, split it into a
left/right on "=" and then split those into lists on "+.
You now have two lists of alphabetic strings that ultimately must become lists
of integers and then the sums can be compared.
To get the number of unique letters in the puzzle, N, you throw all the list
items as individual characters into a set. Clearly, for this kind of puzzle,
there cannot be more than ten unique letters or there can be no unique mapping
for 0-9.
I would then "generate" all possible combinations of digits 0-9 of length N.
There are an amazing number of ways to do that ranging from taking a
range(10**N) and converting it to a string then a list of numeral characters
then tossing out any that have any duplicates, to creating a recursive function
that passes along what has been used so far to the next call. Since many of
these problems only need ONE solution, the generator would only iterate as many
times as needed and no giant lists of numbers or strings need be in memory at
any time.
For each tuple returned by the generator, you can iterate on the set of unique
letters and use string functions to substitute the digits, or perhaps do this
all at once. You would do this to all the items in what I call the left list as
well as all the items on the right list. These would not be "numeric" so using
int() on each item would work EVEN with leading zeroes. Seems safe enough.
Finally, with no eval needed, you take the sum of the numbers in a list of the
left and compare to the sum on the right and if equal, you present the solution
in whatever way works. If no more solutions are needed, you quit. I might write
this part as a generator too, that can be called once or to completion.
Yes, this is brute force. Using range(1,000,000) (no commas in real use) would
be a million iterations when there are six unique characters in the puzzle and
as many as 10 billion if all ten are in use. If you use nested loops like I
showed a while ago (albeit to make arbitrary ones for many sizes could require
multiple functions or use of an eval on one built by hand) you can cut down the
number of iterations as the nested loops count down with each one doing one
less than the one before it. Same goes for the recursive function call method
as it passes along what numerals have already been used. There may already be
a permutation function that does this efficiently in C.
The above description includes parts that may also be used in the eval
situation. The main difference may be that my method uses int() and perhaps
some string functions. So, does eval just divert the attention of the
interpreter as may happen with importing a module, or does it invoke a second
copy of the interpreter as may happen with some multitasking methods? I would
guess the former. But it may also have to first compile the text into byte
code. However, a full eval is like using a sledgehammer when a thumbtack might
be enough. Then again, it is already sitting there so a little extra use might
be cheap.
So a main reason for my variant might still be to avoid taking any chances on
rogue code. Mind you, in the above I would remove everything except [a-zA-Z=+]
including parentheses and periods so damage from a carefully crafted text
should be limited to over-riding a global variable name and even that can be
handled by supplying a minimal environment to the eval call.
The real cost that dominates here is not memory, albeit garbage collection may
be busy as it generates lots of temporary small bits of data. It is how the
number of iterations grows.
I have looked at a somewhat related issue of how you generate all possible
SCRABBLE words (or BOGGLE or ...) given specific starting letters. That problem
allows duplicate letters but it has to handle quite large words (even weird
medical terms like Pneumonoultramicroscopicsilicovolcanoconiosis but SCRABBLE
words cannot possible be larger than the seven letters you have added to
whatever is on the board that it connects to and in any case, the board can
only fit 19 letters.) One way to make all possible combinations is along the
lines above with many needed changes as there can (in theory) be as many as 26
unique letters (in English) and you can have multiple copies of some letters.
If you allow other languages, the problem grows to the point where brute force
is not realistic. And, ideally, you winnow down your choices by checking each
word against a large dictionary. Anyone know of one that has a decent selection
and can be freely imported? I mean one word per line.
Of course, in this case, there is room for improvement such as by making a tree
out of the dictionary and pruning any attempts if the word you are building
wanders off a leaf. Another story.
I apologize for the length. The main question was whether eval is particularly
expensive.
-----Original Message-----
From: Python-list <[email protected]> On
Behalf Of Antoon Pardon
Sent: Monday, December 10, 2018 5:00 AM
To: [email protected]
Subject: Re: Why Python don't accept 03 as a number?
On 8/12/18 06:00, Cameron Simpson wrote:
> On 07Dec2018 20:24, Jach Fong <[email protected]> wrote:
>> Ian at 2018/12/8 UTC+8 AM11:28:34 wrote:
>>> What is it exactly that you're trying to accomplish with this?
>>> Perhaps there's a better way than using eval.
>>
>> This problem comes from solving a word puzzle,
>> ab + aa + cd == ce
>> Each character will be translate to a digit and evaluate the
>> correctness,
>> 03 + 00 + 15 == 18
>
> Then you should be evaluating the digits and assembling values from
> them. Not trying to shoehorn a string through something that _might_
> accept this string and do what you want. In Python 2 it will accept
> your string and not do what you want; at least in Python 3 it doesn't
> accept your string.
>
> My point here is that the structure of your puzzle doesn't map
> directly into a naive python statement, and you shouldn't be
> pretending it might.
How do you figure? As far as I understand he is trying to solve this kind of
puzzle:
SEND
MORE
+ ————
MONEY
Where every letter is to be replaced by a digit in such a way that the sum
checks out. Sure trying to solve this by starting with the string: "SEND + MORE
== MONEY" and doing replaces until eval comes up with true is not very
sofisticated but I wouldn't call it shoehorning either.
--
Antoon.
--
https://mail.python.org/mailman/listinfo/python-list
--
https://mail.python.org/mailman/listinfo/python-list