[Python-ideas] Re: An itertools.interleave / itertools.join function

Christopher Barker Fri, 11 Dec 2020 00:32:55 -0800

On Thu, Dec 10, 2020 at 8:03 PM Chris Angelico <ros...@gmail.com> wrote:


> > Here is an example of how I use it to build an arbitrary long SQL
> request without having to pay for long intermediate strings, both in
> computation on memory.
>
<snip>

> > I often had such cases, but ended up using the more costy str.join .
>
> Is it really more costly? With strings the size of SQL queries
> (keeping in mind that these strings (correctly) contain no actual
> data, just placeholders), I doubt you'll see any significant
> performance hit from this.


I had the same thought -- expecting that it would take some pretty darn big
intermediate strings to be any faster at all.  I was not quite right. If
you replace the "join" iterator with plain str.join(), it does run a touch
slower:
def join_iterable(table, columns, values):
    request = ''.join(chain(
        ('INSERT INTO ', table, '('),
        join(', ', columns),
        (') VALUES (',),
        chain.from_iterable(join(('), (',), (join(', ', ('%s' for v in
value)) for value in values))),
        (') ON DUPLICATE KEY UPDATE ',),
        chain.from_iterable(join((', '), ((c, '=VALUES(', c, ')') for c in
columns))),
    ))

    return request


def join_str(table, columns, values):
    request = ''.join(chain(
        ('INSERT INTO ', table, '('),
        ', '.join(columns),
        (') VALUES (',),
        chain.from_iterable('), ('.join(', '.join('%s' for v in value) for
value in values)),
        (') ON DUPLICATE KEY UPDATE ',),
        chain.from_iterable(', '.join(f"{c}=VALUES({c})" for c in columns)),
    ))
    return request

In [31]: %timeit join_iterable(table, columns, values)
8.65 µs ± 154 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [32]: %timeit join_str(table, columns, values)
13 µs ± 38.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

But if you get rid of the whole pile of nested iterators and chain, and
make it a simple str.join, with each bit using plain str.join(), it's
faster still.

def join_str_simple(table, columns, values):
    request = "".join(["INSERT INTO ",
                       f"{table}({', '.join(columns)})",
                       " VALUES (",
                       "), (".join(', '.join('%s' for v in value) for value
in values),
                       ") ON DUPLICATE KEY UPDATE ",
                       ", ".join(f"{c}=VALUES({c})" for c in columns),
                       ])
    return request

In [33]: %timeit join_str_simple(table, columns, values)
5.09 µs ± 26.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

And I would argue more readable. (though the double comprehensions are not
great ...)

Granted, you may lose that advantage if you had a lot more values -- but I
expect it wouldn't be a real issue until you had hundreds or more.


> Also, I would be VERY surprised if the cost
> of in-memory string manipulation exceeds the cost of an actual
> database transaction.
>

well, yeah -- these are all very fast.


> But more importantly: Is it any more readable? What you have there is
> pretty opaque. Is the str.join version worse than that?


you be the judge :-)

Code enclosed.

-CHB


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython

from itertools import chain #, join

def join(sep, iterable):
    notfirst=False
    for i in iterable:
        if notfirst:
            yield sep
        else:
            notfirst=True
        yield i

def join_iterable(table, columns, values):
    request = ''.join(chain(
        ('INSERT INTO ', table, '('),
        join(', ', columns),
        (') VALUES (',),
        chain.from_iterable(join(('), (',), (join(', ', ('%s' for v in value)) for value in values))),
        (') ON DUPLICATE KEY UPDATE ',),
        chain.from_iterable(join((', '), ((c, '=VALUES(', c, ')') for c in columns))),
    ))

    return request


def join_str(table, columns, values):
    request = ''.join(chain(
        ('INSERT INTO ', table, '('),
        ', '.join(columns),
        (') VALUES (',),
        chain.from_iterable('), ('.join(', '.join('%s' for v in value) for value in values)),
        (') ON DUPLICATE KEY UPDATE ',),
        chain.from_iterable(', '.join(f"{c}=VALUES({c})" for c in columns)),
    ))
    return request

def join_str_simple(table, columns, values):
# table = 'mytable'
# columns=('id', 'v1', 'v2')
# values = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]

    request = "".join([f"INSERT INTO ",
                       f"{table}({', '.join(columns)})",
                        " VALUES (",
                        "), (".join(', '.join('%s' for v in value) for value in values),
                        ") ON DUPLICATE KEY UPDATE ",
                        ", ".join(f"{c}=VALUES({c})" for c in columns),
                        ])
    return request

# args = list(chain.from_iterable(values))


table = 'mytable'
columns=('id', 'v1', 'v2')
values = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]

request = join_iterable(table, columns, values)
print(request)
print()
request = join_str(table, columns, values)
print(request)

print()
request = join_str_simple(table, columns, values)
print(request)


assert join_iterable(table, columns, values) == join_str(table, columns, values)
assert join_iterable(table, columns, values) == join_str_simple(table, columns, values)

#    > INSERT INTO mytable(id, v1, v2) VALUES (%s, %s, %s), (%s, %s, %s), (%s, %s, %s) ON DUPLICATE KEY UPDATE id=VALUES(id), v1=VALUES(v1), v2=VALUES(v2)

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OX6DEXWDZ7QI32JJOVKSEA5BQMBJZHHB/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Re: An itertools.interleave / itertools.join function

Reply via email to