[issue46249] [sqlite3] move set lastrowid out of the query loop and enable it for executemany()

2022-01-22 Thread Dong-hee Na


Change by Dong-hee Na :


--
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop and enable it for executemany()

2022-01-22 Thread Dong-hee Na


Dong-hee Na  added the comment:


New changeset 38afeb1a336f0451c0db86df567ef726f49f6438 by Erlend Egeberg 
Aasland in branch 'main':
bpo-46249: Move set lastrowid out of the sqlite3 query loop (GH-30489)
https://github.com/python/cpython/commit/38afeb1a336f0451c0db86df567ef726f49f6438


--
nosy: +corona10

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop and enable it for executemany()

2022-01-12 Thread Marc-Andre Lemburg


Marc-Andre Lemburg  added the comment:

On 11.01.2022 21:30, Erlend E. Aasland wrote:
> 
>> I'd suggest to deprecate the cursor.lastrowid attribute and
>> instead point people to the much more useful [...]
> 
> Yes, I think mentioning the RETURNING ROWID trick in the sqlite3 docs is a 
> very nice improvement. Mentioning the last_insert_rowid SQL function is 
> probably also worth consideration.
> 
> I'm reluctant to deprecate cursor.lastrowid, though. ATM, I'm leaning towards 
> just keeping the current behaviour.

Fair enough :-)

Perhaps just documenting that the value is not necessarily what
people may expect, when coming from other databases due to the
different semantics with SQLite, is enough.
 --
Marc-Andre Lemburg
eGenix.com

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop and enable it for executemany()

2022-01-11 Thread Erlend E. Aasland


Erlend E. Aasland  added the comment:

> You don't know on which cursor the last row was inserted [...]

SQLite has no concept of cursors :)

> It also seems that the function really only works for INSERTs and
> not for UPDATEs.

Yes, hence it's name: sqlite3_last_insert_rowid() !

> I'd suggest to deprecate the cursor.lastrowid attribute and
> instead point people to the much more useful [...]

Yes, I think mentioning the RETURNING ROWID trick in the sqlite3 docs is a very 
nice improvement. Mentioning the last_insert_rowid SQL function is probably 
also worth consideration.

I'm reluctant to deprecate cursor.lastrowid, though. ATM, I'm leaning towards 
just keeping the current behaviour.

> But this is your call :-)

I hear you are saying that, but strictly speaking I'm _not_ the sqlite3 module 
maintainer; I'm just a very eager contributor :)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop and enable it for executemany()

2022-01-11 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

On 11.01.2022 20:46, Erlend E. Aasland wrote:
> 
> If we are to revert to this behaviour, we'll have to start examining the SQL 
> we are given (search for INSERT and REPLACE keywords, determine if they are 
> valid (i.e. not a comment, not part of a column or table name, etc.), which 
> will lead to a noticeable performance hit for every new statement (not for 
> statements reused via the LRU cache though). I'm not sure this is a good 
> idea. However I will give it a good thought.
>
> My first thought now, is that it would be better for the sqlite3 module to 
> align lastrowid with the behaviour of the C API sqlite3_last_insert_rowid() 
> (also available as an SQL function: last_insert_rowid). OTOH, the SQLite API 
> is tied to the _connection_ object, so it may not make sense to align it with 
> lastrowid which is a _cursor_ attribute.

I've had a look at the API description and find it less than useful,
to be honest:

https://sqlite.org/c3ref/last_insert_rowid.html

You don't know on which cursor the last row was inserted, it's
possible that this was or is done by a trigger and the last row
is not updated in case the INSERT does not succeed for some reason,
leaving it unchanged - without the user getting a notification of
this failure, since the .execute() call itself will succeed for
e.g. "INSERT INTO table SELECT ...;".

It also seems that the function really only works for INSERTs and
not for UPDATEs.

> Perhaps the Right Thing To Do™ is to be conservative and just leave it as it 
> is. I still want to apply the optimisation, though. It does not alter the 
> behaviour in any kind of way, and it speeds up executemany().

I'd suggest to deprecate the cursor.lastrowid attribute and
instead point people to the much more useful

"INSERT INTO t (name) VALUES ('two'), ('three') RETURNING ROWID;"

https://sqlite.org/lang_insert.html
https://sqlite.org/forum/forumpost/058ac49cc3

(good to know that SQLite has adopted this PostgreSQL variant as
well)

RETURNING is also available for UPDATES:

https://sqlite.org/lang_update.html

If people really want to use the sqlite3_last_insert_rowid()
functionality, they can use the SQL function of the same name:

https://www.sqlite.org/lang_corefunc.html#last_insert_rowid

which then has known semantics and doesn't conflict with the DB-API
specs.

But this is your call :-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop and enable it for executemany()

2022-01-11 Thread Erlend E. Aasland


Erlend E. Aasland  added the comment:

> OTOH, the SQLite API is tied to the _connection_ object, so it may not make 
> sense to align it with lastrowid which is a _cursor_ attribute.

That came out weird; let's try again:

The SQLite API is tied to the _connection_ object, so it may not make sense to 
force that behaviour onto lastrowid which is a _cursor_ attribute.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop and enable it for executemany()

2022-01-11 Thread Erlend E. Aasland

Erlend E. Aasland  added the comment:

> Is 0 a valid row ID in SQLite ? If not, then I guess this would be
> an alternative to None as suggested by the DB-API.

Yes. Any 64 bit signed integer value is a valid row id in SQLite.

> If it is a valid row ID, I'd suggest to go back to resetting to None,
> since otherwise code might get confused: if an UPDATE does not get
> applied (e.g. a condition is false), code could then still take
> .lastrowid as referring to the UPDATE and not a previous
> operation, since code will now know whether the condition was met
> or not.

True. OTOH, we've had no requests for reverting to pre Python 3.6 behaviour.

If we are to revert to this behaviour, we'll have to start examining the SQL we 
are given (search for INSERT and REPLACE keywords, determine if they are valid 
(i.e. not a comment, not part of a column or table name, etc.), which will lead 
to a noticeable performance hit for every new statement (not for statements 
reused via the LRU cache though). I'm not sure this is a good idea. However I 
will give it a good thought.

My first thought now, is that it would be better for the sqlite3 module to 
align lastrowid with the behaviour of the C API sqlite3_last_insert_rowid() 
(also available as an SQL function: last_insert_rowid). OTOH, the SQLite API is 
tied to the _connection_ object, so it may not make sense to align it with 
lastrowid which is a _cursor_ attribute.

Perhaps the Right Thing To Do™ is to be conservative and just leave it as it 
is. I still want to apply the optimisation, though. It does not alter the 
behaviour in any kind of way, and it speeds up executemany().

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop and enable it for executemany()

2022-01-11 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

On 08.01.2022 21:56, Erlend E. Aasland wrote:
>  
> Marc-André: since Python 3.6, the sqlite3.Cursor.lastrowid attribute does no 
> longer comply with the recommendations of PEP 249:
> 
> Previously, lastrowid was set to None for operations other than INSERT or 
> REPLACE. This changed with ab994ed8b97e1b0dac151ec827c857f5e7277565 (in 
> Python 3.6), so that lastrowid is _unchanged_ for operations other than 
> INSERT or REPLACE, and it is set to 0 after the first valid SQL (that is not 
> INSERT/REPLACE) is executed on the cursor.
> 
> Now, PEP 249 only _recommends_ that lastrowid is set to None for operations 
> that do not modify a row, so it's probably not a big deal. No-one has ever 
> mentioned this change in behaviour; there have been no bug reports.
> 
> FTR, here is the relevant quote from PEP 249:
> 
> If the operation does not set a rowid or if the database does not support
> rowids, this attribute should be set to None.
> 
> (I interpret "should" as understood by RFC 2119.)

Well, it may be a little stronger than the SHOULD in the RFC, but then
again the whole DB-API is about conventions and if they don't make sense
for a database backend, it is possible to deviate from the spec, esp. for
optional extensions such as .lastrowid.

> So, my follow-up question becomes:
> I see no point in reverting to pre Python 3.6 behaviour. I would rather 
> change the default value to be 0 (to get rid of the dirty flag in GH-30380), 
> and to make the behaviour more consistent with how the actual SQLite API 
> behaves.
> 
> 
> Do you have an opinion about such a change (in behaviour)?

Is 0 a valid row ID in SQLite ? If not, then I guess this would be
an alternative to None as suggested by the DB-API.

If it is a valid row ID, I'd suggest to go back to resetting to None,
since otherwise code might get confused: if an UPDATE does not get
applied (e.g. a condition is false), code could then still take
.lastrowid as referring to the UPDATE and not a previous
operation, since code will now know whether the condition was met
or not.
 --
Marc-Andre Lemburg
eGenix.com

--
title: [sqlite3] lastrowid improvements -> [sqlite3] move set lastrowid out of 
the query loop and enable it for executemany()

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop and enable it for executemany()

2022-01-08 Thread Erlend E. Aasland


Change by Erlend E. Aasland :


--
pull_requests: +28692
pull_request: https://github.com/python/cpython/pull/30489

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop and enable it for executemany()

2022-01-08 Thread Erlend E. Aasland


Erlend E. Aasland  added the comment:

I see now that GH-30380 has grown a little bit out of scope, as it changes too 
many things at once:

1. Simplify the `execute()`/`executemany()` query loop
2. Create the lastrowid PyObject on demand, simplifying cursor GC
3. `lastrowid` is now available for all `execute*()` methods
4. The default value of `lastrowid` is now `0`

It will be easier to reason about and review each change separately.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop and enable it for executemany()

2022-01-08 Thread Erlend E. Aasland

Erlend E. Aasland  added the comment:

Marc-André: since Python 3.6, the sqlite3.Cursor.lastrowid attribute does no 
longer comply with the recommendations of PEP 249:

Previously, lastrowid was set to None for operations other than INSERT or 
REPLACE. This changed with ab994ed8b97e1b0dac151ec827c857f5e7277565 (in Python 
3.6), so that lastrowid is _unchanged_ for operations other than INSERT or 
REPLACE, and it is set to 0 after the first valid SQL (that is not 
INSERT/REPLACE) is executed on the cursor.

Now, PEP 249 only _recommends_ that lastrowid is set to None for operations 
that do not modify a row, so it's probably not a big deal. No-one has ever 
mentioned this change in behaviour; there have been no bug reports.

FTR, here is the relevant quote from PEP 249:

If the operation does not set a rowid or if the database does not support
rowids, this attribute should be set to None.

(I interpret "should" as understood by RFC 2119.)

So, my follow-up question becomes:
I see no point in reverting to pre Python 3.6 behaviour. I would rather change 
the default value to be 0 (to get rid of the dirty flag in GH-30380), and to 
make the behaviour more consistent with how the actual SQLite API behaves.


Do you have an opinion about such a change (in behaviour)?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop and enable it for executemany()

2022-01-04 Thread Erlend E. Aasland


Erlend E. Aasland  added the comment:

There are some inaccuracies in the docs I need to fix first, since those 
changes must be backported, and I want the updated docs applied upon the 
corrected docs; I'll open a separate issue/PR for that.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop and enable it for executemany()

2022-01-04 Thread Erlend E. Aasland


Change by Erlend E. Aasland :


--
title: [sqlite3] move set lastrowid out of the query loop -> [sqlite3] move set 
lastrowid out of the query loop and enable it for executemany()

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop

2022-01-04 Thread Erlend E. Aasland


Erlend E. Aasland  added the comment:

True that :) I'll close GH-30371 and prepare GH-30380 for review. I'll request 
your review if you don't mind.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop

2022-01-04 Thread Marc-Andre Lemburg


Marc-Andre Lemburg  added the comment:

On 04.01.2022 21:02, Erlend E. Aasland wrote:
> 
> Erlend E. Aasland  added the comment:
> 
>> If possible, it's usually better to have the .executemany() create a
>> cursor with an output array providing the row ids, e.g. using "INSERT ...
>> RETURNING ..." (PostgreSQL). That way you can access all row ids and
>> can also provide the needed detail in case the INSERTs happen out of
>> order to map them to the input data.
> 
> Hm, maybe so. But it will add a lot of overhead and complexity to 
> executemany(), and there haven't been requests for this feature for sqlite3. 
> AFAIK, there hasn't been request for lastrowid for executemany() at all. 
> OTOH, my proposal of modifying lastrowid to always show the rowid of the 
> actual last inserted row is a very cheap operation, _and_ it simplifies the 
> code (=> increased maintainability), so I think I'll go for that.

Sorry, I wasn't suggesting this for SQLite; it's just a better
and more flexible option than using cursor.lastrowid where
available. Sadly, the PG extension is not standards conform SQL.

>> For cases where you don't need sequence IDs, it's often better to
>> not rely on auto-increment columns for IDs, but instead use random
>> pre-generated IDs. Saves roundtrips to the database and works nicely
>> with cluster databases as well.
> 
> Yes, but in those cases you keep track of the row id yourself, so you 
> probably won't need lastrowid ;)

Indeed, and that's the point :-)

Those auto-increment ID fields are
not really such a good idea to begin with. Either you know that you
will need to manipulate the rows after inserting them (in which case
you can set an ID) or you don't care about the individual rows and
only want to aggregate or search in them based on other fields.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop

2022-01-04 Thread Erlend E. Aasland


Erlend E. Aasland  added the comment:

> If possible, it's usually better to have the .executemany() create a
> cursor with an output array providing the row ids, e.g. using "INSERT ...
> RETURNING ..." (PostgreSQL). That way you can access all row ids and
> can also provide the needed detail in case the INSERTs happen out of
> order to map them to the input data.

Hm, maybe so. But it will add a lot of overhead and complexity to 
executemany(), and there haven't been requests for this feature for sqlite3. 
AFAIK, there hasn't been request for lastrowid for executemany() at all. OTOH, 
my proposal of modifying lastrowid to always show the rowid of the actual last 
inserted row is a very cheap operation, _and_ it simplifies the code (=> 
increased maintainability), so I think I'll go for that.

> For cases where you don't need sequence IDs, it's often better to
> not rely on auto-increment columns for IDs, but instead use random
> pre-generated IDs. Saves roundtrips to the database and works nicely
> with cluster databases as well.

Yes, but in those cases you keep track of the row id yourself, so you probably 
won't need lastrowid ;)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop

2022-01-04 Thread Marc-Andre Lemburg

Marc-Andre Lemburg  added the comment:

On 04.01.2022 11:02, Erlend E. Aasland wrote:
> 
> Erlend E. Aasland  added the comment:
> 
> Thank you for your input Marc-André.
> 
> For SQLite, it's pretty simple: we use an API called 
> sqlite3_last_insert_rowid() which takes the database connection as it's 
> argument, not a statement pointer. This function returns "the rowid of the 
> most recent successful INSERT into a rowid table or virtual table on database 
> connection" (quote from SQLite docs). IMO, it would make sense to also use 
> this post executemany().

Sounds like a plan.

If possible, it's usually better to have the .executemany() create a
cursor with an output array providing the row ids, e.g. using "INSERT ...
RETURNING ..." (PostgreSQL). That way you can access all row ids and
can also provide the needed detail in case the INSERTs happen out of
order to map them to the input data.

For cases where you don't need sequence IDs, it's often better to
not rely on auto-increment columns for IDs, but instead use random
pre-generated IDs. Saves roundtrips to the database and works nicely
with cluster databases as well.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop

2022-01-04 Thread Erlend E. Aasland

Erlend E. Aasland  added the comment:

Thank you for your input Marc-André.

For SQLite, it's pretty simple: we use an API called 
sqlite3_last_insert_rowid() which takes the database connection as it's 
argument, not a statement pointer. This function returns "the rowid of the most 
recent successful INSERT into a rowid table or virtual table on database 
connection" (quote from SQLite docs). IMO, it would make sense to also use this 
post executemany().

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop

2022-01-04 Thread Marc-Andre Lemburg


Marc-Andre Lemburg  added the comment:

On 04.01.2022 00:49, Erlend E. Aasland wrote:
> 
> Erlend E. Aasland  added the comment:
> 
> I see that PEP 249 does not define the semantics of lastrowid for 
> executemany(). What's the precedence here, MAL? IMO, it would be nice to be 
> able to fetch the last row id after you've done a 1000 inserts using 
> executemany().

The DB-API deliberately leaves this undefined, since there are many ways you
could implement this, e.g.

- return the last row id for the last entry in the array passed to 
.executemany()
- return the last row id of the last actually modified/inserted row after
running .executemany()
- return an array of row ids, one for each row modified/inserted
- return a row id of one of the modified/inserted rows, without defining which
- always return None for .executemany()

Note that in some cases, the order of actions taken by the database is not
predefined (e.g. some databases run such inserts in chunks across
a cluster), so even the "last" semantics are not clear.

> So, another option would be to keep "set-lastrowid" in the query loop, and 
> just remove the condition; we set it every time (but of course only build a 
> PyObject of it when the loop is done).

Since the DB-API leaves this undefined, you are free to add your own
meaning, which makes most sense for SQLite, to always return None or
not implement it at all.

DB-API extensions such as Cursor.lastrowid are optional and don't need
to be implemented if they don't make sense for a particular use case:

https://www.python.org/dev/peps/pep-0249/#optional-db-api-extensions

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop

2022-01-03 Thread Erlend E. Aasland


Erlend E. Aasland  added the comment:

> So, another option would be to keep "set-lastrowid" in the query loop, and
> just remove the condition; we set it every time (but of course only build a
> PyObject of it when the loop is done).

I made a competing PR (GH-30380) for this, just a little better (IMO); the Py 
object is only built on demand :)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop

2022-01-03 Thread Erlend E. Aasland


Change by Erlend E. Aasland :


--
pull_requests: +28590
pull_request: https://github.com/python/cpython/pull/30380

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop

2022-01-03 Thread Erlend E. Aasland


Erlend E. Aasland  added the comment:

I see that PEP 249 does not define the semantics of lastrowid for 
executemany(). What's the precedence here, MAL? IMO, it would be nice to be 
able to fetch the last row id after you've done a 1000 inserts using 
executemany().

So, another option would be to keep "set-lastrowid" in the query loop, and just 
remove the condition; we set it every time (but of course only build a PyObject 
of it when the loop is done).

--
nosy: +lemburg

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop

2022-01-03 Thread Erlend E. Aasland


Change by Erlend E. Aasland :


--
keywords: +patch
pull_requests: +28583
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/30371

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46249] [sqlite3] move set lastrowid out of the query loop

2022-01-03 Thread Erlend E. Aasland


New submission from Erlend E. Aasland :

The query loop in _pysqlite_query_execute() is run only once for ordinary 
execute()'s, but multiple times for executemany(). We use the 'multiple' 
variable to differ between the two execute methods; for execute(), multiple is 
false, for executemany(), multiple is true. At the end of the loop, the 
'lastrowid' connection attribute is set, if multiple is false. We can safely 
move this part out of the loop; it is irrelevant for executemany(), and it will 
only be run once for execute().

--
assignee: erlendaasland
components: Extension Modules
messages: 409620
nosy: erlendaasland
priority: normal
severity: normal
status: open
title: [sqlite3] move set lastrowid out of the query loop
type: enhancement
versions: Python 3.11

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com