Re: Using indexes

2021-09-28 Thread John English
On Tue, 28 Sep 2021, 17:52 Rick Hillegas,  wrote:

> Glad that you have made progress on this puzzle. One comment inline...
>
> On 9/28/21 4:10 AM, John English wrote:
> > The WHERE clause seems to prevent the index from being used.
> The index is not usable in this query because username is not the
> leading column in the index.
>

Ah, right.

>


Re: Using indexes

2021-09-28 Thread Rick Hillegas

Glad that you have made progress on this puzzle. One comment inline...

On 9/28/21 4:10 AM, John English wrote:
A couple more data points, from testing different variants of the 
inner select:


1) SELECT time FROM system_log
 ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY;

419ms, "Index Scan ResultSet for SYSTEM_LOG using index LOG_INDEX at 
read uncommitted isolation level using share row locking chosen by the 
optimizer"


2) SELECT time FROM system_log
 WHERE username='foo'
 ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY;

2245ms, "Table Scan ResultSet for SYSTEM_LOG at read uncommitted 
isolation level using share row locking chosen by the optimizer"


Removing "FETCH NEXT 20 ROWS ONLY":

1) 388,237 rows in 3859ms, otherwise same as above (uses index).
2) 23,211 rows in 2199ms, otherwise same as above (doesn't use index).

The WHERE clause seems to prevent the index from being used.
The index is not usable in this query because username is not the 
leading column in the index.





Re: Using indexes

2021-09-28 Thread John English
A couple more data points, from testing different variants of the inner 
select:


1) SELECT time FROM system_log
 ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY;

419ms, "Index Scan ResultSet for SYSTEM_LOG using index LOG_INDEX at 
read uncommitted isolation level using share row locking chosen by the 
optimizer"


2) SELECT time FROM system_log
 WHERE username='foo'
 ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY;

2245ms, "Table Scan ResultSet for SYSTEM_LOG at read uncommitted 
isolation level using share row locking chosen by the optimizer"


Removing "FETCH NEXT 20 ROWS ONLY":

1) 388,237 rows in 3859ms, otherwise same as above (uses index).
2) 23,211 rows in 2199ms, otherwise same as above (doesn't use index).

The WHERE clause seems to prevent the index from being used.

--
John English

--
This email has been checked for viruses by AVG.
https://www.avg.com



Re: Using indexes

2021-09-28 Thread John English

On 27/09/2021 18:23, John English wrote:
I'll try the temporary table approach as soon as I get a minute, and 
will let you know what happens.


I'm finally getting somewhere. Having (a) moved the row counting out to 
a separate query, (b) renamed columns in the table to match what my code 
requires so I can access the table directly instead of using a view, and 
(c) using your (Rick's) original idea of a nested select (which I 
decided to try again before going for a temporary table), I get an 
execution time of 455ms (woo-hoo!) for this query:


SELECT DateTimeFormat(t_time,null) AS t_time,facility,event,details
FROM system_log,
(SELECT time FROM system_log
 ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY) AS x
WHERE system_log.time=x.time;

However, if I add in a WHERE clause:

SELECT DateTimeFormat(t_time,null) AS t_time,facility,event,details
FROM system_log,
(SELECT time FROM system_log
 WHERE username='foo'
 ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY) AS x
WHERE system_log.time=x.time;

the execution time goes up to 2 seconds (I'm still happy!). The left 
result set (the nested select, by the look of it) still doesn't use the 
index:


Left result set:
  Row Count (2):
  Number of opens = 1
  Rows seen = 20
  Rows filtered = 0
  ...
  Source result set:
Sort ResultSet:
Number of opens = 1
Rows input = 23211
Rows returned = 20
Eliminate duplicates = false
In sorted order = false
Sort information:
  Number of rows input=23211
  Number of rows output=23211
  Sort type=internal
  ...
  optimizer estimated row count: 38824.20
  optimizer estimated cost: 399695.58
Source result set:
  Project-Restrict ResultSet (4):
  Number of opens = 1
  Rows seen = 23211
  Rows filtered = 0
  restriction = false
  projection = true
  ...
  Source result set:
Table Scan ResultSet for SYSTEM_LOG at read uncommitted 
isolation level using share row locking chosen by the optimizer

Number of opens = 1
Rows seen = 23211

but the right result set uses the PK index.

So, I'm in a much better position as the result of Rick's advice, but I 
still have no idea why the descending PK index isn't ever used!!!


--
John English

--
This email has been checked for viruses by AVG.
https://www.avg.com



Re: Using indexes

2021-09-27 Thread John English

On 26/09/2021 22:57, Rick Hillegas wrote:
The support for FETCH/OFFSET is pretty minimal. Probably, the optimizer 
isn't smart enough to know that the subquery returns only 20 small rows.


What happens if you dump the results of the subquery into a temporary 
table and then join that with system_log?


Thanks, I'll give it a whirl. Meanwhile,I've got it down to about 5 or 6 
seconds by (a) eliminating the view in favour of a direct table access, 
and (b) using a separate SELECT COUNT(*) to get the row count, again 
using the table directly and without using an ORDER BY clause. (I 
renamed the id and time columns to time and t_time respectively, so the 
names match what the code expects to get back from the view, and then 
recreated the index.)


It still doesn't seem to be using the index (see trace below) but it's 
an improvement. Sorting by time ASC use the PK index and takes about 
200ms, but time DESC doesn't use an index and takes about 3s. And I 
still don't understand why it's ignoring the index.


I'll try the temporary table approach as soon as I get a minute, and 
will let you know what happens.


Thanks for the help,
--
John English


Statement Text:
	SELECT DateTimeFormat(t_time,null) AS 
t_time,facility,event,details,name,username FROM system_log

ORDER BY time DESC NULLS LAST FETCH NEXT 20 ROWS ONLY
Parse Time: 19
Bind Time: 1
Optimize Time: 1
Generate Time: 0
Compile Time: 21
Execute Time: 4474
Begin Compilation Timestamp : 2021-09-27 17:52:23.802
End Compilation Timestamp : 2021-09-27 17:52:23.823
Begin Execution Timestamp : 2021-09-27 17:52:23.824
End Execution Timestamp : 2021-09-27 17:52:28.298
Statement Execution Plan Text:
Scroll Insensitive ResultSet:
Number of opens = 1
Rows seen = 20
Number of reads from hash table = 20
Number of writes to hash table = 20
constructor time (milliseconds) = 0
open time (milliseconds) = 4474
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 388236.00
optimizer estimated cost: 399685.32
Source result set:
Row Count (1):
Number of opens = 1
Rows seen = 20
Rows filtered = 0
constructor time (milliseconds) = 0
open time (milliseconds) = 4474
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 388236.00
optimizer estimated cost: 399685.32
Source result set:
Project-Restrict ResultSet (5):
Number of opens = 1
Rows seen = 20
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 4474
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 388236.00
optimizer estimated cost: 399685.32
Source result set:
Sort ResultSet:
Number of opens = 1
Rows input = 388231
Rows returned = 20
Eliminate duplicates = false
In sorted order = false
Sort information:
Number of rows input=388231
Number of rows output=388231
Sort type=internal
constructor time (milliseconds) = 0
open time (milliseconds) = 4474
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 388236.00
optimizer estimated cost: 399685.32
Source result set:
Project-Restrict ResultSet (3):
Number of opens = 1
Rows seen = 388231
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 4232
close time (milliseconds) = 3
restriction time (milliseconds) = 0
projection time (milliseconds) = 1807
   

Re: Using indexes

2021-09-26 Thread Rick Hillegas

On 9/25/21 11:59 AM, John English wrote:

On 25/09/2021 21:14, Rick Hillegas wrote:

On 9/25/21 7:39 AM, John English wrote:
SELECT id,DateTimeFormat(time,null) AS 
t_time,name,username,facility,event,details

FROM system_log
ORDER BY id DESC
NULLS LAST
FETCH FIRST 20 ROWS ONLY;


I can remember whether you tried to rewrite the query to use a 
subquery. Something like this:


SELECT id, time AS t_time,name,username,facility,event,details
FROM
   system_log s,
   (
 SELECT id AS log_id
 FROM system_log
 ORDER BY id DESC
 NULLS LAST
 FETCH FIRST 20 ROWS ONLY
   ) t
WHERE s.id = t.log_id
;

Does that help?

-Rick


No, when I tried it, it made it even worse if anything. The subquery 
still fetches all 400,000 rows, sorts them without using the index, 
and then uses the results to select again in the outer query. I really 
don't understand why the index is ignored.


The support for FETCH/OFFSET is pretty minimal. Probably, the optimizer 
isn't smart enough to know that the subquery returns only 20 small rows.


What happens if you dump the results of the subquery into a temporary 
table and then join that with system_log?





Re: Using indexes

2021-09-26 Thread John English

On 25/09/2021 21:14, Rick Hillegas wrote:

SELECT id, time AS t_time,name,username,facility,event,details
FROM
   system_log s,
   (
     SELECT id AS log_id
     FROM system_log
     ORDER BY id DESC
     NULLS LAST
     FETCH FIRST 20 ROWS ONLY
   ) t
WHERE s.id = t.log_id
;


Ideally I want to access the table through a view, defined like this:

CREATE VIEW system_log_view AS
  SELECT  time AS t_time,
  facility,
  event,
  details,
  name,
  username,
  id AS time
  FROMsystem_log

(The renamings may seem strange but they serve a purpose!)

Selecting the ids directly from the table takes 0.4s:

  SELECT id FROM system_log
  ORDER BY id DESC NULLS LAST
  FETCH FIRST 20 ROWS ONLY;

Going through the view takes much longer (2.9s):

  SELECT time FROM system_log_view
  ORDER BY time DESC NULLS LAST
  FETCH FIRST 20 ROWS ONLY;

It seems like the view is materializing all the rows, and then 20 rows 
are being selected from the 400,000:


Source result set:
Sort ResultSet:
Number of opens = 1
Rows input = 388226
Rows returned = 20

In any case I want to know the total number of rows, even though I 
display them in pages of 20 at a time. I thought I might get away with 
using a separate query for this, but no:


  SELECT COUNT(*) FROM system_log;  -- 0.271s
  SELECT COUNT(*) FROM system_log_view; -- 2.184s

I don't understand why the view is so much slower than accessing the 
table directly. The direct table access says this:


Index Scan ResultSet for SYSTEM_LOG using index LOG_INDEX at read 
uncommitted isolation level using share row locking chosen by the optimizer


The corresponding line from executing the view doesn't mention the index:

Table Scan ResultSet for SYSTEM_LOG at read uncommitted isolation level 
using share row locking chosen by the optimizer


What am I not understanding about views here?
--
John English

--
This email has been checked for viruses by AVG.
https://www.avg.com



Re: Using indexes

2021-09-25 Thread John English

On 25/09/2021 21:14, Rick Hillegas wrote:

On 9/25/21 7:39 AM, John English wrote:
SELECT id,DateTimeFormat(time,null) AS 
t_time,name,username,facility,event,details

FROM system_log
ORDER BY id DESC
NULLS LAST
FETCH FIRST 20 ROWS ONLY;


I can remember whether you tried to rewrite the query to use a subquery. 
Something like this:


SELECT id, time AS t_time,name,username,facility,event,details
FROM
   system_log s,
   (
     SELECT id AS log_id
     FROM system_log
     ORDER BY id DESC
     NULLS LAST
     FETCH FIRST 20 ROWS ONLY
   ) t
WHERE s.id = t.log_id
;

Does that help?

-Rick


No, when I tried it, it made it even worse if anything. The subquery 
still fetches all 400,000 rows, sorts them without using the index, and 
then uses the results to select again in the outer query. I really don't 
understand why the index is ignored.


--
John English

--
This email has been checked for viruses by AVG.
https://www.avg.com



Re: Using indexes

2021-09-25 Thread Rick Hillegas

On 9/25/21 7:39 AM, John English wrote:
SELECT id,DateTimeFormat(time,null) AS 
t_time,name,username,facility,event,details

FROM system_log
ORDER BY id DESC
NULLS LAST
FETCH FIRST 20 ROWS ONLY;


I can remember whether you tried to rewrite the query to use a subquery. 
Something like this:


SELECT id, time AS t_time,name,username,facility,event,details
FROM
  system_log s,
  (
    SELECT id AS log_id
    FROM system_log
    ORDER BY id DESC
    NULLS LAST
    FETCH FIRST 20 ROWS ONLY
  ) t
WHERE s.id = t.log_id
;

Does that help?

-Rick



Using indexes

2021-09-25 Thread John English
I asked something like this some months ago, but am still not getting 
anywhere...


I have a table used for logging events, which current contains just 
under 400,000 rows:


CREATE TABLE system_log (
  id  INTEGER   GENERATED ALWAYS AS IDENTITY,
  timeTIMESTAMP DEFAULT NULL,
  usernameVARCHAR(15),
  nameVARCHAR(520),
  facilityVARCHAR(15)   NOT NULL,
  event   VARCHAR(31)   NOT NULL,
  details VARCHAR(32000),
  CONSTRAINT systemlog_pk   PRIMARY KEY (id)
);

I generally want to display most recent events first, so have defined an 
index for the purpose:


CREATE INDEX log_index ON system_log (id DESC);

I execute the following query:

SELECT id,DateTimeFormat(time,null) AS 
t_time,name,username,facility,event,details

FROM system_log
ORDER BY id DESC
NULLS LAST
FETCH FIRST 20 ROWS ONLY;

(DateTimeFormat is a simple function that converts a timestamp into 
something human-readable like "25-Sep-2021 at 12:30".)


A timing test showed that it takes 6.348 seconds to execute this. If I 
order by id ASC, it takes 0.124 seconds.


I used SYSCS_SET_RUNTIMESTATISTICS(1) and 
SYSCS_SET_STATISTICS_TIMING(1), and got a dump showing that it's doing 
an external sort and not using the index using ORDER BY id DESC, but 
using the PK index for ORDER BY id ASC (see below)


Why isn't the first query using the index I created, and what can I do 
to persuade Derby to use it?


Rick Hillegas mentioned "covering indexes", but thanks to the large 
"details" column I was unable to create such a beast. Would a CLOB be 
any better?


TIA,
--
John English

-
Using ORDER BY id DESC:

Statement Text:
	SELECT id,DateTimeFormat(time,null) AS 
t_time,name,username,facility,event,details

FROM system_log
ORDER BY id DESC
NULLS LAST
FETCH FIRST 20 ROWS ONLY
Parse Time: 20
Bind Time: 1
Optimize Time: 1
Generate Time: 1
Compile Time: 23
Execute Time: 6305
Begin Compilation Timestamp : 2021-09-25 17:15:19.947
End Compilation Timestamp : 2021-09-25 17:15:19.97
Begin Execution Timestamp : 2021-09-25 17:15:19.971
End Execution Timestamp : 2021-09-25 17:15:26.295
Statement Execution Plan Text:
Scroll Insensitive ResultSet:
Number of opens = 1
Rows seen = 20
Number of reads from hash table = 20
Number of writes to hash table = 20
constructor time (milliseconds) = 0
open time (milliseconds) = 6302
next time (milliseconds) = 3
close time (milliseconds) = 0
optimizer estimated row count: 388223.00
optimizer estimated cost: 398112.27
Source result set:
Row Count (1):
Number of opens = 1
Rows seen = 20
Rows filtered = 0
constructor time (milliseconds) = 0
open time (milliseconds) = 6302
next time (milliseconds) = 3
close time (milliseconds) = 18
optimizer estimated row count: 388223.00
optimizer estimated cost: 398112.27
Source result set:
Sort ResultSet:
Number of opens = 1
Rows input = 388218
Rows returned = 20
Eliminate duplicates = false
In sorted order = false
Sort information:
Number of merge runs=11
Number of rows input=388218
Number of rows output=388218
			Size of merge runs=[32767, 32767, 32767, 32767, 32767, 32767, 32767, 
32767, 32767, 32767, 32767]

Sort type=external
constructor time (milliseconds) = 0
open time (milliseconds) = 6302
next time (milliseconds) = 3
close time (milliseconds) = 18
optimizer estimated row count: 388223.00
optimizer estimated cost: 398112.27
Source result set:
Project-Restrict ResultSet (3):
Number of opens = 1
Rows seen = 388218
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 3881
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 1434
optimizer estimated row count: 388223.00
optimizer estimated cost: 398112.27
Source result set:
Table Scan ResultSet for SYSTEM_LOG at read uncommitted isolation 
level using share row locking chosen by the optimizer