We have recently been testing our 3 tier Cayenne app with a larger number of records and finding some performance problems with scale. In particular we have a test database with 60,000 student records (the production db will have 120,000). We are using paging (of 25 records) to ensure that when a user brings up the list of students on the client we don't load them all across the network. However we are still seeing load times of around 40 seconds. At the moment we are using objects, not DataRows on the client.

One part of the problem appears to occur on the server. The SQL Cayenne generates looks like this:

SELECT * FROM Student t0 WHERE (t0.isDeleted IS NULL) OR (t0.isDeleted = 0) ORDER BY t0.firstName

although the * is actually a list of every field in the table. We timed this query against the db and even directly without Cayenne it takes 16 seconds to return all the data. I presume the next 24 seconds Cayenne spends parsing the results into DataRows or Objects. Instead, if we executed:

SELECT t0.id FROM Student t0 WHERE (t0.isDeleted IS NULL) OR (t0.isDeleted = 0) ORDER BY t0.firstName

Then it returns in about 2 seconds (when performing this query directly against the db and not through Cayenne).


Q1: So, my first question is why is Cayenne doing this? If paging is switched on, shouldn't Cayenne be fetching only the primary keys and then faulting in the particular records it needs in full?

We can see paging working with the query which Cayenne executes next:

SELECT * t0 WHERE ((t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?) OR (t0.id = ?)) AND ((t0.isDeleted IS NULL) OR (t0.isDeleted = ?)) [bind: 18569, 18521, 18253, 18060, 17932, 16451, 16395, 15967, 15819, 14773, 14492, 14317, 14280, 13088, 12464, 11183, 10957, 10378, 10219, 9686, 9554, 9435, 9155, 8611, 8038, 0]

25 records are paged into Cayenne as expected. But Cayenne has already fetched into memory (server-side) all the records in the original query.

Q2: Would it more more efficient from a db perspective to use the "t0.id IS IN (?, ?, ?...)" style syntax here instead or does it make no difference? I don't know enough about SQL optimisation to know either way, but at least the SQL is shorter (which might be good!).



Cheers
Ari Maniatis


-------------------------->
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001   fax +61 2 9550 4001
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A


Attachment: PGP.sig
Description: This is a digitally signed message part

Reply via email to