Just wondering,
If you change the table format to MyISAM do you get better performance?
I tend to use MyISAM for scientific loads where transactions and rollback are
not required.
Vassilis
On 10/29/23 17:03, Ivan Krylov via discuss wrote:
Dear MariaDB users,
(I've asked the same question at
<https://dba.stackexchange.com/q/332557>, so feel free to ignore this
one if you've already seen the other.)
Our scientific application needs to store and query fundamental
parameters for a number of molecules. There are 2 to 28 million rows
per molecule, but the number of molecules is expected to stay small
(currently 4). Here's our schema:
CREATE TABLE `mol_trans` (
`species_id` int(11) DEFAULT NULL,
`wl_vac` double DEFAULT NULL,
`upper_id` int(11) DEFAULT NULL,
`lower_id` int(11) DEFAULT NULL,
`prob` double DEFAULT NULL,
`flag` tinyint(4) DEFAULT NULL,
KEY `spid_flag_wl` (`species_id`,`flag`,`wl_vac`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci
PARTITION BY LIST (`species_id`)
(PARTITION `CaO` VALUES IN (6115) ENGINE = InnoDB,
PARTITION `CN3` VALUES IN (6121) ENGINE = InnoDB,
PARTITION `CN2` VALUES IN (6119) ENGINE = InnoDB,
PARTITION `AlO` VALUES IN (6109) ENGINE = InnoDB)
(The partitions are here to make it easier to drop a whole molecule if
needed, which would otherwise be a painful large DELETE.)
I'm using the admittedly old MariaDB 10.3.39 from Debian Buster for
tests (connecting via the UNIX domain socket using the command line
client), but we've been seeing the same problem on MySQL 5.6 and
MariaDB 10.11 on Windows 10.
The following query currently takes 1m 7s on my computer, as measured
by running `time echo "$QUERY" | mysql "$DATABASE" >/dev/null`:
select
mtr.prob,
mtr.lower_id,
mtr.upper_id
from
mol_trans mtr
where (
mtr.species_id=6115
and mtr.wl_vac > 766.0
and mtr.wl_vac < 883.0
and mtr.flag = 1
)
order by mtr.wl_vac;
It needs to return 3024559 rows. The query seems to be using the right
index, judging by the EXPLAIN output:
id: 1
select_type: SIMPLE
table: mtr
type: ref
possible_keys: spid_flag_wl
key: spid_flag_wl
key_len: 7
ref: const,const
rows: 14158123
Extra: Using where
With SET PROFILING=ON and the query cache disabled, the single
operation with a duration of >1 second in the 23-row profile is
"Sending data":
Status: Sending data
Duration: 66.712277
CPU_user: 66.436978
CPU_system: 0.279932
Context_voluntary: 2156
Context_involuntary: 6964
Block_ops_out: 0
Messages_sent: 0
Messages_received: 0
Page_faults_major: 0
Page_faults_minor: 0
Swaps: 0
Source_function: <unknown>
Source_file: sql_parse.cc
Source_line: 6086
Is there a way to spend less time sending data? I know the query can in
theory run faster because it takes less than 10 seconds on SQLite and
PostgreSQL, but if we switched our application from database engine to
database engine every time we started having performance problems, we
would be very far from done by now. I also know that MariaDB can be
very fast at transferring data because it takes me only a few minutes
more to download the whole database dump.
_______________________________________________
discuss mailing list -- [email protected]
To unsubscribe send an email to [email protected]