Just wondering,

If you change the table format to MyISAM do you get better performance?

I tend to use MyISAM for scientific loads where transactions and rollback are 
not required.

    Vassilis


On 10/29/23 17:03, Ivan Krylov via discuss wrote:
Dear MariaDB users,

(I've asked the same question at
<https://dba.stackexchange.com/q/332557>, so feel free to ignore this
one if you've already seen the other.)

Our scientific application needs to store and query fundamental
parameters for a number of molecules. There are 2 to 28 million rows
per molecule, but the number of molecules is expected to stay small
(currently 4). Here's our schema:

CREATE TABLE `mol_trans` (
   `species_id` int(11) DEFAULT NULL,
   `wl_vac` double DEFAULT NULL,
   `upper_id` int(11) DEFAULT NULL,
   `lower_id` int(11) DEFAULT NULL,
   `prob` double DEFAULT NULL,
   `flag` tinyint(4) DEFAULT NULL,
   KEY `spid_flag_wl` (`species_id`,`flag`,`wl_vac`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci
  PARTITION BY LIST (`species_id`)
(PARTITION `CaO` VALUES IN (6115) ENGINE = InnoDB,
  PARTITION `CN3` VALUES IN (6121) ENGINE = InnoDB,
  PARTITION `CN2` VALUES IN (6119) ENGINE = InnoDB,
  PARTITION `AlO` VALUES IN (6109) ENGINE = InnoDB)

(The partitions are here to make it easier to drop a whole molecule if
needed, which would otherwise be a painful large DELETE.)

I'm using the admittedly old MariaDB 10.3.39 from Debian Buster for
tests (connecting via the UNIX domain socket using the command line
client), but we've been seeing the same problem on MySQL 5.6 and
MariaDB 10.11 on Windows 10.

The following query currently takes 1m 7s on my computer, as measured
by running `time echo "$QUERY" | mysql "$DATABASE" >/dev/null`:

select
   mtr.prob,
   mtr.lower_id,
   mtr.upper_id
from
   mol_trans mtr
where (
   mtr.species_id=6115
   and mtr.wl_vac > 766.0
   and mtr.wl_vac < 883.0
   and mtr.flag = 1
)
order by mtr.wl_vac;

It needs to return 3024559 rows. The query seems to be using the right
index, judging by the EXPLAIN output:

            id: 1
   select_type: SIMPLE
         table: mtr
          type: ref
possible_keys: spid_flag_wl
           key: spid_flag_wl
       key_len: 7
           ref: const,const
          rows: 14158123
         Extra: Using where

With SET PROFILING=ON and the query cache disabled, the single
operation with a duration of >1 second in the 23-row profile is
"Sending data":

              Status: Sending data
            Duration: 66.712277
            CPU_user: 66.436978
          CPU_system: 0.279932
   Context_voluntary: 2156
Context_involuntary: 6964
       Block_ops_out: 0
       Messages_sent: 0
   Messages_received: 0
   Page_faults_major: 0
   Page_faults_minor: 0
               Swaps: 0
     Source_function: <unknown>
         Source_file: sql_parse.cc
         Source_line: 6086

Is there a way to spend less time sending data? I know the query can in
theory run faster because it takes less than 10 seconds on SQLite and
PostgreSQL, but if we switched our application from database engine to
database engine every time we started having performance problems, we
would be very far from done by now. I also know that MariaDB can be
very fast at transferring data because it takes me only a few minutes
more to download the whole database dump.


_______________________________________________
discuss mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to