Hi Thomas,

It's possible to use TEMPORARY TABLE for this purpose in a single
transaction. This is the scheme we use in order to convert the input
application SMILES into a canonicalized RDKit SMILES. We keep the RDKit
canonical SMILES around in the table for exact isomer look ups, but this
lets us throw away the input application SMILES. This scheme also lets us
bulk insert many rows at a time in order to gain decent insertion
performance.

BEGIN;
CREATE TEMPORARY TABLE temp_input_smiles ON COMMIT DROP AS TABLE
input_smiles WITH NO DATA;
CREATE TRIGGER canonicalize_new_smiles_trigger BEFORE INSERT OR UPDATE ON
temp_input_smiles FOR EACH ROW EXECUTE PROCEDURE canonicalize_new_smiles();
CREATE TRIGGER propagate_new_smiles_tags_trigger AFTER INSERT OR UPDATE ON
temp_input_smiles FOR EACH ROW EXECUTE PROCEDURE propagate_new_smiles();
ALTER TABLE temp_input_smiles ENABLE ALWAYS TRIGGER
canonicalize_new_smiles_trigger;
ALTER TABLE temp_input_smiles ENABLE ALWAYS TRIGGER
propagate_new_smiles_tags_trigger;
-- INSERT many SMILES data into temp_input_smiles --
COMMIT;

CREATE OR REPLACE FUNCTION canonicalize_new_smiles()
RETURNS TRIGGER AS $$
BEGIN
  IF (NEW.rdkit_smiles IS NULL) THEN
    NEW.rdkit_smiles := mol_to_smiles(NEW.smiles::mol);
  END IF;
  RETURN NEW;
EXCEPTION WHEN SQLSTATE '22000' THEN -- 22000 is 'data exception'


  NEW.rdkit_smiles := NULL;
  RETURN NEW;
END;
$$ language 'plpgsql';

CREATE OR REPLACE FUNCTION propagate_new_smiles()
RETURNS TRIGGER AS $$
DECLARE tag_val jsonb;
DECLARE offsets jsonb;
BEGIN
  IF (NEW.rdkit_smiles IS NOT NULL) THEN
    -- INSERT NEW.rdkit_smiles::mol into production table --
  END IF;
  RETURN NEW;
END;
$$ language 'plpgsql';

-Brian

On Mon, Oct 26, 2020 at 1:45 AM Thomas Strunz <beginn...@hotmail.de> wrote:

> Dear community,
>
> I was wondering on how to best insert molecules into a mol field. The
> documentation only show how to insert from a preexisting table with a
> smiles column and then use "mol_to_smiles".
>
> How can a molecule be inserted directly? eg what format need to be
> submitted?
>
> Second point is how to make the insertion simple so that not all
> applications connecting to the DB need to be chemically aware (have rdkit
> available). I have played around with simply having a smiles column and mol
> field column and then use a trigger function to convert the smiles to a
> mol. But this duplicates all the data (and even more wasteful with ctab
> file). Is it possible to not duplicate the data and be able to insert
> smiles/ctab directly?
>
> Best Regards,
>
> Thomas
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to