Hi Thomas, It's possible to use TEMPORARY TABLE for this purpose in a single transaction. This is the scheme we use in order to convert the input application SMILES into a canonicalized RDKit SMILES. We keep the RDKit canonical SMILES around in the table for exact isomer look ups, but this lets us throw away the input application SMILES. This scheme also lets us bulk insert many rows at a time in order to gain decent insertion performance.
BEGIN; CREATE TEMPORARY TABLE temp_input_smiles ON COMMIT DROP AS TABLE input_smiles WITH NO DATA; CREATE TRIGGER canonicalize_new_smiles_trigger BEFORE INSERT OR UPDATE ON temp_input_smiles FOR EACH ROW EXECUTE PROCEDURE canonicalize_new_smiles(); CREATE TRIGGER propagate_new_smiles_tags_trigger AFTER INSERT OR UPDATE ON temp_input_smiles FOR EACH ROW EXECUTE PROCEDURE propagate_new_smiles(); ALTER TABLE temp_input_smiles ENABLE ALWAYS TRIGGER canonicalize_new_smiles_trigger; ALTER TABLE temp_input_smiles ENABLE ALWAYS TRIGGER propagate_new_smiles_tags_trigger; -- INSERT many SMILES data into temp_input_smiles -- COMMIT; CREATE OR REPLACE FUNCTION canonicalize_new_smiles() RETURNS TRIGGER AS $$ BEGIN IF (NEW.rdkit_smiles IS NULL) THEN NEW.rdkit_smiles := mol_to_smiles(NEW.smiles::mol); END IF; RETURN NEW; EXCEPTION WHEN SQLSTATE '22000' THEN -- 22000 is 'data exception' NEW.rdkit_smiles := NULL; RETURN NEW; END; $$ language 'plpgsql'; CREATE OR REPLACE FUNCTION propagate_new_smiles() RETURNS TRIGGER AS $$ DECLARE tag_val jsonb; DECLARE offsets jsonb; BEGIN IF (NEW.rdkit_smiles IS NOT NULL) THEN -- INSERT NEW.rdkit_smiles::mol into production table -- END IF; RETURN NEW; END; $$ language 'plpgsql'; -Brian On Mon, Oct 26, 2020 at 1:45 AM Thomas Strunz <beginn...@hotmail.de> wrote: > Dear community, > > I was wondering on how to best insert molecules into a mol field. The > documentation only show how to insert from a preexisting table with a > smiles column and then use "mol_to_smiles". > > How can a molecule be inserted directly? eg what format need to be > submitted? > > Second point is how to make the insertion simple so that not all > applications connecting to the DB need to be chemically aware (have rdkit > available). I have played around with simply having a smiles column and mol > field column and then use a trigger function to convert the smiles to a > mol. But this duplicates all the data (and even more wasteful with ctab > file). Is it possible to not duplicate the data and be able to insert > smiles/ctab directly? > > Best Regards, > > Thomas > _______________________________________________ > Rdkit-discuss mailing list > Rdkit-discuss@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss >
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss