Or you can use AllChem.CalcMolFormula() to get the chemical formula.
Ling

On Wed, Oct 7, 2015 at 3:55 AM, Andrew Dalke <da...@dalkescientific.com>
wrote:

> On Oct 7, 2015, at 11:30 AM, Christos Kannas wrote:
> > Yes there is an easier way, by using substructure search, i.e. do a
> substructure search for [C] and then get the number of matches.
>
> > m = Chem.MolFromSmiles("CCCCCCCCc1ccccc1")
>
> > patt= Chem.MolFromSmarts("[C]")
>
> > pm = m.GetSubstructMatches(patt)
> > print len(pm)
> > 8
>
> You'll notice that this returned 8 when Joos Kiener wanted the count of
> all the C-Atoms. I assume that means all 14 carbon atoms, and not only the
> 8 aliphatic carbon atoms, so the SMARTS should be tweaked somewhat:
>
> >>> from rdkit import Chem
> >>> mol = Chem.MolFromSmiles("CCCCCCCCc1ccccc1")
> >>> pat = Chem.MolFromSmarts("[#6]")
> >>> len(mol.GetSubstructMatches(pat))
> 14
>
> On the topic of counting carbons given a molecule, there are two general
> approaches - the SMARTS pattern, and atom iteration, though it's better to
> count the number of atomic number matches rather than use the symbol:
>
> from rdkit import Chem
> pat = Chem.MolFromSmarts("[#6]")
>
> def count1(mol):
>     return len(mol.GetSubstructMatches(pat))
>
> def count2(mol):
>     return sum(1 for atom in mol.GetAtoms() if atom.GetAtomicNum() == 6)
>
>
> It's possible to time these two functions, to see which has the better
> performance:
>
> import time
>
> def go(count, mol, n=1000):
>     _range = range(n)
>     t1 = time.time()
>     m = mol
>     for x in _range:
>         c = count(mol)
>     t2 = time.time()
>     return c, t2-t1
>
> With that timing scaffold in place:
>
>
> >>> print go(count1, mol)
> (8, 0.020318031311035156)
> >>> print go(count2, mol)
> (8, 0.07634186744689941)
>
> This makes it seem like the SMARTS match solution is the best. However,
> there are two lurking details.
>
> First, when the structure has more than ~256 carbons, the iterative
> solution is faster:
>
> >>> for i in range(10):
> ...     mol = Chem.MolFromSmiles("C" * (2**i))
> ...     c1, t1 = go(count1, mol)
> ...     c2, t2 = go(count2, mol)
> ...     assert c1 == c2
> ...     print 2**i, t1, t2
> ...
> 1 0.00930690765381 0.0515530109406
> 2 0.00659918785095 0.0535910129547
> 4 0.00970101356506 0.0570240020752
> 8 0.0159358978271 0.0636560916901
> 16 0.0273370742798 0.0776889324188
> 32 0.0527520179749 0.103546857834
> 64 0.107499837875 0.159503221512
> 128 0.218550920486 0.266411066055
> 256 0.513395786285 0.4887611866
> 512 1.53533577919 0.901851892471
>
> Second, by default there is a limit to the number of SMARTS matches:
>
> >>> mol = Chem.MolFromSmiles("C" * 999)
> >>> count1(mol)
> 999
> >>> mol = Chem.MolFromSmiles("C" * 1000)
> >>> count1(mol)
> 1000
> >>> mol = Chem.MolFromSmiles("C" * 1001)
> >>> count1(mol)
> 1000
>
> Quoting from the documentation:
>
>   In high-symmetry cases with medium-sized molecules, it is very
>   easy to end up with a combinatorial explosion in the number of
>   possible matches. This argument prevents that from having
>   unintended consequences
>
> Instead, this is a better general-purpose SMART matcher count function:
>
> def count3(mol):
>     return len(mol.GetSubstructMatches(pat, maxMatches=mol.GetNumAtoms()))
>
>
> >>> count3(mol)
> 1001
>
>
>                                 Andrew
>                                 da...@dalkescientific.com
>
>
>
>
> ------------------------------------------------------------------------------
> Full-scale, agent-less Infrastructure Monitoring from a single dashboard
> Integrate with 40+ ManageEngine ITSM Solutions for complete visibility
> Physical-Virtual-Cloud Infrastructure monitoring from one console
> Real user monitoring with APM Insights and performance trend reports
> Learn More
> http://pubads.g.doubleclick.net/gampad/clk?id=247754911&iu=/4140
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
------------------------------------------------------------------------------
Full-scale, agent-less Infrastructure Monitoring from a single dashboard
Integrate with 40+ ManageEngine ITSM Solutions for complete visibility
Physical-Virtual-Cloud Infrastructure monitoring from one console
Real user monitoring with APM Insights and performance trend reports 
Learn More http://pubads.g.doubleclick.net/gampad/clk?id=247754911&iu=/4140
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to