On Mon, 2007-01-29 at 14:03 +0200, Nina Jeliazkova wrote:
> Hello,
> 
> Egon Willighagen <[EMAIL PROTECTED]> wrote:
> 
> > On Sunday 28 January 2007 21:58, Rajarshi Guha wrote:
> > > Invariably I will know the names of my SD tags, so I can always create a
> > > temporary HashMap and extract the relevant properties via getProperty()
> > > and populate the HashMap and then send it setSdFields().
> > 
> > Well, I guess my hesitation comes from the fact that I have to know about SD
> > internals when wanting to read or store simple molecular properties... I 
> > understand the step is not large, the math rather simple, but still... CDK 
> > has set/getProperty() to store properties... why add another layer?
> > 
> 
> I agree with Egon; adding another layer doesn't seem an elegant approach. A few
> more thoughts:

To a large extent I agree.

My objections are from an 'ease of use' point of view (which is
debatable :)

> 1) A user calculates some descriptors and saves them as SDF file properties.
> Later this SDF file is read and the properties are loaded. How should they
> be loaded: as special_SDF_properties or as regular properties?
> 
> 2) Imagine another file format appears that is able to store properties -
> should we introduce another my_file_special_Properties? 
> That's not hypothetical - I have already implemented classes extending
> DefaultChemObjectWriter that can store / read compounds (as SMILES) along with
> properties in comma-delimited, tab-delimited and MS Excel XLS files. That was
> done by user request - is the CDK team interested in these classes?

I would be interested in seeing an XLS output writer - will it be able
to run on Linux? Or does it need Windows DLLs?

> 3) My problem with SDF fields is that they come with many different names -
> for example, a CAS number can be "CAS RN", "CAS", "CAS#", "CAS Registry Number",
> etc. The same goes for chemical names, SMILES and descriptor names. Once
> properties are loaded, there is no easy way to recognise which field means
> what without asking the user. When working with multiple SDF files coming from
> different sources and trying to combine information, it becomes a nightmare.

I agree - but this is going to happen even if we place all the SD tag
values as molecule properties directly (i.e., the current situation).

> Some solutions to be discussed: 
> - introduce something like "translator facility" to perform translation 
> property_name_in_file -> property_name_to_use_in_this_software
> 
> - separate properties into different Hashmaps  - like identifiers (cas,
> names), descriptors, measured properties, etc.

I don't think this is a great idea, since it leads to much more
specialization than simply saying that these sets of properties are
derived from an SD file. I think this is especially important since an
SDF can contain arbitrary properties - I don't see any way of making
sense of those properties other than knowing what the tags mean, and I
think this drawback is a feature of the SD format (lack of
dictionaries).
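
That said, the "translator facility" suggested in the quoted text could be
quite small. Below is a rough sketch only, assuming the caller supplies the
synonym list (the SD format gives us no dictionary). The class
SdTagTranslator and its methods are hypothetical names of mine, not
existing CDK classes, and the tag values are just illustrative.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of CDK: maps file-specific SD tag names
// onto one canonical property name chosen by the application.
class SdTagTranslator {
    private final Map<String, String> synonyms = new HashMap<>();

    // Register a tag name as it appears in a file and the canonical name to use instead.
    void register(String nameInFile, String canonicalName) {
        synonyms.put(normalize(nameInFile), canonicalName);
    }

    // Registered tags are renamed; unknown tags are passed through unchanged.
    String translate(String nameInFile) {
        return synonyms.getOrDefault(normalize(nameInFile), nameInFile);
    }

    // Rename the keys of a tag -> value map, keeping the original order.
    // If two aliases of the same name occur in one record, the later value wins.
    Map<String, String> translateAll(Map<String, String> sdTags) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : sdTags.entrySet()) {
            out.put(translate(e.getKey()), e.getValue());
        }
        return out;
    }

    // Lookup ignores case and surrounding whitespace.
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        SdTagTranslator t = new SdTagTranslator();
        for (String alias : new String[] {"CAS RN", "CAS", "CAS#", "CAS Registry Number"}) {
            t.register(alias, "CAS");
        }
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("CAS#", "50-00-0");        // illustrative value
        tags.put("MyDescriptor", "1.0");    // unregistered tag, kept as-is
        System.out.println(t.translateAll(tags));
    }
}
```

Note that this only helps for tags someone has already registered, so it
does not remove the need to know what the tags mean.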

My original point in this was to suggest a way to make clear that a set
of properties did come from SD tags. I realize that this is a lazy
approach and I understand that this leads to a specialization of the
properties - but it seems to me that SD properties are special (as
opposed to properties that I might set when manipulating a molecule in
my code)

In this sense, then, one can view the set of tags in an SD file as a
'property' of that molecule - justifying inclusion of all the SD tags as
a single property in the molecule. The fact that this 'property' has
several properties within it is a feature of the SD tags property.
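
To make that concrete, here is a minimal sketch of what I have in mind. It
uses a plain Map as a stand-in for the molecule's property table so that it
runs without CDK. The key name "sd.fields" and the class
SdTagsAsOneProperty are assumptions for illustration, and setSdFields() /
getSdFields() here are only sketches of the proposed idea, not existing CDK
methods.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for a molecule's property table; plain maps only, no CDK classes.
class SdTagsAsOneProperty {
    // Hypothetical key under which all SD tags are grouped.
    static final String SD_FIELDS = "sd.fields";

    // Store a copy of the SD tags as one nested property.
    static void setSdFields(Map<Object, Object> properties, Map<String, String> sdTags) {
        properties.put(SD_FIELDS, new LinkedHashMap<>(sdTags));
    }

    // Return the SD tags, or an empty map if none were stored.
    @SuppressWarnings("unchecked")
    static Map<String, String> getSdFields(Map<Object, Object> properties) {
        Object value = properties.get(SD_FIELDS);
        return value == null
                ? Collections.<String, String>emptyMap()
                : (Map<String, String>) value;
    }

    public static void main(String[] args) {
        Map<Object, Object> properties = new HashMap<>();
        properties.put("set.in.my.code", Boolean.TRUE); // a property set while manipulating the molecule

        Map<String, String> sdTags = new LinkedHashMap<>();
        sdTags.put("CAS", "50-00-0"); // illustrative value
        setSdFields(properties, sdTags);

        // The SD tags stay together and separate from properties set in code.
        System.out.println(getSdFields(properties).keySet());
        System.out.println(properties.keySet());
    }
}
```

The trade-off is the one discussed above: the caller has to know that the
SD tags live under a separate key rather than among the regular properties.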

-------------------------------------------------------------------
Rajarshi Guha <[EMAIL PROTECTED]>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
A hacker does for love what others would not do for money.



_______________________________________________
Cdk-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/cdk-user
