Re: [ccp4bb] [EXTERNAL] [ccp4bb] renaming chains

2021-11-01 Thread Edward A. Berry

I have noticed chains being renamed in the sequence IDs but not in the coordinates:

2FBW_3|Chains C, G[auth P]|Succinate dehydrogenase cytoch

MATTAKEEMARFWEKNTKSSRPLSPHISIYKWSLPMAMSITHRGTGVALSLGVSLFSL

2FBW_4|Chains D, H[auth Q]|Succinate dehydrogenase [ubiqu

GSSKAASLHWTSERAVSALLLGLLPAAYLYPGPAVDYSLAAALTLHGHWGLGQVITDY

2FBW_1|Chains A, E[auth N]|Succinate dehydrogenase [ubiqu

STKVSDSISTQYPVVDHEFDAVVVGAGGAGLRAAFGLSEAGFNTACVTKLFPTRSHTV

2FBW_2|Chains B, F[auth O]|Succinate dehydrogenase [ubiqu

AQTTSRIKKFSIYRWDPDKPGDKPRMQTYEVDLNKCGPMVLDALIKIKNELDST

Chains N, O, P and Q (author IDs) are labeled E, F, G and H in the "view FASTA" output.
The coordinates still have the oddball lettering, which could be confusing.

Ed
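
For anyone who wants to tabulate the two naming schemes, here is a minimal
sketch that pulls the label/author pairs out of header lines like the ones
above. The function name parse_chains is made up for illustration, and the
rule that an entry without an "[auth X]" suffix has identical label and
author IDs is an assumption based on how these four headers read.

```python
import re

def parse_chains(header):
    """Return (label_id, auth_id) pairs from a '|'-separated FASTA header."""
    # The chain list is the second field, e.g. "Chains C, G[auth P]".
    field = header.split("|")[1].strip()
    field = re.sub(r"^Chains?\s+", "", field)
    pairs = []
    for item in field.split(","):
        m = re.fullmatch(r"\s*(\w+)(?:\[auth (\w+)\])?\s*", item)
        if m is None:
            raise ValueError(f"unrecognised chain entry: {item!r}")
        label, auth = m.group(1), m.group(2)
        # Assumption: no [auth X] suffix means label and author IDs coincide.
        pairs.append((label, auth or label))
    return pairs

# Header lines as quoted in the message (third field truncated there too).
headers = [
    "2FBW_1|Chains A, E[auth N]|Succinate dehydrogenase [ubiqu",
    "2FBW_2|Chains B, F[auth O]|Succinate dehydrogenase [ubiqu",
    "2FBW_3|Chains C, G[auth P]|Succinate dehydrogenase cytoch",
    "2FBW_4|Chains D, H[auth Q]|Succinate dehydrogenase [ubiqu",
]

# Map the author chain ID found in the coordinates to the label chain ID.
auth_to_label = {}
for h in headers:
    for label, auth in parse_chains(h):
        auth_to_label[auth] = label

for auth in sorted(auth_to_label):
    print(f"coordinates chain {auth} -> FASTA chain {auth_to_label[auth]}")
```

With such a table in hand, the FASTA lettering can be looked up from whatever
chain ID appears in the coordinates.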



To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/


Re: [ccp4bb] what would be the best metric to assess the quality of a mtz file?

2021-11-01 Thread James Holton

Hi David,

Why not do all those things with Rwork? It is much less noisy than 
Rfree. Have you ever seen a case in such analysis where Rwork didn't 
tell you the same thing Rfree did?  If so, did you believe the difference?


Once, when I was playing with lossy image compression, I found that if I 
picked just the right compression ratio I could get a slightly better 
Rfree. But that is not something I'd recommend as a good idea.


-James Holton
MAD Scientist

On 11/1/2021 2:22 AM, David Waterman wrote:

Hi James,

What you wrote makes lots of sense. I had not heard about Rsleep, so 
that looks like interesting reading, thanks.


I have often used Rfree as a simple tool to compare two protocols. If 
I am not actually optimising against Rfree but just using it for a 
one-off comparison then that is okay, right?


Let's say I have two data processing protocols, A and B. Between these 
I might be exploring some difference in options within one data 
processing program, perhaps different geometry refinement parameters, 
or scaling options. I expect the A and B data sets to be quite 
similar, but I would like to evaluate which protocol was "better", and 
I want to do this quickly, ideally looking at a single number. I don't 
like I/sigI because I don't trust the sigmas, CC1/2 is often noisy, 
and I'm totally sworn off merging R statistics for these purposes. I 
tend to use Rfree as an easily-available metric, independent from the 
data processing program and the merging stats. It also allows a 
comparison of A and B in terms of the "product" of crystallography, 
namely the refined structure. In this I am lucky because I'm not 
trying to solve a structure. I may be looking at lysozyme or 
proteinase K: something where I can download a pretty good 
approximation to the truth from the PDB.


So, what I do is process the data by A and process by B, ensure the 
data sets have the same free set, then refine to convergence (or at 
least, a lot of cycles) starting from a PDB structure. I then evaluate 
A vs B in terms of Rfree, though without an error bar on Rfree I don't 
read too much into small differences.
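
One rough way to put an error bar on Rfree is to bootstrap over the free
reflections. The sketch below uses the conventional definition
R = sum(||Fobs| - |Fcalc||) / sum(|Fobs|) and resamples the free set with
replacement. The helper names, the synthetic amplitudes and the 20% noise
level are all assumptions for illustration. Note that this only captures the
sampling spread from the finite size of the free set, not the variation
between repeated refinements.

```python
import random
import statistics

def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    num = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    den = sum(abs(o) for o in f_obs)
    return num / den

def bootstrap_r(f_obs, f_calc, n_boot=200, seed=0):
    """Mean and standard deviation of R over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(f_obs)
    samples = []
    for _ in range(n_boot):
        # Draw n reflections with replacement from the free set.
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(r_factor([f_obs[i] for i in idx],
                                [f_calc[i] for i in idx]))
    return statistics.mean(samples), statistics.stdev(samples)

def make_synthetic(n, noise=0.2, seed=0):
    """Made-up amplitudes: Fobs is Fcalc with Gaussian relative error."""
    rng = random.Random(seed)
    f_calc = [rng.uniform(10.0, 100.0) for _ in range(n)]
    f_obs = [abs(c * (1.0 + rng.gauss(0.0, noise))) for c in f_calc]
    return f_obs, f_calc

# Compare the spread for a small and a large free set.
for n_free in (200, 2000):
    obs, calc = make_synthetic(n_free, seed=n_free)
    mean_r, sd_r = bootstrap_r(obs, calc)
    print(f"N_free={n_free}: R = {mean_r:.3f} +/- {sd_r:.3f}")
```

If the Rfree difference between A and B is smaller than this spread, it is
probably not worth reading much into.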


Does this procedure seem sound? Perhaps it could be improved by 
randomly jiggling the atoms in the starting structure, in case the PDB 
deposition had already followed an A- or B-like protocol. Perhaps the 
whole approach is suspect. Certainly I wouldn't want to generalise by 
saying that A or B is better in all cases, but I do want to find a way 
to assess the various tweaks I can try in data processing for a single 
case.


Any thoughts? I appreciate the wisdom of the BB here.

Cheers

-- David


On Fri, 29 Oct 2021 at 15:50, James Holton wrote:


Well, of all the possible metrics you could use to assess data
quality Rfree is probably the worst one.  This is because it is a
cross-validation metric, and cross-validations don't work if you
use them as an optimization target. You can try, and might even
make a little headway, but then your free set is burnt. If you
have a third set of observations, as suggested for Rsleep
(doi:10.1107/S0907444907033458), then you have a chance at another
round of cross-validation. Crystallographers don't usually do
this, but it has become standard practice in machine learning
(training=Rwork, validation=Rfree and testing=Rsleep).

So, unless you have an Rsleep set, any time you contemplate doing
a bunch of random things and picking the best Rfree ... don't. 
Just don't.  There madness lies.
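
As a concrete picture of the three-way split, the sketch below flags each
reflection index as "work", "free" or "sleep" once, with a fixed seed so the
assignment can be reproduced. The function name and the 5% fractions are
assumptions for illustration, not values taken from the Rsleep paper, and
the purely random choice ignores complications such as NCS-related
reflections.

```python
import random

def assign_sets(n_reflections, free_fraction=0.05, sleep_fraction=0.05, seed=0):
    """Return a list of 'work' / 'free' / 'sleep' flags, one per reflection."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = int(round(n_reflections * free_fraction))
    n_sleep = int(round(n_reflections * sleep_fraction))
    flags = ["work"] * n_reflections
    # The first n_free shuffled indices form the validation (Rfree) set.
    for i in indices[:n_free]:
        flags[i] = "free"
    # The next n_sleep form the test (Rsleep) set, left untouched until the end.
    for i in indices[n_free:n_free + n_sleep]:
        flags[i] = "sleep"
    return flags

flags = assign_sets(1000)
for name in ("work", "free", "sleep"):
    print(name, flags.count(name))
```

The point of fixing the flags once is that the sleep set is never consulted
while choosing between protocols, so it remains usable for a final check.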

What happens after doing this is you will be initially happy about
your lower Rfree, but everything you do after that will make it go
up more than it would have had you not performed your Rfree
optimization. This is because the changes in the data that made
Rfree randomly better were actually noise, and as the structure
becomes more correct it will move away from that noise. It's
always better to optimize on something else, and then check your
Rfree as infrequently as possible. Remember it is the control for
your experiment. Never mix your positive control with your sample.

As for the best metric to assess data quality?  Well, what are you
doing with the data? There are always compromises in data
processing and reduction that favor one application over another. 
If this is an "I just want the structure" project, then score on
the resolution where CC1/2 hits your favorite value. For some that
is 0.5, others 0.3. I tend to use 0.0 so I can cut it later
without re-processing. Whatever you do just make it consistent.
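
Scoring on "the resolution where CC1/2 hits a value" can be sketched as
below, given per-shell CC1/2 values such as a merging program reports. The
shell table is invented for illustration and the function name is
hypothetical. The rule used here is to walk from low to high resolution and
stop at the first shell that falls below the threshold.

```python
def resolution_cutoff(shells, threshold):
    """Return d_min (in Angstrom) of the last shell with CC1/2 >= threshold.

    shells: list of (d_min, cc_half) tuples ordered from low to high resolution.
    Returns None if even the first shell is below the threshold.
    """
    cutoff = None
    for d_min, cc_half in shells:
        if cc_half < threshold:
            break
        cutoff = d_min
    return cutoff

# Invented per-shell statistics, for illustration only.
shells = [
    (4.0, 0.99), (3.0, 0.97), (2.5, 0.90), (2.2, 0.72),
    (2.0, 0.45), (1.9, 0.28), (1.8, 0.05),
]

for threshold in (0.5, 0.3, 0.0):
    print(f"CC1/2 >= {threshold}: cut at {resolution_cutoff(shells, threshold)} A")
```

Whichever threshold is chosen, applying the same one to both protocols is
what makes the comparison consistent.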

If it's for anomalous, score on CCanom or if that's too noisy the
Imean/sigma in the lowest-angle resolution or highest-intensity
bin. This is because for anomalous you want to minimize relative
error. The end-all-be-all of anomalous signal strength is the
phased anomalous difference Fourier. You need phases to do one,
but if you have a structure just omit an 

Re: [ccp4bb] renaming chains

2021-11-01 Thread Oganesyan, Vaheh
Dear John Helliwell,

I should have been more clear in addressing my email. It was in response to 
John Berrisford’s answer.

Regards,

Vaheh

From: John R Helliwell 
Sent: Monday, November 1, 2021 3:13 PM
To: Oganesyan, Vaheh 
Cc: CCP4BB@jiscmail.ac.uk
Subject: Re: [ccp4bb] renaming chains

Dear Vaheh
Apologies, I was replying solely to Mohamed's observation/query about the 
numbering/labelling of waters, i.e. as a function of an in situ parameter (time, 
temperature and pressure).
Best wishes
John
Emeritus Professor John R Helliwell DSc


On 1 Nov 2021, at 18:39, Oganesyan, Vaheh 
<vaheh.oganes...@astrazeneca.com> wrote:

Hi John,

Thank you for the explanation.
The most recent encounter I had with this issue was with 3DMM and 3B9K. You would 
know better what caused the chain renaming here, but it doesn't look like either of 
the two scenarios you describe. Whatever the reason was, I'm glad to hear that 
this is not a widespread rule that applies to all structures.

Regards,
Vaheh Oganesyan, Ph.D.
R | Biologics Engineering
One Medimmune Way, Gaithersburg, MD 20878
T:  301-398-5851
vaheh.oganes...@astrazeneca.com



From: John Berrisford <j...@ebi.ac.uk>
Sent: Monday, November 1, 2021 12:53 PM
To: Oganesyan, Vaheh <vaheh.oganes...@astrazeneca.com>
Cc: CCP4BB@jiscmail.ac.uk
Subject: Re: [ccp4bb] renaming chains

Dear Vaheh

Usually we do not rename chains as part of the curation procedure.
There are instances when we do, for example when a chain has to be split
into two chains and a new chain has to be defined, but this isn't
typical.

Because of this, the wwPDB mmCIF file for each entry will usually contain
the chains as defined by the depositor.
If two-letter chain IDs were used by the depositor, then this is
incompatible with the PDB format and a best-effort PDB file is created.
This will contain remapped chain IDs with a single letter for the
chains.
If you are using this best-effort PDB file, then I would encourage using
the mmCIF file instead.
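
To illustrate why a remapping is unavoidable once chain IDs no longer fit the
single-character chain column of the PDB format, here is a hypothetical
sketch. It is not the wwPDB procedure, whose details are not given in this
thread. The rule assumed here is that one-character IDs keep their name and
longer IDs receive the next unused character from A-Z, a-z, 0-9.

```python
import string

def remap_chain_ids(chain_ids):
    """Map arbitrary chain IDs onto single characters (illustrative rule only)."""
    alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits
    mapping = {}
    used = set()
    # First pass: IDs that already fit the one-character column keep their name.
    for cid in chain_ids:
        if len(cid) == 1 and cid not in used:
            mapping[cid] = cid
            used.add(cid)
    # Second pass: longer IDs take the next character not yet in use.
    free = (c for c in alphabet if c not in used)
    for cid in chain_ids:
        if cid in mapping:
            continue
        try:
            new_id = next(free)
        except StopIteration:
            raise ValueError("more chains than single-character IDs") from None
        mapping[cid] = new_id
        used.add(new_id)
    return mapping

print(remap_chain_ids(["A", "B", "AA", "AB"]))
```

Any rule of this kind changes some chain names relative to the deposition,
which is the reason for preferring the mmCIF file.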

If this is not the case, please can you share examples through our
helpdesk (deposit-h...@mail.wwpdb.org) where this has occurred so we can
investigate.

Many thanks

John

On 2021-11-01 15:00, Oganesyan, Vaheh wrote:
> Hi All,
>
> This question is mostly for RCSB and PDBe: why are you renaming chains
> in the deposited PDB files? Why does it matter what letter is assigned
> to the chain? For 1,2 or 3 chain structures it is manageable, but for
> more chains and/or many complexes per asu this becomes quite a
> challenge. And if chains are similar in shape it is a real pain.
> Reading associated manuscripts and looking at those structures is an
> additive that feels unnecessary.
>
> Thank you in advance for explaining the logic behind.
>
> Vaheh Oganesyan, Ph.D.
>
> R | Biologics Engineering
>
> One Medimmune Way, Gaithersburg, MD 20878
>
> T: 301-398-5851
>
> vaheh.oganes...@astrazeneca.com
>

--
John Berrisford
PDBe
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD UK
Tel: +44 1223 492529





Re: [ccp4bb] renaming chains

2021-11-01 Thread John R Helliwell
Dear Vaheh
Apologies, I was replying solely to Mohamed's observation/query about the 
numbering/labelling of waters, i.e. as a function of an in situ parameter (time, 
temperature and pressure).
Best wishes 
John 

Emeritus Professor John R Helliwell DSc

> On 1 Nov 2021, at 18:39, Oganesyan, Vaheh  
> wrote:
> 
> 
> Hi John,
>  
> Thank you for the explanation.
> The most recent encounter I had with this issue was with 3DMM and 3B9K. You 
> would know better what caused the chain renaming here, but it doesn't look like 
> either of the two scenarios you describe. Whatever the reason was, I'm glad to 
> hear that this is not a widespread rule that applies to all structures.
>  
> Regards,
> Vaheh Oganesyan, Ph.D.
> 
> R | Biologics Engineering
> One Medimmune Way, Gaithersburg, MD 20878
> T:  301-398-5851
> vaheh.oganes...@astrazeneca.com
>  
>  
>  
> From: John Berrisford  
> Sent: Monday, November 1, 2021 12:53 PM
> To: Oganesyan, Vaheh 
> Cc: CCP4BB@jiscmail.ac.uk
> Subject: Re: [ccp4bb] renaming chains
>  
> Dear Vaheh
> 
> Usually we do not rename chains as part of the curation procedure.
> There are instances when we do, for example when a chain has to be split 
> into two chains and a new chain has to be defined, but this isn't 
> typical.
> 
> Because of this the wwPDB mmCIF file for each entry will usually contain 
> the chains as defined by the depositor.
> If two letter chain IDs were used by the depositor then this is 
> incompatible with the PDB format and a best effort PDB file is created. 
> This will contain remapped chain IDs with a single letter for the 
> chains.
> If you are using this best effort PDB file then I would encourage using 
> the mmCIF file instead.
> 
> If this is not the case, please can you share examples through our 
> helpdesk (deposit-h...@mail.wwpdb.org) where this has occurred so we can 
> investigate.
> 
> Many thanks
> 
> John
> 
> On 2021-11-01 15:00, Oganesyan, Vaheh wrote:
> > Hi All,
> > 
> > This question is mostly for RCSB and PDBe: why are you renaming chains
> > in the deposited PDB files? Why does it matter what letter is assigned
> > to the chain? For 1,2 or 3 chain structures it is manageable, but for
> > more chains and/or many complexes per asu this becomes quite a
> > challenge. And if chains are similar in shape it is a real pain.
> > Reading associated manuscripts and looking at those structures is an
> > additive that feels unnecessary.
> > 
> > Thank you in advance for explaining the logic behind.
> > 
> > Vaheh Oganesyan, Ph.D.
> > 
> > R | Biologics Engineering
> > 
> > One Medimmune Way, Gaithersburg, MD 20878
> > 
> > T: 301-398-5851
> > 
> > vaheh.oganes...@astrazeneca.com
> > 
> 
> -- 
> John Berrisford
> PDBe
> European Bioinformatics Institute (EMBL-EBI)
> European Molecular Biology Laboratory
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD UK
> Tel: +44 1223 492529
> 





Re: [ccp4bb] renaming chains

2021-11-01 Thread Oganesyan, Vaheh
Hi John,

Thank you for the explanation.
The most recent encounter I had with this issue was with 3DMM and 3B9K. You would 
know better what caused the chain renaming here, but it doesn't look like either of 
the two scenarios you describe. Whatever the reason was, I'm glad to hear that 
this is not a widespread rule that applies to all structures.

Regards,
Vaheh Oganesyan, Ph.D.
R | Biologics Engineering
One Medimmune Way, Gaithersburg, MD 20878
T:  301-398-5851
vaheh.oganes...@astrazeneca.com



From: John Berrisford 
Sent: Monday, November 1, 2021 12:53 PM
To: Oganesyan, Vaheh 
Cc: CCP4BB@jiscmail.ac.uk
Subject: Re: [ccp4bb] renaming chains

Dear Vaheh

Usually we do not rename chains as part of the curation procedure.
There are instances when we do, for example when a chain has to be split
into two chains and a new chain has to be defined, but this isn't
typical.

Because of this, the wwPDB mmCIF file for each entry will usually contain
the chains as defined by the depositor.
If two-letter chain IDs were used by the depositor, then this is
incompatible with the PDB format and a best-effort PDB file is created.
This will contain remapped chain IDs with a single letter for the
chains.
If you are using this best-effort PDB file, then I would encourage using
the mmCIF file instead.

If this is not the case, please can you share examples through our
helpdesk (deposit-h...@mail.wwpdb.org) where this has occurred so we can
investigate.

Many thanks

John

On 2021-11-01 15:00, Oganesyan, Vaheh wrote:
> Hi All,
>
> This question is mostly for RCSB and PDBe: why are you renaming chains
> in the deposited PDB files? Why does it matter what letter is assigned
> to the chain? For 1,2 or 3 chain structures it is manageable, but for
> more chains and/or many complexes per asu this becomes quite a
> challenge. And if chains are similar in shape it is a real pain.
> Reading associated manuscripts and looking at those structures is an
> additive that feels unnecessary.
>
> Thank you in advance for explaining the logic behind.
>
> Vaheh Oganesyan, Ph.D.
>
> R | Biologics Engineering
>
> One Medimmune Way, Gaithersburg, MD 20878
>
> T: 301-398-5851
>
> vaheh.oganes...@astrazeneca.com
>

--
John Berrisford
PDBe
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD UK
Tel: +44 1223 492529





Re: [ccp4bb] (renaming chains)….water numbering aspect

2021-11-01 Thread John R Helliwell
Dear Mohamed,
The situation you describe also applies to multiple crystal structures studied 
as a function not only of time but also of temperature or pressure. I 
enquired whether the numbering of the waters under discussion, and so on, could 
be retained across the related crystal structures in the PDB. The reply was that 
the single model for a single diffraction dataset prevented retaining the same 
numbering across these crystal structures. As referee or author I found a simple 
alternative, albeit not ideal for the reader: a table in the associated 
publication listing the numbering of key waters, with columns labelled by time, 
pressure or temperature. I have had mixed success in keeping this explanatory 
table in the main text, however, rather than being forced to put it into the 
Supplementary.
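
A cross-reference table of this kind can be drafted automatically. The
sketch below is a minimal version under stated assumptions: the models are
already superposed in a common frame, waters are given as a number-to-xyz
dictionary, and a water is considered "the same" if it lies within 1.0 A of
the reference position. The coordinates, water numbers and the cutoff are all
invented for illustration.

```python
import math

def match_waters(reference, other, max_dist=1.0):
    """Map each reference water number to the nearest water in `other`.

    Waters further than max_dist (Angstrom) from every candidate map to None.
    The match is not forced to be one-to-one.
    """
    table = {}
    for ref_num, ref_xyz in reference.items():
        best_num, best_dist = None, max_dist
        for num, xyz in other.items():
            d = math.dist(ref_xyz, xyz)
            if d <= best_dist:
                best_num, best_dist = num, d
        table[ref_num] = best_num
    return table

# Invented waters: authors' numbering versus numbering after deposition.
authors = {101: (1.0, 2.0, 3.0), 102: (5.0, 5.0, 5.0), 103: (9.0, 0.0, 0.0)}
deposited_t1 = {301: (5.2, 5.0, 4.9), 302: (1.1, 2.0, 3.1), 305: (20.0, 20.0, 20.0)}

table = match_waters(authors, deposited_t1)
print("authors  deposited(t1)")
for ref_num, dep_num in table.items():
    print(f"{ref_num:7d}  {dep_num if dep_num is not None else '-'}")
```

Repeating the match for each time point, temperature or pressure gives one
column per deposited model.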
Hope this helps,
Best wishes,
John 


Emeritus Professor John R Helliwell DSc




> On 1 Nov 2021, at 15:44, Mohamed Ibrahim  wrote:
> 
> 
> Hi All,
> 
> I have a very similar question: why can't we retain the same water numbering 
> as in the deposited files? For articles that discuss waters, this is a 
> considerable challenge. For example, for serial crystallography structures, 
> it becomes confusing and harder for readers to follow the water 
> numbering. The authors use the same numbering system for equivalent 
> waters in the different structures of the same protein at different time 
> points during the enzyme activity; after deposition, however, the waters are 
> renumbered differently in each model.
> 
> Cheers,
> Mohamed
> 
>> On Mon, Nov 1, 2021 at 4:01 PM Oganesyan, Vaheh 
>>  wrote:
>> Hi All,
>> 
>>  
>> 
>> This question is mostly for RCSB and PDBe: why are you renaming chains in 
>> the deposited PDB files? Why does it matter what letter is assigned to the 
>> chain? For 1,2 or 3 chain structures it is manageable, but for more chains 
>> and/or many complexes per asu this becomes quite a challenge. And if chains 
>> are similar in shape it is a real pain. Reading associated manuscripts and 
>> looking at those structures is an additive that feels unnecessary.
>> 
>>  
>> 
>> Thank you in advance for explaining the logic behind.
>> 
>>  
>> 
>> Vaheh Oganesyan, Ph.D.
>> 
>> 
>> 
>> R | Biologics Engineering
>> 
>> One Medimmune Way, Gaithersburg, MD 20878
>> 
>> T:  301-398-5851
>> 
>> vaheh.oganes...@astrazeneca.com
>> 
>>  
>> 
>> 
>> 
> 
> 
> -- 
> Dr. Mohamed Ibrahim
> Postdoctoral Researcher
> Humboldt University   
> Berlin, Germany  
> Tel: +49 30 209347931 
> 





Re: [ccp4bb] renaming chains

2021-11-01 Thread John Berrisford

Dear Vaheh

Usually we do not rename chains as part of the curation procedure.
There are instances when we do, for example when a chain has to be split 
into two chains and a new chain has to be defined, but this isn't 
typical.


Because of this, the wwPDB mmCIF file for each entry will usually contain 
the chains as defined by the depositor.
If two-letter chain IDs were used by the depositor, then this is 
incompatible with the PDB format and a best-effort PDB file is created. 
This will contain remapped chain IDs with a single letter for the 
chains.
If you are using this best-effort PDB file, then I would encourage using 
the mmCIF file instead.
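
In the mmCIF file both naming schemes sit side by side in the atom_site
category, as _atom_site.label_asym_id and _atom_site.auth_asym_id. The sketch
below lists the distinct pairs from a small embedded snippet. The rows are
made up and not taken from any deposited entry, the function name is
hypothetical, and the whitespace-splitting parser is deliberately naive (it
does not handle quoted values or several loops in one file), so a real mmCIF
parser is the better choice for actual files.

```python
def asym_id_pairs(cif_text):
    """Collect distinct (label_asym_id, auth_asym_id) pairs from an atom_site loop."""
    columns = []
    pairs = []
    for raw in cif_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "loop_":
            columns = []           # a new loop starts: forget earlier column names
        elif line.startswith("_"):
            columns.append(line)   # column names are listed one per line
        elif "_atom_site.label_asym_id" in columns:
            fields = line.split()  # naive: breaks on quoted values with spaces
            label = fields[columns.index("_atom_site.label_asym_id")]
            auth = fields[columns.index("_atom_site.auth_asym_id")]
            if (label, auth) not in pairs:
                pairs.append((label, auth))
    return pairs

# Made-up atom_site rows, for illustration only.
snippet = """
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.auth_asym_id
ATOM 1 N  MET C C
ATOM 2 CA MET C C
ATOM 3 N  MET G P
ATOM 4 CA MET G P
"""

for label, auth in asym_id_pairs(snippet):
    print(f"label_asym_id={label}  auth_asym_id={auth}")
```

The auth_asym_id column carries the depositor's chain names, so reading it
avoids depending on any remapped single-letter IDs.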


If this is not the case, please can you share examples through our 
helpdesk (deposit-h...@mail.wwpdb.org) where this has occurred so we can 
investigate.


Many thanks

John

On 2021-11-01 15:00, Oganesyan, Vaheh wrote:

Hi All,

This question is mostly for RCSB and PDBe: why are you renaming chains
in the deposited PDB files? Why does it matter what letter is assigned
to the chain? For 1-, 2- or 3-chain structures it is manageable, but for
more chains and/or many complexes per asu this becomes quite a
challenge. And if chains are similar in shape it is a real pain.
Reading associated manuscripts and looking at those structures then
involves an extra step that feels unnecessary.

Thank you in advance for explaining the logic behind this.

Vaheh Oganesyan, Ph.D.

R | Biologics Engineering

One Medimmune Way, Gaithersburg, MD 20878

T:  301-398-5851

vaheh.oganes...@astrazeneca.com



--
John Berrisford
PDBe
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD UK
Tel: +44 1223 492529





Re: [ccp4bb] renaming chains

2021-11-01 Thread Mohamed Ibrahim
Hi All,

I have a very similar question: why can't we retain the same water
numbering as in the deposited files? For articles that discuss waters, this
is a considerable challenge. For example, for serial crystallography
structures, it becomes confusing and harder for readers to follow the water
numbering. The authors use the same numbering system for equivalent
waters in the different structures of the same protein at different time
points during the enzyme activity; after deposition, however, the waters
are renumbered differently in each model.

Cheers,
Mohamed

On Mon, Nov 1, 2021 at 4:01 PM Oganesyan, Vaheh <
vaheh.oganes...@astrazeneca.com> wrote:

> Hi All,
>
>
>
> This question is mostly for RCSB and PDBe: why are you renaming chains in
> the deposited PDB files? Why does it matter what letter is assigned to the
> chain? For 1,2 or 3 chain structures it is manageable, but for more chains
> and/or many complexes per asu this becomes quite a challenge. And if chains
> are similar in shape it is a real pain. Reading associated manuscripts and
> looking at those structures is an additive that feels unnecessary.
>
>
>
> Thank you in advance for explaining the logic behind.
>
>
>
> Vaheh Oganesyan, Ph.D.
>
> R | Biologics Engineering
>
> One Medimmune Way, Gaithersburg, MD 20878
>
> T:  301-398-5851
>
> vaheh.oganes...@astrazeneca.com
>
>
>
>


-- 
Dr. Mohamed Ibrahim
Postdoctoral Researcher
Humboldt University
Berlin, Germany
Tel: +49 30 209347931





[ccp4bb] renaming chains

2021-11-01 Thread Oganesyan, Vaheh
Hi All,

This question is mostly for RCSB and PDBe: why are you renaming chains in the 
deposited PDB files? Why does it matter what letter is assigned to the chain? 
For 1-, 2- or 3-chain structures it is manageable, but for more chains and/or many 
complexes per asu this becomes quite a challenge. And if chains are similar in 
shape it is a real pain. Reading associated manuscripts and looking at those 
structures then involves an extra step that feels unnecessary.

Thank you in advance for explaining the logic behind this.

Vaheh Oganesyan, Ph.D.
R | Biologics Engineering
One Medimmune Way, Gaithersburg, MD 20878
T:  301-398-5851
vaheh.oganes...@astrazeneca.com






[ccp4bb] Protein crystallographer position available at Sprint Bioscience (Huddinge, Sweden)

2021-11-01 Thread Lionel Trésaugues
Hi,

We are recruiting a protein scientist with a passion for protein X-ray 
crystallography. Apply today if you thrive in an efficient and 
multi-disciplinary research environment focused on driving innovative drug 
programs forward. We give you the opportunity to develop both within and 
outside of your discipline.

SPRINT BIOSCIENCE IN SHORT

Sprint Bioscience develops small-molecule drugs to fight cancer. With 
fragment-based drug design methods, advanced technical approaches, and leading 
specialists, we develop the drugs of the future - from the idea stage to a 
clinical candidate with the goal set at being "First-in-Class". Our business 
model is to identify and run drug development projects and license these in the 
preclinical phase to the global pharmaceutical industry. We choose projects 
with the greatest technical and commercial potential for success. Sprint 
Bioscience has grown every year since our start in 2009. Today, we are about 
30 committed employees who work closely together in a stimulating environment 
with cutting-edge methodology and challenging projects. We also collaborate 
with several academic and industrial partners from all over the world. We are 
situated in tailor-made premises in Flemingsberg - Huddinge, 20 minutes 
by train from Stockholm City.

ABOUT THE POSITION

You will be a member of the Structural Biology team, consisting of scientists 
with expertise in the different areas related to protein crystallography 
(recombinant protein expression and purification, crystallogenesis, structure 
determination). Your main goal will be to develop crystal systems which will 
allow the generation of high-resolution crystal structures of complexes between 
proteins from our portfolio and the compounds synthesized by our Medicinal 
Chemistry team. Because Sprint Bioscience's favored hit-to-lead approach is 
Structure-Based Drug Design, this activity is essential for progressing 
our molecules towards a drug candidate. Depending on your competences, you 
could also take part in the further structure determination, model refinement 
and structural analysis steps. We work in efficient and dynamic project teams, 
which are adaptable and change depending on project priorities. This is a 
full-time permanent position.

YOU WILL MAKE A DIFFERENCE

You will bring your expertise into a team of dedicated scientists. You need 
collaborative skills and to communicate well when supporting parallel projects 
in a multi-disciplinary environment. Our company language is English.
Your main contributions in the field of Structural Biology will be:
* Developing stable crystallization protocols suitable for the 
study by X-ray crystallography of protein-ligand complexes
* Applying these protocols to deliver high-resolution diffraction 
data to support the chemistry programs. Optional: delivering 3D model to 
chemists and guiding them in their analysis
* Sharing responsibility for organizing the structural biology 
activities (organization of synchrotron sessions, maintenance of the 
equipment)
* Implementing new workflows and new methods for protein 
crystallography and structural biology in general

To be successful in this position you will need:

* Extensive experience and expertise in crystallization of 
soluble proteins and particularly of protein-ligand complexes
* A passion for structural biology (protein crystallography) and 
particularly method development in the different fields of the discipline

Knowledge and experience in the following fields will be considered an 
advantage:
* Structure-based drug-design software
* Fragment-screening by X-ray crystallography
* Structure determination (from data collection to model refinement 
and analysis)
* Protein purification and characterization
* Methods in Structural Biology other than X-ray crystallography

You can apply for the position by following the link below:
https://www.sprintbioscience.com/en/career/available-positions/experienced-scientist-with-a-passion-for-protein-crystallography/

Please feel free to contact me if you would like more information.
Best Wishes / Vänlig hälsning,
Lionel Trésaugues

Lionel Trésaugues
Structural Biology
Sprint Bioscience
Novum
SE 141 57 Huddinge
Sweden

Tel: +46 (0)8-411 44 55
Mobile: +46 (0)73-075 54 46
www.sprintbioscience.com






Re: [ccp4bb] what would be the best metric to assess the quality of a mtz file?

2021-11-01 Thread David Waterman
Hi James,

What you wrote makes lots of sense. I had not heard about Rsleep, so that
looks like interesting reading, thanks.

I have often used Rfree as a simple tool to compare two protocols. If I am
not actually optimising against Rfree but just using it for a one-off
comparison then that is okay, right?

Let's say I have two data processing protocols, A and B. Between these I
might be exploring some difference in options within one data processing
program, perhaps different geometry refinement parameters, or scaling
options. I expect the A and B data sets to be quite similar, but I would
like to evaluate which protocol was "better", and I want to do this
quickly, ideally looking at a single number. I don't like I/sigI because I
don't trust the sigmas, CC1/2 is often noisy, and I'm totally sworn off
merging R statistics for these purposes. I tend to use Rfree as an
easily-available metric, independent from the data processing program and
the merging stats. It also allows a comparison of A and B in terms of the
"product" of crystallography, namely the refined structure. In this I am
lucky because I'm not trying to solve a structure. I may be looking at
lysozyme or proteinase K: something where I can download a pretty good
approximation to the truth from the PDB.

So, what I do is process the data by A and process by B, ensure the data
sets have the same free set, then refine to convergence (or at least, a lot
of cycles) starting from a PDB structure. I then evaluate A vs B in terms
of Rfree, though without an error bar on Rfree I don't read too much into
small differences.

Does this procedure seem sound? Perhaps it could be improved by randomly
jiggling the atoms in the starting structure, in case the PDB deposition
had already followed an A- or B-like protocol. Perhaps the whole approach
is suspect. Certainly I wouldn't want to generalise by saying that A or B
is better in all cases, but I do want to find a way to assess the various
tweaks I can try in data processing for a single case.

Any thoughts? I appreciate the wisdom of the BB here.

Cheers

-- David


On Fri, 29 Oct 2021 at 15:50, James Holton wrote:

>
> Well, of all the possible metrics you could use to assess data quality
> Rfree is probably the worst one.  This is because it is a cross-validation
> metric, and cross-validations don't work if you use them as an optimization
> target. You can try, and might even make a little headway, but then your
> free set is burnt. If you have a third set of observations, as suggested
> for Rsleep (doi:10.1107/S0907444907033458), then you have a chance at
> another round of cross-validation. Crystallographers don't usually do this,
> but it has become standard practice in machine learning (training=Rwork,
> validation=Rfree and testing=Rsleep).
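
The three-way split described in the quoted paragraph can be sketched as below. This is only an illustration under stated assumptions: the 5% fractions, the flag names and the function name are hypothetical, and the split is a plain random one over reflection indices rather than whatever a given program does when assigning free flags.

```python
import numpy as np

def assign_flags(n_reflections, free_fraction=0.05, sleep_fraction=0.05, seed=0):
    """Randomly label each reflection as 'work', 'free' or 'sleep'.

    'work'  is refined against             (training,   Rwork)
    'free'  is checked during the project  (validation, Rfree)
    'sleep' is left untouched to the end   (testing,    Rsleep)
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_reflections)
    n_free = int(round(free_fraction * n_reflections))
    n_sleep = int(round(sleep_fraction * n_reflections))

    flags = np.full(n_reflections, "work", dtype="U5")
    flags[order[:n_free]] = "free"
    flags[order[n_free:n_free + n_sleep]] = "sleep"
    return flags
```

The same flags would have to be carried unchanged across every data set being compared, otherwise the sets are no longer independent of earlier decisions.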
>
> So, unless you have an Rsleep set, any time you contemplate doing a bunch
> of random things and picking the best Rfree ... don't.  Just don't.  There
> madness lies.
>
> What happens after doing this is you will be initially happy about your
> lower Rfree, but everything you do after that will make it go up more than
> it would have had you not performed your Rfree optimization. This is
> because the changes in the data that made Rfree randomly better were
> actually noise, and as the structure becomes more correct it will move away
> from that noise. It's always better to optimize on something else, and then
> check your Rfree as infrequently as possible. Remember it is the control
> for your experiment. Never mix your positive control with your sample.
>
> As for the best metric to assess data quality?  Well, what are you doing
> with the data? There are always compromises in data processing and
> reduction that favor one application over another.  If this is a "I just
> want the structure" project, then score on the resolution where CC1/2 hits
> your favorite value. For some that is 0.5, others 0.3. I tend to use 0.0 so
> I can cut it later without re-processing.  Whatever you do just make it
> consistent.
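
Scoring on "the resolution where CC1/2 hits your favorite value" can be sketched as a simple interpolation over per-shell statistics. The code below is a hedged example, not taken from any data processing program: it assumes the shells are listed from low to high resolution, that each shell is represented by a single d-spacing in Angstrom, and it interpolates linearly in 1/d^2, which is an arbitrary choice made for this illustration.

```python
def resolution_at_cc_half(d_spacings, cc_half, threshold=0.3):
    """Return the d-spacing (Angstrom) at which CC1/2 first drops below threshold.

    d_spacings: one representative d value per shell, low to high resolution
                (i.e. decreasing d).
    cc_half:    CC1/2 of each shell, in the same order.
    Returns None if there are no shells or the first shell is already below
    the threshold; returns the last d value if CC1/2 never drops below it.
    """
    pairs = list(zip(d_spacings, cc_half))
    if not pairs or pairs[0][1] < threshold:
        return None
    for (d_lo, cc_lo), (d_hi, cc_hi) in zip(pairs, pairs[1:]):
        if cc_lo >= threshold > cc_hi:
            # Interpolate linearly in 1/d^2 between the two bracketing shells.
            s_lo, s_hi = 1.0 / d_lo ** 2, 1.0 / d_hi ** 2
            frac = (cc_lo - threshold) / (cc_lo - cc_hi)
            s = s_lo + frac * (s_hi - s_lo)
            return s ** -0.5
    return pairs[-1][0]
```

Whatever threshold is chosen, applying the same function with the same threshold to both data sets keeps the comparison consistent.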
>
> If it's for anomalous, score on CCanom or if that's too noisy the
> Imean/sigma in the lowest-angle resolution or highest-intensity bin. This
> is because for anomalous you want to minimize relative error. The
> end-all-be-all of anomalous signal strength is the phased anomalous
> difference Fourier. You need phases to do one, but if you have a structure
> just omit an anomalous scatterer of interest, refine to convergence, and
> then measure the peak height at the position of the omitted anomalous
> atom.  Instructions for doing anomalous refinement in refmac5 are here:
>
> https://www2.mrc-lmb.cam.ac.uk/groups/murshudov/content/refmac/refmac_keywords.html
>
> If you're looking for a ligand you probably want isomorphism, and in that
> case refining with a reference structure looking for low Rwork is not a bad
> strategy. This will tend to select for crystals containing a molecule that
> looks like the one you are refining.  But be careful! If it is an 

Re: [ccp4bb] keyword for refmac to output coordinates in cif format

2021-11-01 Thread John Berrisford

Dear Mark

The keyword you want is

pdbout format mmcif

This is described here:
https://www.wwpdb.org/deposition/PDBxDeposit
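
For anyone driving refmac from a script, the keyword can be passed on standard input together with the usual keywords. The sketch below is an assumption-laden example rather than a recommended protocol: the file names, cycle count and helper function names are hypothetical, a real run would normally also need a LABIN line naming the MTZ columns (passed here through extra_keywords), and how the output coordinate file is named may depend on the refmac version.

```python
import shutil
import subprocess

def refmac_keywords(n_cycles=10, extra_keywords=()):
    """Build the keyword input for refmac5, requesting mmCIF coordinate output."""
    lines = ["pdbout format mmcif", f"ncyc {n_cycles}"]
    lines.extend(extra_keywords)   # e.g. a LABIN line for your MTZ columns
    lines.append("end")
    return "\n".join(lines) + "\n"

def refmac_command(xyzin, hklin, xyzout, hklout):
    """Build the refmac5 command line with its logical file assignments."""
    return ["refmac5", "XYZIN", xyzin, "HKLIN", hklin,
            "XYZOUT", xyzout, "HKLOUT", hklout]

def run_refmac(xyzin, hklin, xyzout, hklout, n_cycles=10, extra_keywords=()):
    """Run refmac5 if it is on the PATH, feeding the keywords on stdin."""
    if shutil.which("refmac5") is None:
        raise RuntimeError("refmac5 not found on PATH (is CCP4 set up?)")
    return subprocess.run(
        refmac_command(xyzin, hklin, xyzout, hklout),
        input=refmac_keywords(n_cycles, extra_keywords),
        text=True,
        capture_output=True,
        check=False,
    )
```

Calling refmac_keywords() on its own shows exactly what would be sent to the program before anything is run.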

Regards

John

On 2021-10-30 00:03, Mark J. van Raaij wrote:

Dear All,

this may be something simple but I can’t find it in the CCP4i GUI or 
online.

Is there a keyword to make refmac output the coordinates as a cif file
instead of a pdb file - or better, as both?
Or is it some other program that converts the formats?

Mark J van Raaij
Dpto de Estructura de Macromoleculas
Centro Nacional de Biotecnologia - CSIC
calle Darwin 3
E-28049 Madrid, Spain





--
John Berrisford
PDBe
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD UK
Tel: +44 1223 492529





[ccp4bb] “Crystallographer” and “Crystallography Research Assistant” at Exscientia

2021-11-01 Thread Simone Culurgioni

Dear all,



We are expanding our Crystallography team within the Discovery Biology
department at Exscientia and we currently have open positions for
“Crystallographer” and “Crystallography Research Assistant”.



The Discovery Biology team in Exscientia is responsible for enabling
small-molecule therapeutic projects and integrates with the Artificial
Intelligence (AI) driven drug discovery platform.



Please apply through the links:



“Crystallographer”
https://apply.workable.com/exscientia/j/76E18EED44/



“Crystallography Research Assistant”
https://apply.workable.com/exscientia/j/1A6CDB6104/





Best wishes,



Simone



-- 
Dr Simone Culurgioni
Director of Crystallography
Exscientia Ltd.
The Schrödinger Building
Heatley Road
The Oxford Science Park
Oxford
OX4 4GE

Tel. 07892 715263




