[ccp4bb] PhD position available

2012-09-12 Thread Tales Rocha
http://www.mpibpc.mpg.de/9150561/20-12

Ph. D. Position

Job offer from July 06, 2012

The Research Groups “Nucleic Acid Chemistry” (Dr. Claudia Höbartner) and
“Macromolecular crystallography” (Dr. Vladimir Pena) at the
Max-Planck-Institute for Biophysical Chemistry invite applications for a

*Ph.D. Position

(Code Number 20-12)


Project title: Structural and mechanistic studies of catalytic DNA*

In addition to protein and RNA catalysts, DNA molecules termed
deoxyribozymes have the ability to catalyse chemical transformations.
Despite the practical applications of the deoxyribozymes as chemical tools,
the catalytic mechanism is poorly understood mainly because no
high-resolution structures of catalytically active DNA are available. This
project provides the unique opportunity to answer the fundamental question
of how DNA catalysis is possible.

A decisive technical goal of the proposed investigation is the crystal
structure determination of a DNA catalyst in complex with the RNA substrate
by means of X-ray crystallography. The atomic structure will be further
complemented with detailed biochemical experiments intended to dissect and
reveal the reaction mechanism.

The successful candidate should have a Master's or equivalent degree in
Chemistry, Biochemistry, or Molecular Biology. The position is available
immediately and applications will be screened until the position is filled.

The successful candidate will be awarded a Max Planck Fellowship.

The Max Planck Society is trying to increase the percentage of women on its
scientific staff and strongly encourages applications from qualified women.

Please send your application including CV, copies of high school and
university certificates, publications/manuscripts (if applicable) and
address details of two referees *with reference to the code number* *20-12
via e-mail to*

*choe...@gwdg.de*

Max Planck Institute for Biophysical Chemistry

Dr. Claudia Höbartner

Am Fassberg 11, 37077 Göttingen

Germany

For more information, have a look at our web pages:

http://www.mpibpc.mpg.de/home/pena/Research/index.html

http://www.mpibpc.mpg.de/english/research/ags/hoebartner/index.html

http://www.uni-goettingen.de/en/sh/56870.html


[ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Jacob Keller
Dear List,

since this probably comes up a lot in manipulation of pdb/reflection files
and so on, I was curious what people thought would be the best language for
the following: I have some huge (100s MB) tables of tab-delimited data on
which I would like to do some math (averaging, sigmas, simple arithmetic,
etc) as well as some sorting and rejecting. It can be done in Excel, but
this is exceedingly slow even in 64-bit, so I am looking to do it through
some scripting. Just as an example, a "sort" which takes >10 min in Excel
takes ~10 sec max with the unix command sort (seems crazy, no?). Any
suggestions?

Thanks, and sorry for being off-topic,

Jacob

-- 
***
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
email: j-kell...@northwestern.edu
***


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Eric Williams
Try R. :)

http://www.r-project.org/

Eric



Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread George M. Sheldrick
I always use FORTRAN for such tasks, especially if speed is important.

George


-- 
Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread DUMAS Philippe (UDS)

On Wednesday 12 September 2012 16:40 CEST, "George M. Sheldrick"
 wrote:

May I add a little personal joke to the serious remark by George?

This reminds me of a discussion I had with Jorge Navaza, let's say 15 years ago, 
about the programming language of the future.
(To a good approximation, 15 years ago, the future was now)
The answer by Jorge was: "I don't know what it will be, but I know its name 
will be FORTRAN".
I hope he will confirm the statement...
Philippe Dumas


> I always use FORTRAN for such tasks, especially if speed is important.
>
> George
>
> On 09/12/2012 04:32 PM, Jacob Keller wrote:
> > Dear List,
> >
> > since this probably comes up a lot in manipulation of pdb/reflection
> > files and so on, I was curious what people thought would be the best
> > language for the following: I have some huge (100s MB) tables of
> > tab-delimited data on which I would like to do some math (averaging,
> > sigmas, simple arithmetic, etc) as well as some sorting and rejecting.
> > It can be done in Excel, but this is exceedingly slow even in 64-bit, so
> > I am looking to do it through some scripting. Just as an example, a
> > "sort" which takes >10 min in Excel takes ~10 sec max with the unix
> > command sort (seems crazy, no?). Any suggestions?
> >
> > Thanks, and sorry for being off-topic,
> >
> > Jacob
> >
> > --
> > ***
> > Jacob Pearson Keller
> > Northwestern University
> > Medical Scientist Training Program
> > email: j-kell...@northwestern.edu 
> > ***
>
> --
> Prof. George M. Sheldrick FRS
> Dept. Structural Chemistry,
> University of Goettingen,
> Tammannstr. 4,
> D37077 Goettingen, Germany
> Tel. +49-551-39-3021 or -3068
> Fax. +49-551-39-22582






Re: [ccp4bb] the lysozyme of membrane proteins?

2012-09-12 Thread R. M. Garavito
Ho,

I second the vote for OmpF, but many porins could do.  Although it is a little 
harder to purify from native membranes, OmpF has the advantage that it can be 
crystallized in about 1-2 hours from a simple detergent solution with different 
PEGs AND (!!!) it is as stable as a rock (you can drop it on the floor, scrape 
it up, and it is still alive).  Over 25 years ago we used it in one of the 
first EMBO courses on membrane protein crystallization and it worked like a 
charm.  "Assaying" it is a problem, but you can verify it is there by a gel shift 
assay (unheated in SDS it is a trimer, heated it is a monomer).   However, 
other porins (LamB, OmpC, etc.) and porin-like proteins (EstA) could work 
nicely.

Cheers,

Michael


R. Michael Garavito, Ph.D.
Professor of Biochemistry & Molecular Biology
603 Wilson Rd., Rm. 513   
Michigan State University  
East Lansing, MI 48824-1319
Office:  (517) 355-9724 Lab:  (517) 353-9125
FAX:  (517) 353-9334   Email:  rmgarav...@gmail.com





On Sep 11, 2012, at 6:09 PM, Toufic El Arnaout wrote:

> Hi,
> Just for info if you were to use the LCP method (a course by itself), check 
> this about OmpF ("membrane lysozyme"):
> http://www.sciencedirect.com/science/article/pii/S1047847712000834 
> bR protein sometimes takes weeks to give crystals and people prefer the dark 
> (depends on the conf state).. but good idea for spectro assays (check 
> reaction centres/light-harvesting complexes too).
> Beta barrels are very stable too.
> If you want to use the GFP fusion and you want to cleave it, it might add 
> extra time and steps to the students than going directly with a GFP free 
> his-tagged protein to trials.
> Regards
> 
> 
> Toufic El Arnaout
> Membrane Structural and Functional Biology Group
> Trinity College Dublin
> 
> On Tue, Sep 11, 2012 at 10:18 PM, Ho Leung Ng  wrote:
> Hello,
> 
>   I am developing an undergraduate biochemistry lab class and
> would like to incorporate experiments with membrane proteins. Does
> anyone have suggestions on membrane proteins that are relatively easy
> to express, purify, and assay? Bonus points for crystallizable! At the
> moment, my leading candidate is aquaporin AqpZ from E. coli. I am
> planning to express the membrane protein as a GFP fusion so students
> can easily follow it through the course of the labs.
> 
> 
> Thank you,
> Ho
> 
> Ho Leung Ng
> University of Hawaii at Manoa
> Assistant Professor, Department of Chemistry
> h...@hawaii.edu
> 



Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Pete Meyer
One thing to keep in mind is that there's usually a trade-off between 
setup (writing and testing) and execution time.  For one-off data 
processing, I'd focus on implementation speed rather than execution 
speed (in other words, FORTRAN might not be ideal unless you're already 
fluent with it).


That said, I'd take a look at python, octave or R.  Python's relatively 
easy to learn, and more flexible than octave/R; but it doesn't have the 
built-in statistic functions that octave and R do.


One other tip which you've probably already thought of - Depending on 
your runtimes (I don't think 100s MB of data is usually considered an 
enormous amount, but it'll depend on what you're doing) it may be worth 
getting things working on a small subset of the data first.
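The "small subset first" tip could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the helper name `head_rows` and the sample rows are made up, and the in-memory `io.StringIO` stands in for the real tab-delimited file.

```python
import csv
import io
from itertools import islice


def head_rows(handle, n):
    """Return at most the first n rows of a tab-delimited stream.

    islice stops reading after n rows, so the rest of a large file
    is never loaded into memory.
    """
    reader = csv.reader(handle, delimiter="\t")
    return list(islice(reader, n))


# Hypothetical stand-in for a large file; with real data one would pass
# open("table.tsv", newline="") instead (the file name is an assumption).
sample = io.StringIO("1\t2.5\n2\t3.5\n3\t4.5\n4\t5.5\n")
subset = head_rows(sample, 2)
print(subset)
```

Once the script behaves on the subset, the same code can be pointed at the full table without changes.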


Pete

Jacob Keller wrote:

Dear List,

since this probably comes up a lot in manipulation of pdb/reflection files
and so on, I was curious what people thought would be the best language for
the following: I have some huge (100s MB) tables of tab-delimited data on
which I would like to do some math (averaging, sigmas, simple arithmetic,
etc) as well as some sorting and rejecting. It can be done in Excel, but
this is exceedingly slow even in 64-bit, so I am looking to do it through
some scripting. Just as an example, a "sort" which takes >10 min in Excel
takes ~10 sec max with the unix command sort (seems crazy, no?). Any
suggestions?

Thanks, and sorry for being off-topic,

Jacob



Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Nat Echols
On Wed, Sep 12, 2012 at 7:32 AM, Jacob Keller
 wrote:
> since this probably comes up a lot in manipulation of pdb/reflection files
> and so on, I was curious what people thought would be the best language for
> the following: I have some huge (100s MB) tables of tab-delimited data on
> which I would like to do some math (averaging, sigmas, simple arithmetic,
> etc) as well as some sorting and rejecting. It can be done in Excel, but
> this is exceedingly slow even in 64-bit, so I am looking to do it through
> some scripting. Just as an example, a "sort" which takes >10 min in Excel
> takes ~10 sec max with the unix command sort (seems crazy, no?). Any
> suggestions?

Anything but Fortran.

Seriously, there are probably a dozen (or more) good solutions, and it
depends on whose syntax you prefer, what external libraries you need,
whether you want to someday apply your new programming skills to
another project, and whether you want anyone else to be able to read
your code.  For me, Python wins easily, but the suggestions of Octave
or R are probably just as good for a one-time script of the sort you
describe.

-Nat


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Carter, Charlie
A similar remark was made to me by David Blow, while he was on sabbatical at 
UNC in the 1980s, working with the UNC Computer Science Department and in a 
moment of intense frustration with the overpowering ignorance of fortran and 
the enthusiasm for Unix exhibited by that department.

Charlie 
On Sep 12, 2012, at 10:58 AM, DUMAS Philippe (UDS) wrote:

> This remembers me a discussion I had with Jorge Navaza, let's say 15 years 
> ago, about the programming language of the future.
> (To a good approximation, 15 years ago, the future was now)
> The answer by Jorge was: "I don't know what it will be, but I know it's name 
> will be FORTRAN".
> I hope he will confirm the statement...
> Philippe Dumas


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Quentin Delettre
I agree with Pete. Moreover, Python doesn't have built-in statistics 
functions, but adding a package (numpy and scipy in this case) is very simple.
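For the simplest cases (means, sigmas, sorting and rejecting), Python 3.4 and later also ship a `statistics` module in the standard library, so no extra package is strictly needed. A minimal sketch, assuming a made-up column of numbers and a hypothetical helper `reject_outliers`; numpy and scipy remain the better choice for tables of hundreds of MB:

```python
import statistics


def reject_outliers(values, n_sigma=3.0):
    """Keep values within n_sigma sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [v for v in values if abs(v - mean) <= n_sigma * sigma]


# Made-up column of numbers with one obvious outlier.
column = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 9.9, 50.0]
kept = sorted(reject_outliers(column, n_sigma=2.0))
print(statistics.mean(kept), statistics.stdev(kept))
```

Note that a single pass of sigma-based rejection is a design choice made here for brevity; the outlier itself inflates the sigma used to reject it.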


Quentin

On 12/09/2012 17:11, Pete Meyer wrote:
One thing to keep in mind is that there's usually a trade-off between 
setup (writing and testing) and execution time.  For one-off data 
processing, I'd focus on implementation speed rather than execution 
speed (in other words, FORTRAN might not be ideal unless you're 
already fluent with it).


That said, I'd take a look at python, octave or R.  Python's 
relatively easy to learn, and more flexible than octave/R; but it 
doesn't have the built-in statistic functions that octave and R do.


One other tip which you've probably already though of - Depending on 
your runtimes (I don't think 100s MB of data is usually considered an 
enormous amount, but it'll depend on what you're doing) it may be 
worth getting things working on a small subset of the data first.


Pete







Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Anna Gardberg
Hi Jacob,
As the preceding discussion has illustrated, there are obviously a
number of options, nearly all of which will work well for what you
describe. As Pete Meyer suggested above, the best language may be the
one you already know (I also echo his suggestion to test your code on
a small subset of data first). For my $0.02, I would use Perl, because
I'm already comfortable with it and there is a lot of
readily-modifiable code already freely available. Perl excels at
quickly extracting, reformatting, and reporting data - hence the
backronym, Practical Extraction and Report Language. It's built on C,
so tends to be fast when crunching numbers.

I'm still learning Python, but its functionality seems comparable.
Also, as Nat Echols noted, it's more readable to others, and it seems
to be more fashionable than Perl just now, so will perhaps be more
useful to you in future programming projects.

Good luck!

Best,
Anna

On Wed, Sep 12, 2012 at 8:21 AM, Quentin Delettre  wrote:
> I agree with Pete. Moreover, Python doesn't have built-in statistic
> functions but adding package (numpy and scipy in this case) is very simple.
>
> Quentin
>
> On 12/09/2012 17:11, Pete Meyer wrote:
>
>> One thing to keep in mind is that there's usually a trade-off between
>> setup (writing and testing) and execution time.  For one-off data
>> processing, I'd focus on implementation speed rather than execution speed
>> (in other words, FORTRAN might not be ideal unless you're already fluent
>> with it).
>>
>> That said, I'd take a look at python, octave or R.  Python's relatively
>> easy to learn, and more flexible than octave/R; but it doesn't have the
>> built-in statistic functions that octave and R do.
>>
>> One other tip which you've probably already though of - Depending on your
>> runtimes (I don't think 100s MB of data is usually considered an enormous
>> amount, but it'll depend on what you're doing) it may be worth getting
>> things working on a small subset of the data first.
>>
>> Pete
>>
>> Jacob Keller wrote:
>>>
>>> Dear List,
>>>
>>> since this probably comes up a lot in manipulation of pdb/reflection
>>> files
>>> and so on, I was curious what people thought would be the best language
>>> for
>>> the following: I have some huge (100s MB) tables of tab-delimited data on
>>> which I would like to do some math (averaging, sigmas, simple arithmetic,
>>> etc) as well as some sorting and rejecting. It can be done in Excel, but
>>> this is exceedingly slow even in 64-bit, so I am looking to do it through
>>> some scripting. Just as an example, a "sort" which takes >10 min in Excel
>>> takes ~10 sec max with the unix command sort (seems crazy, no?). Any
>>> suggestions?
>>>
>>> Thanks, and sorry for being off-topic,
>>>
>>> Jacob
>>>
>>
>>
>


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Ethan Merritt
On Wednesday, September 12, 2012 07:32:54 am Jacob Keller wrote:
> Dear List,
> 
> since this probably comes up a lot in manipulation of pdb/reflection files
> and so on, I was curious what people thought would be the best language for
> the following: I have some huge (100s MB) tables of tab-delimited data on
> which I would like to do some math (averaging, sigmas, simple arithmetic,
> etc) as well as some sorting and rejecting. It can be done in Excel, but
> this is exceedingly slow even in 64-bit, so I am looking to do it through
> some scripting. Just as an example, a "sort" which takes >10 min in Excel
> takes ~10 sec max with the unix command sort (seems crazy, no?). Any
> suggestions?

For the specific purpose you list -
input from tab-delimited data
output to simple statistical summaries and (I assume) plots
- it sounds like gnuplot could do the job nicely.

Otherwise I'd recommend perl, and dis-recommend python.

Ethan


-- 
Ethan A Merritt
Biomolecular Structure Center,  K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Jacob Keller
>
> For the specific purpose you list -
> input from tab-delimited data
> output to simple statistical summaries and (I assume) plots
> - it sounds like gnuplot could do the job nicely.
>

I wasn't aware that gnuplot can do calculations--can it? I was probably
going to use it somewhere as a plotting option.


> Otherwise I'd recommend perl, and dis-recommend python.


Why are you dis-ing python? Seems everybody loves it...

JPK





> Ethan
>
>
> --
> Ethan A Merritt
> Biomolecular Structure Center,  K-428 Health Sciences Bldg
> University of Washington, Seattle 98195-7742
>



-- 
***
Jacob Pearson Keller
Northwestern University
Medical Scientist Training Program
email: j-kell...@northwestern.edu
***


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Edwin Pozharski

All you need is the scipy library to get those pesky statistics functions :)

On 09/12/2012 11:11 AM, Pete Meyer wrote:
Python's relatively easy to learn, and more flexible than octave/R; 
but it doesn't have the built-in statistic functions that octave and R 
do. 


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Sabuj Pattanayek
> Why are you dis-ing python? Seems everybody loves it...

Depends on if you like the object model, some don't. In the end it
really boils down to what you're used to and what you've learned to
use.


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Soisson, Stephen M
Now is the time when I start waxing nostalgic about the "old days" when there 
used to be entire threads on this bulletin board about Fortran format statement 
syntax for parsing various files... and I read them with great interest.

How did I get to be such a geezer? 

-Original Message-
From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Carter, 
Charlie
Sent: Wednesday, September 12, 2012 11:17 AM
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Off-topic: Best Scripting Language

A similar remark was made to me by David Blow, while he was on sabbatical at 
UNC in the 1980s, working with the UNC Computer Science Department and in a 
moment of intense frustration with the overpowering ignorance of fortran and 
the enthusiasm for Unix exhibited by that department.

Charlie 
On Sep 12, 2012, at 10:58 AM, DUMAS Philippe (UDS) wrote:

> This remembers me a discussion I had with Jorge Navaza, let's say 15 years 
> ago, about the programming language of the future.
> (To a good approximation, 15 years ago, the future was now)
> The answer by Jorge was: "I don't know what it will be, but I know it's name 
> will be FORTRAN".
> I hope he will confirm the statement...
> Philippe Dumas


[ccp4bb] Ligand geometry obs. vs. ideal

2012-09-12 Thread Yuri Pompeu
Hi everyone,
I am trying to show that a ligand underwent catalysis during a soaking 
experiment.
One of the things I would like to show is the geometry of the ligand, bond 
angles/lengths, dihedrals, etc...
One of my models has a high-resolution limit of 1.18 Å, and the ligand density 
is really clear and complete. 
What is the best way to refine the ligand unrestrained and then generate 
measurements?
Also, the idea is to finally compare to ideal geometry. How should I generate 
these values (any software in mind)?
Any ideas are welcome.
Thanks a lot


Re: [ccp4bb] Fitting of a trigonal bipyramidal phosphorus

2012-09-12 Thread Sudipta Bhattacharyya
Dear all,

Thanks for your cooperation.

Regards,
Sudipta.

Sudipta Bhattacharyya,
Senior Research Fellow,
Department of Biotechnology,
Indian Institute of Technology Kharagpur, India.



Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread James Stroud

On Sep 12, 2012, at 9:11 AM, Pete Meyer wrote:

> That said, I'd take a look at python, octave or R.  Python's relatively easy 
> to learn, and more flexible than octave/R; but it doesn't have the built-in 
> statistic functions that octave and R do.


import scipy


Now it does!



Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread George Sheldrick
It is the lack of compatibility between different versions mentioned by 
Ethan that really put me off learning PYTHON. In contrast, the 
FORTRAN-66 program SHELX76 still compiles and runs correctly with any 
modern FORTRAN compiler. The only significant 'new' features that I now 
use are dynamic array allocation (introduced in FORTRAN-90) and OpenMP 
support for multiple CPUs, but even programs using OpenMP would still 
work with older compilers because the OpenMP instructions would be 
treated as comments.


George

On 09/12/2012 08:28 PM, Ethan Merritt wrote:

On Wednesday, September 12, 2012 09:52:09 am Jacob Keller wrote:

For the specific purpose you list -
input from tab-delimited data
output to simple statistical summaries and (I assume) plots
- it sounds like gnuplot could do the job nicely.


I wasn't aware that gnuplot can do calculations--can it? I was probably
going to use it somewhere as a plotting option.

Here's a simple-minded example using a dump of the current contents
of the PDB from www.pdb.org as a comma-separated file with ~65000 entries.
The input file was previously filtered to contain only X-ray structures
between 1 and 4 Angstroms resolution.

gnuplot>  !head -3 PDB.csv
PDB ID,R Observed,R All,R Work,R Free,Refinement Resolution
"100D","0.145","","0.145","","1.90"
"101D","0.163","","","0.252","2.25"

gnuplot>  set datafile separator ","
gnuplot>  set datafile nofpe_trap   # trap handling greatly slows large data sets
gnuplot>  stats 'PDB.csv' using "R Observed" prefix "Robs"

* FILE:
   Records:      63029
   Out of range: 0
   Invalid:      0
   Blank:        2
   Data Blocks:  2

* COLUMN:
   Mean:      0.1982
   Std Dev:   0.0334
   Sum:       12494.6900
   Sum Sq.:   2547.3068

   Minimum:   0.0450 [24518]
   Maximum:   0.9700 [45024]
   Quartile:  0.1770
   Median:    0.1970
   Quartile:  0.2180

gnuplot>  print Robs_mean
  0.198237160672072

gnuplot>  #calculate correlation of Robs with Resolution
gnuplot>  stats 'PDB.csv' using "R Observed":"Refinement Resolution"  nooutput
gnuplot>  print STATS_correlation
  0.595763711910418

I've attached graphical output of the same data following some sorting,
filtering, binning, etc., with output to a PDF file.

You can do all this in R also.   R has a larger collection of statistics
options, but is not as good at dealing with really large data sets.  IMHO
gnuplot has more flexible options for graphical output.
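For comparison, the mean and correlation from the gnuplot session above could be computed in plain Python along these lines. This is a sketch, not the method used in the post: the helper names `column_pairs` and `pearson` are made up, and the rows are invented values laid out like the PDB.csv header shown above, not real PDB statistics.

```python
import csv
import io
import math


def column_pairs(handle, x_name, y_name):
    """Yield (x, y) floats for rows where both named columns are non-blank."""
    for row in csv.DictReader(handle):
        x, y = row[x_name], row[y_name]
        if x and y:
            yield float(x), float(y)


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Invented rows for illustration; blank fields are skipped, as in the
# real file where some columns are empty.
text = (
    'PDB ID,R Observed,Refinement Resolution\n'
    '"AAAA","0.15","1.5"\n'
    '"BBBB","","2.0"\n'
    '"CCCC","0.20","2.0"\n'
    '"DDDD","0.25","2.5"\n'
)
pairs = list(column_pairs(io.StringIO(text), "R Observed", "Refinement Resolution"))
r_obs_mean = sum(x for x, _ in pairs) / len(pairs)
print(r_obs_mean, pearson(pairs))
```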


Otherwise I'd recommend perl, and dis-recommend python.


Why are you dis-ing python? Seems everybody loves it...

I'm sure you can google for many "reasons I hate Python" lists.

Mine would start
1) sensitive to white space == fail
2) dynamic typing makes it nearly impossible to verify program correctness,
and very hard to debug problems that arise from unexpected input or
a mismatch between caller and callee.
3) the language developers don't care about backward compatibility;
it seems version 2.n+1 always breaks code written for version 2.n,
and let's not even talk about version 3
4) slow unless you use it simply as a wrapper for C++,
in which case why not just use C++ or C to begin with?
5) not thread-safe

 you did ask...

Ethan




--
Prof. George M. Sheldrick FRS
Dept. Structural Chemistry,
University of Goettingen,
Tammannstr. 4,
D37077 Goettingen, Germany
Tel. +49-551-39-3021 or -3068
Fax. +49-551-39-22582


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread James Stroud
Python sorting 10,000 records of 10,000 floats each, then finding the max, 
min, and mean of the entire 100,000,000-element 32 bit float array (400 MB), 
on a 6 year old white imac:

 *11.6 seconds.*

This doesn't include the time to generate the 400 MB of random (normal) data.

Try it on your own computer. Here's the copy-paste from mine:

py> import timeit
py> timeit.timeit('big_data.sort(axis=0); big_data.mean(); big_data.max(); big_data.min()',
 'import numpy; big_data=numpy.random.normal(10, size=10**8).reshape((10**4, 10**4)); print("random data made, starting...")',
 number=1)
random data made, starting...
11.597978115081787
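For anyone without numpy at hand, the same measurement pattern (build the data outside the timed region, then time one pass of sort/mean/max/min) can be sketched with the standard library alone. This is a scaled-down illustration, not a reproduction of the benchmark above: the sizes and the helper names `make_data` and `crunch` are assumptions, and pure-Python lists will be far slower per element than numpy arrays.

```python
import random
import timeit

N_ROWS, N_COLS = 200, 200  # deliberately small; the post used 1e4 x 1e4


def make_data(seed=0):
    """Build a list of rows of normally distributed floats (mean 10, sigma 1)."""
    rng = random.Random(seed)
    return [[rng.gauss(10.0, 1.0) for _ in range(N_COLS)] for _ in range(N_ROWS)]


def crunch(data):
    """Sort each column, then return (mean, max, min) over all values."""
    columns = [sorted(col) for col in zip(*data)]
    flat = [v for col in columns for v in col]
    return sum(flat) / len(flat), max(flat), min(flat)


data = make_data()  # generated outside the timed region, as in the post
elapsed = timeit.timeit(lambda: crunch(data), number=1)
mean, largest, smallest = crunch(data)
print("elapsed seconds:", elapsed)
```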

James







Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Pavel Afonine
Hi,

Python, of course (if you know some basic math). Otherwise, Python and a
good math text book -:)

Pavel



Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread James Stroud

On Sep 12, 2012, at 1:00 PM, George Sheldrick wrote:

> It is the lack of compatibility between different versions mentioned by Ethan 
> that really put me off learning PYTHON.


Python is backwards compatible. I have reams of code I wrote in python 2.3 that 
still works in 2.7 without modification.

Also, python (aka python 2) and python 3000 (aka python 3) are considered two 
different languages. It's not reasonable to consider them one language and then 
complain that they are incompatible. Python 3 was created as a new language 
(and should be treated as such) precisely because it breaks compatibility with 
python 2. That was the intent of the language authors.

You blame the authors for recognizing limitations of a language and inventing a 
new one to overcome those limitations.

If the FORTRAN authors had done that about 30 years ago, we all might be 
programming in FORTRAN.

James



Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Edwin Pozharski

Ethan,

I think the majority of your complaints about python result from its very 
purpose - to be readable/portable for the sake of facilitating rapid 
implementation.  There are many other languages that provide tools to 
accomplish what Jacob wants to do (well, I would stay away from P''), 
but python definitely is a good option for casual calculations.


On 09/12/2012 02:28 PM, Ethan Merritt wrote:

I'm sure you can google for many "reasons I hate Python" lists.

Mine would start
1) sensitive to white space == fail
every language has a way to group lines of code.  Curly brackets are 
fine, but python is designed to force code readability, and preceding 
white space (btw, everywhere else it is ignored) does that.



2) dynamic typing makes it nearly impossible to verify program correctness,
and very hard to debug problems that arise from unexpected input or
a mismatch between caller and callee.


While indeed 1/3=0 (but so it will be in C), I think it's a bit of an 
overstatement that python code execution is "nearly impossible to verify".
Another goal of python is to accelerate implementation, and dynamic/duck 
typing supposedly helps that.  The argument is simply that dynamic typing 
favours strong testing, which should be a good thing.
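
A small sketch of what that can look like in practice. The function name and the numbers are invented for illustration. It shows the integer-division point (under Python 2 a plain 1/3 between integers gives 0, while Python 3 spells floor division as //) and a couple of checks standing in for a test suite that catches a caller/callee mismatch.

```python
def mean_intensity(values):
    """Average a sequence of numbers; float() avoids integer division on Python 2."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / float(len(values))


# Floor division is explicit with //; this holds on both Python 2 and Python 3.
assert 1 // 3 == 0

# A correct call gives the expected average.
assert mean_intensity([1, 2]) == 1.5

# A caller passing strings by mistake is caught at run time as a TypeError,
# which a test like this one surfaces before the code is used in earnest.
try:
    mean_intensity(["1", "2"])
except TypeError:
    caught = True
else:
    caught = False
print("type mismatch caught:", caught)
```

This does not verify correctness in the sense Ethan means, of course; it only illustrates the "test it instead" side of the argument.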



3) the language developers don't care about backward compatibility;
it seems version 2.n+1 always breaks code written for version 2.n,
and let's not even talk about version 3


I don't think that's entirely true either: why would they then backport 
certain features from v3?  The decision to not provide backward 
compatibility was well explained.  While the 2to3 converter may potentially 
fail on complex code, the very fact that it was implemented confirms 
that python developers do care about the issue to some extent. While I 
definitely agree that it is annoying when a module you rely on is 
deprecated, there is a strong argument that a clean break is sometimes 
better than continuous patching of code that has outlived its initial design.



4) slow unless you use it simply as a wrapper for C++,
in which case why not just use C++ or C to begin with?


Native python is not meant for number-crunching, but wrappers such as 
scipy allow one to combine python flexibility/readability with speed of 
compiled binaries.  One reason to use python over C/C++ is portability.



5) not thread-safe
I am definitely not an expert on this (or anything else), but afaiu this 
is not unique to python.


Cheers,

Ed.


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Nat Echols
On Wed, Sep 12, 2012 at 12:49 PM, James Stroud wrote:
> Also, python (aka python 2) and python 3000 (aka python 3) are considered
> two different languages. It's not reasonable to consider them one language
> and then complain that they are incompatible. Python 3 was created as a new
> language (and should be treated as such) precisely because it breaks
> compatibility with python 2. That was the intent of the language authors.

Actually, despite having endorsed Python, I have to agree with the
complaints about Python 3, for several reasons:

1) It doesn't actually introduce many fundamentally new features that
would have changed how we code for it.  (Like getting rid of "self" or
the Global Interpreter Lock, or writing the interpreter in C++ and
improving the API for writing extensions.)  The only really huge
change is Unicode support, which is probably good but doesn't really
make it a different programming language.
2) The changes that really break code compatibility - like getting rid
of the print statement - seem to have been done on a whim rather than
because of any pressing need.  Maybe this was done to try to force
everyone to migrate immediately (since module developers couldn't
easily maintain code that works with 2.x and 3.x), but it has had the
opposite effect.
3) Development on Python 2 is being shut down.

Despite all this, I would still choose Python over nearly anything
else for scripting (and most other purposes, but eventually C++ will
be necessary too).
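
As an illustration of the print issue in point 2, here is a minimal sketch of one way to write output code that runs unchanged on 2.x and 3.x by avoiding print altogether. The helper name and the R-merge value are made up. The other common route on 2.6 and later is to put "from __future__ import print_function" at the top of the module and use the function form everywhere.

```python
import sys


def emit(message, stream=sys.stdout):
    """Write one line of text; behaves the same on Python 2 and Python 3."""
    stream.write(message + "\n")


# Hypothetical value, formatted to three decimals.
emit("Rmerge: {0:.3f}".format(0.0712))
```

Neither approach helps with the other incompatibilities (Unicode handling in particular), so this covers only the most visible one.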

> You blame the authors for recognizing limitations of a language and
> inventing a new one to overcome those limitations.
> If the FORTRAN authors would have done that about 30 years ago, we all might
> be programming in FORTRAN.

I think this is what Fortran 90 was supposed to do (unsuccessfully, at
least in the world of crystallography) - but F77 code is still valid
F90 code, just like ANSI C is still valid C++.

-Nat


Re: [ccp4bb] Ligand geometry obs. vs. ideal

2012-09-12 Thread Edwin Pozharski
You can do unrestrained refinement in refmac, at your resolution it may 
be OK.   If you want to keep protein restrained, you can either use 
harmonic restraints or come up with a special cif-file for your ligand 
with large esd targets.  There is no direct way to tell refmac to 
exclude specific residue from restraints, at least to my knowledge.


Cheers,

Ed.

On 09/12/2012 02:44 PM, Yuri Pompeu wrote:

What is the best way to refine the ligand unrestrained and then generate 
measurements?


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread George Reeke
Colleagues:  Another country is heard from:
Since no one has mentioned MATLAB, let me mention it.
--Can easily do any math from 2+2 to matrix SVD etc.
--Statistics toolbox does most of what anyone would want.
--Lots of easy quick graphics that can be prettied up if needed.
--If you know FORTRAN, you already know most of the syntax.
--Reasonably easy to write quickies, can also run large
  calculations quite fast, can even run some stuff on GPUs now.
--Largely, not entirely, backwards compatible to earlier versions.
--Available for Linux, Mac, Windows.
--Great tech support and online help.
Disadvantages:
--It is not free.
--Not so great for text manipulation operations.
--Takes a while to learn, so not good for one quickie, but well
  worth the time to learn it in the long run.

George Reeke


[ccp4bb] Aimless and Pointless

2012-09-12 Thread Cosmo Z Buffalo
Hi all,

I am currently trying to perform a quickscale in iMosflm 7.0.9 after I 
integrate in an R 32 space group.  Unfortunately, Pointless and Aimless 
are both giving me a best solution space group of P 43 3 2.  After analyzing 
the statistics, this cannot be correct.  Other programs such as HKL2000 have 
confirmed this to be true.  So my question: is it possible to force Aimless and 
Pointless to generate statistics in a space group other than the one it 
predicts?  And if so, how would I do this?

-Cosmo


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread Ho Leung Ng
 I encourage trainees to learn a programming language that
will help their careers beyond their short time in my lab. Many or
most of them will not continue in structural biology or even science.
For the moment, I am pushing python even though I am minimally
literate in it myself. They should learn a "modern" programming
language that is widely used beyond my subdiscipline. Python will
probably help them get a job more than Fortran.


Ho


Re: [ccp4bb] Aimless and Pointless

2012-09-12 Thread Ed Pozharski

On 09/12/2012 06:41 PM, Cosmo Z Buffalo wrote:
is it possible to force Aimless and Pointless to generate statistics 
in a space group other than the one it predicts? 

yes, but it's pointless to force Pointless


And if so, how would I do this?


I assume you mean doing it from imosflm.  If so, go Settings->Processing 
options->Advanced Integration and select the "Use iMosflm symmetry in 
QuickScale" checkbox


--
Oh, suddenly throwing a giraffe into a volcano to make water is crazy?
Julian, King of Lemurs


Re: [ccp4bb] Aimless and Pointless

2012-09-12 Thread Ethan Merritt
On Wednesday, 12 September 2012, Cosmo Z Buffalo wrote:
> Hi all,
> 
> I am currently trying to perform a quickscale in iMosflm 7.0.9 after I 
> integrate in an R 32 space group.  Unfortunately, Pointless and Aimless 
> are both giving me a best solution space group of P 43 3 2.  After analyzing 
> the statistics, this cannot be correct.  Other programs such as HKL2000 have 
> confirmed this to be true.  So my question: is it possible to force Aimless 
> and Pointless to generate statistics in a space group other than the one it 
> predicts?  And if so, how would I do this?

Pointless is usually very good at detecting additional symmetry elements.

So I'm idly curious - what does Pointless report as the correlation
coefficient and R-merge for the extra symmetry elements in P 43 3 2?
Could you show us the whole symmetry table?

Have you gotten far enough to see if there are NCS copies whose
positioning mimics the cubic symmetry?

Ethan

> 
> -Cosmo
> 


Re: [ccp4bb] Off-topic: Best Scripting Language

2012-09-12 Thread William G. Scott
I'd just use a decent shell scripting language (like zsh) in conjunction with a 
unix tool like awk.  But the gnuplot option sounds ideal.

Bill


William G. Scott
Professor
Department of Chemistry and Biochemistry
and The Center for the Molecular Biology of RNA
228 Sinsheimer Laboratories
University of California at Santa Cruz
Santa Cruz, California 95064
USA


On Sep 12, 2012, at 7:32 AM, Jacob Keller wrote:

> Dear List,
> 
> since this probably comes up a lot in manipulation of pdb/reflection files 
> and so on, I was curious what people thought would be the best language for 
> the following: I have some huge (100s MB) tables of tab-delimited data on 
> which I would like to do some math (averaging, sigmas, simple arithmetic, 
> etc) as well as some sorting and rejecting. It can be done in Excel, but this 
> is exceedingly slow even in 64-bit, so I am looking to do it through some 
> scripting. Just as an example, a "sort" which takes >10 min in Excel takes 
> ~10 sec max with the unix command sort (seems crazy, no?). Any suggestions?
> 
> Thanks, and sorry for being off-topic,
> 
> Jacob
> 
> -- 
> ***
> Jacob Pearson Keller
> Northwestern University
> Medical Scientist Training Program
> email: j-kell...@northwestern.edu
> ***


Re: [ccp4bb] Ligand geometry obs. vs. ideal

2012-09-12 Thread Robert Nicholls
In case it helps… After you've done unrestrained refinement, you can use 
prosmart to generate external self-restraints to the current conformation 
(using the -self_restrain keyword). This is flexible - you can specify residue 
ranges, and it works for protein, ligand, DNA/RNA, waters, etc. These external 
restraints will attempt to maintain the original relative conformation 
throughout refinement. If you want any help doing this, feel free to email me 
off-board.

Cheers,
Rob


On 12 Sep 2012, at 21:11, Edwin Pozharski wrote:

> You can do unrestrained refinement in refmac, at your resolution it may be 
> OK.   If you want to keep protein restrained, you can either use harmonic 
> restraints or come up with a special cif-file for your ligand with large esd 
> targets.  There is no direct way to tell refmac to exclude specific residue 
> from restraints, at least to my knowledge.
> 
> Cheers,
> 
> Ed.
> 
> On 09/12/2012 02:44 PM, Yuri Pompeu wrote:
>> What is the best way to refine the ligand unrestrained and then generate 
>> measurements?