Re: [PyMOL] Shell utilities for structural bioinformatics

Tsjerk Wassenaar Fri, 12 Sep 2014 03:13:51 -0700

Hi James,

This is more text-file processing than it is bioinformatics. The trick is
to understand the problem, dissect it, and fit it to your toolbox on Linux.
That's actually much of bioinformatics :)


The first thing to understand is what data you have and what data you need
to have in the end. That will determine the tools and how to use them. To
extract the first part of a file, up to and including a tag (ENDMDL), you
could use sed:

sed '/^ENDMDL/q' models.pdb > firstmodel.pdb

While at it, you can also delete the lines you don't want:

sed -e '/^ROOT/d' -e '/^ENDROOT/d' -e '/^TORSDOF/d' -e '/^ENDMDL/q' \
    models.pdb > firstmodel.pdb

For bioinformatics, it really pays off to read up on sed and awk.
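
You also asked to turn ENDMDL into TER. A minimal sketch of one way to fold
that into the same sed call, assuming a small made-up input (the directory
sed_demo and the sample records are only for illustration, not your real
data):

```shell
# Sketch only: the directory, file names and sample records are hypothetical.
set -eu
mkdir -p sed_demo

# Tiny stand-in for a multi-model docking output.
cat > sed_demo/models.pdb <<'EOF'
MODEL 1
ROOT
ATOM      1  C   LIG     1       0.000   0.000   0.000
ENDROOT
TORSDOF 0
ENDMDL
MODEL 2
ROOT
ATOM      1  C   LIG     1       1.000   1.000   1.000
ENDROOT
TORSDOF 0
ENDMDL
EOF

# Delete the unwanted records; at the first ENDMDL, rewrite it to TER and quit.
sed -e '/^ROOT/d' -e '/^ENDROOT/d' -e '/^TORSDOF/d' \
    -e '/^ENDMDL/{' -e 's/^ENDMDL/TER/' -e 'q' -e '}' \
    sed_demo/models.pdb > sed_demo/firstmodel.pdb

cat sed_demo/firstmodel.pdb
```

With this input, firstmodel.pdb holds the MODEL 1 line, the ATOM record and
a closing TER. The q only fires on the ENDMDL line, so nothing after the
first model is read.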

As for the other question, yes, csplit can be used to extract one block or
several. The {*} repeats the pattern as often as possible, so all blocks
are written. {10} repeats the pattern ten more times and leaves the
remainder of the file in the last piece. Check the help to see how to use
csplit to extract a specific block. I just read up on it now to be able to
answer your question. I didn't know this about csplit when I woke up this
morning.
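
About the pipeline you tried: csplit writes its pieces to files and prints
only their byte counts to standard output, so the greps and the sed after
the pipe never see the models. A minimal sketch of one way around that is to
clean the stream first and split afterwards. It assumes GNU csplit (-z, -b
and {*} are GNU extensions), and the directory split_demo, the file names
and the sample records are made up. The awk line is one possible way to pull
out a single model without csplit; model number 2 is just an example.

```shell
# Sketch only: assumes GNU csplit; all names and records below are hypothetical.
set -eu
mkdir -p split_demo

# Tiny stand-in for a multi-model docking output.
cat > split_demo/my_docking.pdb <<'EOF'
MODEL 1
ROOT
ATOM      1  C   LIG     1       0.000   0.000   0.000
ENDROOT
TORSDOF 0
ENDMDL
MODEL 2
ROOT
ATOM      1  C   LIG     1       1.000   1.000   1.000
ENDROOT
TORSDOF 0
ENDMDL
EOF

# Clean the stream first, then let csplit read it from stdin ("-").
# -s silences the byte counts, -z drops the empty piece before the first MODEL.
sed -e '/^ROOT/d' -e '/^ENDROOT/d' -e '/^TORSDOF/d' -e 's/^ENDMDL/TER/' \
    split_demo/my_docking.pdb |
    csplit -s -z -f split_demo/model_ -b '%04d.pdb' - '/^MODEL/' '{*}'

# Extract one model only: start printing at "MODEL n", stop after its ENDMDL.
awk -v n=2 '$1 == "MODEL" && $2 == n {p = 1}
            p {print}
            p && $1 == "ENDMDL" {exit}' \
    split_demo/my_docking.pdb > split_demo/model2.pdb
```

Comparing fields in awk rather than matching the literal text "MODEL 1" keeps
the match independent of how many spaces separate the record name from the
model number.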

Cheers,

Tsjerk

On Fri, Sep 12, 2014 at 12:00 PM, James Starlight <[email protected]>
wrote:

> Hi Tsjerk,
>
> Thank you very much for your help.
>
> This is a small bioinformatics question, so it is probably best to ask an
> expert on the topic like you here :)
>
> In my case I need to process each split model further (e.g. delete some
> lines or make changes) by piping it through some commands.
>
> E.g. in my case each model after splitting consists of
>
> MODEL 1
> ROOT
> ATOMS
> ENDROOT
> TORSDOF 0
> ENDMDL
>
> I'd like to remove the ROOT, ENDROOT and TORSDOF 0 lines and change
> ENDMDL to TER.
>
> I've tried to do it with
>
> csplit -b "%04d.pdb" my_docking.pdb /^MODEL/ {*} | grep -v '^ENDROOT' |
> grep -v '^TORSDOF 0' |  sed -e 's/^ENDMDL/TER/g'
>
> but the resulting files still contain the unwanted lines.
>
> BTW, can csplit be used to extract only ONE (e.g. the first) model from
> the multi-pdb file?
>
> James
>
> 2014-09-12 11:39 GMT+02:00 Tsjerk Wassenaar <[email protected]>:
>
>> Hi James,
>>
>> These are the sort of questions that'll be answered elsewhere. Most
>> notably on stackoverflow:
>> http://stackoverflow.com/questions/18364411/using-regex-to-tell-csplit-where-to-split-the-file
>>
>> csplit -b "%04d.pdb" file.pdb /^MODEL/ {*}
>>
>> Cheers,
>>
>> Tsjerk
>>
>>
>> On Fri, Sep 12, 2014 at 11:25 AM, James Starlight <[email protected]
>> > wrote:
>>
>>> Hi,
>>>
>>> A new question.
>>>
>>> I need some combination of shell utilities to split multi_model.pdb
>>> into several pdbs, as well as a separate command to search
>>> multi_model.pdb for a single model and save only that model as a
>>> separate model1.pdb. I've tried to do it using grep:
>>> grep '^MODEL 1' my_docking.pdb > model1.pdb
>>>
>>> but the result was empty.
>>>
>>> James
>>>
>>> 2014-09-08 15:48 GMT+02:00 James Starlight <[email protected]>:
>>>
>>>> Thank you very much!
>>>>
>>>> James
>>>>
>>>> 2014-09-05 20:18 GMT+02:00 Folmer Fredslund <[email protected]>:
>>>>
>>>>> Hi
>>>>>
>>>>> Small correction to Gianluca's suggestion
>>>>>
>>>>> ">" will direct the output to a file, overwriting the contents
>>>>> ">>" will direct the output to a file, appending the contents
>>>>>
>>>>> Kind regards
>>>>> Folmer Fredslund
>>>>> On 05/09/2014 19.16, "Gianluca Santoni" <[email protected]
>>>>> > wrote:
>>>>>
>>>>>> Don't even need cat
>>>>>> simply do
>>>>>>
>>>>>> grep PPC ref.pdb > tar_i.pdb
>>>>>>
>>>>>> redirecting std out with > appends it directly to the file (after the
>>>>>> last line)
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> On 9/5/14 6:48 PM, James Starlight wrote:
>>>>>> > Dear Pymol users!
>>>>>> >
>>>>>> > I've decided to open a new topic focused on using common shell
>>>>>> > utilities like grep, awk and sed for structural bioinformatics tasks
>>>>>> > such as processing and editing large sets of pdbs.
>>>>>> >
>>>>>> > In my current task I need to copy all lipids from one pdb (call it
>>>>>> > ref) to another (call it tar_i.pdb); both files have the same 3D
>>>>>> > shape and have been superimposed beforehand. I guess the lipids
>>>>>> > could be recognized by their residue name in the pdb file (PPC),
>>>>>> > i.e. by the value in column #4 (which is what grep would do). So the
>>>>>> > algorithm might be: select from ref.pdb all lines where column #4 is
>>>>>> > PPC and merge them (by means of cat, I guess) with tar_i.pdb. Please
>>>>>> > show me an example of a one-line way to do this.
>>>>>> >
>>>>>> > Thanks,
>>>>>> >
>>>>>> > James
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > _______________________________________________
>>>>>> > PyMOL-users mailing list ([email protected])
>>>>>> > Info Page: https://lists.sourceforge.net/lists/listinfo/pymol-users
>>>>>> > Archives:
>>>>>> http://www.mail-archive.com/[email protected]
>>>>>> >
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Gianluca Santoni,
>>>>>> Dynamop Group
>>>>>> Institut de Biologie Structurale
>>>>>> 6 rue Jules Horowitz
>>>>>> 38027 Grenoble Cedex 1
>>>>>> France
>>>>>> _________________________________________________________
>>>>>> Please avoid sending me Word or PowerPoint attachments.
>>>>>> See http://www.gnu.org/philosophy/no-word-attachments.html
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Tsjerk A. Wassenaar, Ph.D.
>>
>>
>


-- 
Tsjerk A. Wassenaar, Ph.D.

_______________________________________________
PyMOL-users mailing list ([email protected])
Info Page: https://lists.sourceforge.net/lists/listinfo/pymol-users
Archives: http://www.mail-archive.com/[email protected]
