Re: [Tutor] Concatenating multiple lines into one

2012-02-12 Thread Spyros Charonis
Thanks for all the help. Peter's and Hugo's methods worked well for
concatenating multiple lines into a single data structure!

S

On Fri, Feb 10, 2012 at 5:30 PM, Mark Lawrence wrote:

> On 10/02/2012 17:08, Peter Otten wrote:
>
>> Spyros Charonis wrote:
>>
>>  Dear python community,
>>>
>>> I have a file where I store sequences that each have a header. The
>>> structure of the file is as such:
>>>
>>> >sp|(some code) =>1st header
>>> AGGCGG
>>> MNKPLOI
>>> .
>>> .
>>>
>>> >sp|(some code) => 2nd header
>>> AA
>>>  ...
>>> .
>>>
>>> ..
>>>
>>> I am looking to implement a logical structure that would allow me to group
>>> each of the sequences (spread on multiple lines) into a single string. So
>>> instead of having the letters spread on multiple lines I would be able to
>>> have 'AGGCGGMNKP' as a single string that could be indexed.
>>>
>>> This snippet is good for isolating the sequences (=stripping headers and
>>> skipping blank lines) but how could I concatenate each sequence in order
>>> to get one string per sequence?
>>>
>>> >>> for line in align_file:
>>> ...     if line.startswith('>sp'):
>>> ...         continue
>>> ...     elif not line.strip():
>>> ...         continue
>>> ...     else:
>>> ...         print line
>>>
>>> (... is just OS X terminal notation, nothing programmatic)
>>>
>>> Many thanks in advance.
>>>
>>
>> Instead of printing the line directly collect it in a list (without
>> trailing "\n"). When you encounter a line starting with ">sp" check if
>> that list is non-empty, and if so print "".join(parts), assuming the list
>> is called parts, and start with a fresh list. Don't forget to print any
>> leftover data in the list once the for loop has terminated.
>>
>>
> The advice from Peter is sound if the strings could grow very large but
> you can simply concatenate the parts if they are not.  For the indexing
> simply store your data in a dict.
>
> --
> Cheers.
>
> Mark Lawrence.
>
>
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Concatenating multiple lines into one

2012-02-10 Thread Mark Lawrence

On 10/02/2012 17:08, Peter Otten wrote:

> Spyros Charonis wrote:
>
>> Dear python community,
>>
>> I have a file where I store sequences that each have a header. The
>> structure of the file is as such:
>>
>> >sp|(some code) =>1st header
>> AGGCGG
>> MNKPLOI
>> .
>> .
>>
>> >sp|(some code) => 2nd header
>> AA
>>  ...
>> .
>>
>> ..
>>
>> I am looking to implement a logical structure that would allow me to group
>> each of the sequences (spread on multiple lines) into a single string. So
>> instead of having the letters spread on multiple lines I would be able to
>> have 'AGGCGGMNKP' as a single string that could be indexed.
>>
>> This snippet is good for isolating the sequences (=stripping headers and
>> skipping blank lines) but how could I concatenate each sequence in order
>> to get one string per sequence?
>>
>> >>> for line in align_file:
>> ...     if line.startswith('>sp'):
>> ...         continue
>> ...     elif not line.strip():
>> ...         continue
>> ...     else:
>> ...         print line
>>
>> (... is just OS X terminal notation, nothing programmatic)
>>
>> Many thanks in advance.
>
> Instead of printing the line directly collect it in a list (without trailing
> "\n"). When you encounter a line starting with ">sp" check if that list is
> non-empty, and if so print "".join(parts), assuming the list is called
> parts, and start with a fresh list. Don't forget to print any leftover data
> in the list once the for loop has terminated.


The advice from Peter is sound if the strings could grow very large but 
you can simply concatenate the parts if they are not.  For the indexing 
simply store your data in a dict.
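
A minimal sketch of that suggestion, assuming the file layout shown in the
original question. The function name read_sequences, the sample headers and
the choice of the header text as dict key are made up for illustration; they
do not come from the thread.

```python
def read_sequences(lines):
    """Return {header: sequence}, built by plain string concatenation."""
    sequences = {}
    header = None
    for line in lines:
        line = line.strip()
        if line.startswith(">sp"):
            header = line[1:]            # drop the leading ">"
            sequences[header] = ""
        elif line and header is not None:
            sequences[header] += line    # fine while sequences stay short
    return sequences


# Hypothetical sample data in the same shape as the file in the question.
sample = [">sp|P1 first header", "AGGCGG", "MNKPLOI", "",
          ">sp|P2 second header", "AA"]
seqs = read_sequences(sample)
print(seqs["sp|P1 first header"])        # AGGCGGMNKPLOI
print(seqs["sp|P1 first header"][0:6])   # AGGCGG
```

Each value is an ordinary string, so it can be indexed and sliced directly.
For very long sequences the list-and-join approach is the safer choice.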


--
Cheers.

Mark Lawrence.



Re: [Tutor] Concatenating multiple lines into one

2012-02-10 Thread Peter Otten
Spyros Charonis wrote:

> Dear python community,
> 
> I have a file where I store sequences that each have a header. The
> structure of the file is as such:
> 
>>sp|(some code) =>1st header
> AGGCGG
> MNKPLOI
> .
> .
> 
>>sp|(some code) => 2nd header
> AA
>  ...
> .
> 
> ..
> 
> I am looking to implement a logical structure that would allow me to group
> each of the sequences (spread on multiple lines) into a single string. So
> instead of having the letters spread on multiple lines I would be able to
> have 'AGGCGGMNKP' as a single string that could be indexed.
> 
> This snippet is good for isolating the sequences (=stripping headers and
> skipping blank lines) but how could I concatenate each sequence in order
> to get one string per sequence?
> 
> >>> for line in align_file:
> ...     if line.startswith('>sp'):
> ...         continue
> ...     elif not line.strip():
> ...         continue
> ...     else:
> ...         print line
> 
> (... is just OS X terminal notation, nothing programmatic)
> 
> Many thanks in advance.

Instead of printing the line directly collect it in a list (without trailing 
"\n"). When you encounter a line starting with ">sp" check if that list is 
non-empty, and if so print "".join(parts), assuming the list is called 
parts, and start with a fresh list. Don't forget to print any leftover data 
in the list once the for loop has terminated.
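
The steps above could be sketched roughly as follows. This is only one
possible reading of the advice: the generator name join_sequences and the
sample headers are invented for illustration, and print() is written so that
it runs on Python 3 as well as Python 2.

```python
def join_sequences(lines):
    """Yield one concatenated string per '>sp' record."""
    parts = []
    for line in lines:
        line = line.strip()          # drop the trailing "\n"
        if line.startswith(">sp"):
            if parts:                # a previous record is complete
                yield "".join(parts)
            parts = []               # start with a fresh list
        elif line:                   # skip blank lines
            parts.append(line)
    if parts:                        # leftover data after the loop ends
        yield "".join(parts)


# Hypothetical sample data in the same shape as the file in the question.
sample = """>sp|P1 first header
AGGCGG
MNKPLOI

>sp|P2 second header
AA
CC
""".splitlines()

print(list(join_sequences(sample)))  # ['AGGCGGMNKPLOI', 'AACC']
```

An open file object can be passed in place of sample, since iterating over a
file also yields one line at a time.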



Re: [Tutor] Concatenating multiple lines into one

2012-02-10 Thread Hugo Arts
On Fri, Feb 10, 2012 at 5:38 PM, Spyros Charonis wrote:
> Dear python community,
>
> I have a file where I store sequences that each have a header. The structure
> of the file is as such:
>
>>sp|(some code) =>1st header
> AGGCGG
> MNKPLOI
> .
> .
>
>>sp|(some code) => 2nd header
> AA
>  ...
> .
>
> ..
>
> I am looking to implement a logical structure that would allow me to group
> each of the sequences (spread on multiple lines) into a single string. So
> instead of having the letters spread on multiple lines I would be able to
> have 'AGGCGGMNKP' as a single string that could be indexed.
>
> This snippet is good for isolating the sequences (=stripping headers and
> skipping blank lines) but how could I concatenate each sequence in order to
> get one string per sequence?
>
> >>> for line in align_file:
> ...     if line.startswith('>sp'):
> ...             continue
> ...     elif not line.strip():
> ...             continue
> ...     else:
> ...             print line
>
> (... is just OS X terminal notation, nothing programmatic)
>
> Many thanks in advance.
>
> S.
>

Python has a simple method to do that, str.join. Let me demonstrate it:

>>> a = ['a', 'b', 'c', 'd', 'e']
>>> ''.join(a)
'abcde'
>>> ' '.join(a) # with a space
'a b c d e'
>>> ' hello '.join(a) # go crazy if you want
'a hello b hello c hello d hello e'

So, it takes a list (or any other iterable of strings) as an argument and
joins the elements together into one string. The string that you call join
on is used as a separator between the elements. Pretty simple.
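
Applied to the sequence lines from the question, that might look like the
sketch below. The variable names and sample values are assumptions made for
illustration only.

```python
# Two lines as they would come out of the file, newline included.
lines = ["AGGCGG\n", "MNKPLOI\n"]

# A generator expression works as well as a list.
sequence = "".join(line.strip() for line in lines)
print(sequence)          # AGGCGGMNKPLOI
print(sequence[6:10])    # MNKP

# join only accepts strings, so other items must be converted first;
# "".join([1, 2, 3]) would raise TypeError.
numbers = [1, 2, 3]
print("-".join(str(n) for n in numbers))   # 1-2-3
```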

HTH,
Hugo


[Tutor] Concatenating multiple lines into one

2012-02-10 Thread Spyros Charonis
Dear python community,

I have a file where I store sequences that each have a header. The
structure of the file is as such:

>sp|(some code) =>1st header
AGGCGG
MNKPLOI
.
.

>sp|(some code) => 2nd header
AA
 ...
.

..

I am looking to implement a logical structure that would allow me to group
each of the sequences (spread on multiple lines) into a single string. So
instead of having the letters spread on multiple lines I would be able to
have 'AGGCGGMNKP' as a single string that could be indexed.

This snippet is good for isolating the sequences (=stripping headers and
skipping blank lines) but how could I concatenate each sequence in order to
get one string per sequence?

>>> for line in align_file:
...     if line.startswith('>sp'):
...         continue
...     elif not line.strip():
...         continue
...     else:
...         print line

(... is just OS X terminal notation, nothing programmatic)

Many thanks in advance.

S.