[Tutor] Understanding a linear runtime implementation of anagram detection

2015-12-10 Thread Spyros Charonis
Dear All,

I am learning about analysis of algorithms (Python 2.7.6). I am reading a
book (Problem solving with Algorithms and Data Structures) where Python is
the language used for implementations. The author introduces algorithm
analysis in a clear and understandable way, and uses an anagram detection
program as a template to compare different runtime implementations
(quadratic, log linear, linear). In the linear, and most efficient
implementation, the code is as follows (comments added by me):

def anagram_test2(s1, s2):
    """ Checks if two strings are anagrams of each other
    Runs with O(n) linear complexity """
    if (not s1) or (not s2):
        raise TypeError("Invalid input: input must be string")
    # Initialize two lists of counters
    c1 = [0] * 26
    c2 = [0] * 26
    # Iterate over each string
    # When a char is encountered, increment
    # the counter at its corresponding position
    for i in range(len(s1)):
        pos = ord(s1[i]) - ord("a")
        c1[pos] += 1
    for i in range(len(s2)):
        pos = ord(s2[i]) - ord("a")
        c2[pos] += 1

    j = 0
    hit = True
    while j < 26 and hit:
        if c1[j] == c2[j]:
            j += 1
        else:
            hit = False
    return hit


My questions are:

1)
Is it computationally more, less, or equally efficient to use an explicit
while loop than to just do "return c1 == c2" (replacing the final code block
following the two for loops)? I realize that this single line of code
performs an implicit for loop over each index to test for equality. My
guess is that because in other languages you may not be able to do this
simple test, the author wanted to present an example that could be adapted
for other languages, unless the explicit while loop is less expensive
computationally.
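A rough way to see the trade-off in question 1: both versions compare at most 26 counters, so the comparison is a constant amount of work next to the two O(n) counting loops, and neither choice changes the overall O(n). The sketch below (the helper names `letter_counts`, `equal_by_loop` and `equal_builtin` are made up for illustration, and lowercase ASCII input is assumed) checks that the two comparisons agree. In CPython, `c1 == c2` on two lists checks the lengths first and then compares items pairwise in C, stopping at the first unequal pair, so it is typically at least as fast as the explicit Python-level loop; `timeit` is the tool to confirm that on a particular machine.

```python
def letter_counts(s):
    """Return 26 counters for the letters 'a'..'z' in s.

    Hypothetical helper; assumes s contains only lowercase ASCII letters.
    """
    counts = [0] * 26
    for ch in s:
        if not "a" <= ch <= "z":
            raise ValueError("expected lowercase a-z, got %r" % ch)
        counts[ord(ch) - ord("a")] += 1
    return counts


def equal_by_loop(c1, c2):
    """Explicit early-exit comparison, in the style of anagram_test2."""
    j = 0
    while j < 26:
        if c1[j] != c2[j]:
            return False
        j += 1
    return True


def equal_builtin(c1, c2):
    """Built-in list equality: item by item, stops at the first mismatch."""
    return c1 == c2


for a, b in [("heart", "earth"), ("python", "typhon"), ("apple", "paled")]:
    c1, c2 = letter_counts(a), letter_counts(b)
    print("%s/%s: loop=%s builtin=%s"
          % (a, b, equal_by_loop(c1, c2), equal_builtin(c1, c2)))
```

The explicit loop is still a reasonable teaching choice, since it translates directly to languages without built-in sequence equality.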

2)
How could I go about adapting this algorithm for multiple strings (say I
had 20 strings and wanted to check if they are anagrams of one another)?

def are_anagrams(*args):
    """ Accepts a tuple of strings and checks if
    they are anagrams of each other """

    # Check that none of the strings are empty
    for i in args:
        if not i:
            raise TypeError("Invalid input")

    # Initialize a list of counters for each string
    c = ( [] for i in range(len(args) ) ???
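For question 2, one way to extend the counting idea to any number of strings, offered as a sketch rather than the book's method: build the 26-counter list for the first string once, then require every other string's counter list to equal it. Having identical letter counts is an equivalence relation, so comparing each string against the first one is enough, and the total work stays linear in the combined length of all strings. The names `letter_counts` and `all_anagrams` are hypothetical, and lowercase ASCII input is assumed.

```python
def letter_counts(s):
    """26 counters for 'a'..'z'; assumes lowercase ASCII letters only."""
    counts = [0] * 26
    for ch in s:
        if not "a" <= ch <= "z":
            raise ValueError("expected lowercase a-z, got %r" % ch)
        counts[ord(ch) - ord("a")] += 1
    return counts


def all_anagrams(*strings):
    """Return True if every string is an anagram of every other one."""
    if not strings:
        raise ValueError("need at least one string")
    reference = letter_counts(strings[0])
    # all() stops at the first string whose counts differ
    return all(letter_counts(s) == reference for s in strings[1:])


print(all_anagrams("listen", "silent", "enlist", "tinsel"))  # True
print(all_anagrams("listen", "silent", "listens"))           # False
```

With 20 strings this is a single call, e.g. `all_anagrams(*list_of_strings)`; there is no need to keep a separate counter list for every string at once.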

Many thanks in advance!
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] bubble sort function

2014-11-16 Thread Spyros Charonis
Many thanks for the link as well as for the pseudocode & code. I see what I
did wrong now. Here's the final version that works:


def bubbleSort_ascending(unsorted):
    """ Sorts a list of numbers in ascending order """
    n = len(unsorted)
    count = swaps = 0
    swapped = True
    ## Prompt user to choose if they want to see each sorting step
    option = raw_input("Show sorting steps? (Y/N):\n")
    while swapped:
        count += 1
        swapped = False
        ## Use a tuple assignment in order to swap the value of two variables
        for i in range(1, n):
            if unsorted[i-1] > unsorted[i]:
                unsorted[i-1], unsorted[i] = unsorted[i], unsorted[i-1]
                swaps += 1
                swapped = True
        ## Catch user input and either show or hide sorting steps accordingly
        if option in ("Y", "y"):
            print "\nIteration %d, %d swaps; list: %r\n" % (count, swaps, unsorted)
        elif option in ("N", "n"):
            pass
        else:
            print "\nYour input was invalid, type either Y/y or N/n"
    return unsorted

On Sun, Nov 16, 2014 at 4:50 AM, Steven D'Aprano <st...@pearwood.info>
wrote:

 On Sat, Nov 15, 2014 at 04:46:26PM +, Spyros Charonis wrote:
  Dear group,
 
 
  I'm having a bit of trouble with understanding why my bubble sort
  implementation doesn't work. I've got the following function to perform a
  bubble sort operation on a list of numbers:

 It doesn't work because it is completely wrong. Sorry to be harsh, but
 sometimes it is easier to throw broken code away and start again than it
 is to try to diagnose the problems with it.

 Let's start with the unoptimized version of bubblesort given by
 Wikipedia:

 https://en.wikipedia.org/wiki/Bubble_sort#Implementation

 procedure bubbleSort( A : list of sortable items )
     n = length(A)
     repeat
         swapped = false
         for i = 1 to n-1 inclusive do
             /* if this pair is out of order */
             if A[i-1] > A[i] then
                 /* swap them and remember something changed */
                 swap( A[i-1], A[i] )
                 swapped = true
             end if
         end for
     until not swapped
 end procedure


 Let's translate that to Python:

 def bubbleSort(alist):
     n = len(alist)
     swapped = True
     while swapped:
         swapped = False
         for i in range(1, n):
             # if this pair is out of order
             if alist[i-1] > alist[i]:
                 # swap them and remember something changed
                 alist[i-1], alist[i] = alist[i], alist[i-1]
                 swapped = True


 Let's add something to print the partially sorted list each time we go
 through the loop:


 def bubbleSort(alist):
     print("Unsorted: %r" % alist)
     n = len(alist)
     swapped = True
     count = swaps = 0
     while swapped:
         count += 1
         swapped = False
         for i in range(1, n):
             # if this pair is out of order
             if alist[i-1] > alist[i]:
                 # swap them and remember something changed
                 swaps += 1
                 alist[i-1], alist[i] = alist[i], alist[i-1]
                 swapped = True
         print("Iteration %d, %d swaps; list: %r" % (count, swaps, alist))



 And now let's try it:

 py> mylist = [2, 4, 6, 8, 1, 3, 5, 7, 9, 0]
 py> bubbleSort(mylist)
 Unsorted: [2, 4, 6, 8, 1, 3, 5, 7, 9, 0]
 Iteration 1, 5 swaps; list: [2, 4, 6, 1, 3, 5, 7, 8, 0, 9]
 Iteration 2, 9 swaps; list: [2, 4, 1, 3, 5, 6, 7, 0, 8, 9]
 Iteration 3, 12 swaps; list: [2, 1, 3, 4, 5, 6, 0, 7, 8, 9]
 Iteration 4, 14 swaps; list: [1, 2, 3, 4, 5, 0, 6, 7, 8, 9]
 Iteration 5, 15 swaps; list: [1, 2, 3, 4, 0, 5, 6, 7, 8, 9]
 Iteration 6, 16 swaps; list: [1, 2, 3, 0, 4, 5, 6, 7, 8, 9]
 Iteration 7, 17 swaps; list: [1, 2, 0, 3, 4, 5, 6, 7, 8, 9]
 Iteration 8, 18 swaps; list: [1, 0, 2, 3, 4, 5, 6, 7, 8, 9]
 Iteration 9, 19 swaps; list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
 Iteration 10, 19 swaps; list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]



 Now you can inspect the working code and compare it to the non-working
 code below and see what is different:


  def bubble_sort_ascending(unsorted):
      """ Sorts a list of numbers into ascending order """
      iterations = 0
      size = len(unsorted) - int(1)
      for i in range(0, size):
          unsorted[i] = float(unsorted[i])
          while unsorted[i] > unsorted[i+1]:
              # Use a tuple assignment in order to swap the value of two variables
              unsorted[i], unsorted[i+1] = unsorted[i+1], unsorted[i]
              iterations += 1
              sorted_vec = unsorted[:] # copy unsorted which is now sorted
              print "\nIterations completed: %s\n" %(iterations)
      return sorted_vec



 --
 Steven


[Tutor] bubble sort function

2014-11-15 Thread Spyros Charonis
Dear group,


I'm having a bit of trouble with understanding why my bubble sort
implementation doesn't work. I've got the following function to perform a
bubble sort operation on a list of numbers:


def bubble_sort_ascending(unsorted):
    """ Sorts a list of numbers into ascending order """
    iterations = 0
    size = len(unsorted) - int(1)
    for i in range(0, size):
        unsorted[i] = float(unsorted[i])
        while unsorted[i] > unsorted[i+1]:
            # Use a tuple assignment in order to swap the value of two variables
            unsorted[i], unsorted[i+1] = unsorted[i+1], unsorted[i]
            iterations += 1
            sorted_vec = unsorted[:] # copy unsorted which is now sorted
            print "\nIterations completed: %s\n" %(iterations)
    return sorted_vec


Example: mylist = [4, 1, 7, 19, 13, 22, 17, 14, 23, 21]


When I call it as bubble_sort_ascending(mylist), it returns the list
only partially sorted, with 5 iterations reported, i.e.


[1, 4.0, 7.0, 13, 19.0, 17, 14, 22.0, 21, 23.0]


and I have to call it again for the sorting operation to complete. Is
there something I am missing in my code? Why does it not sort the entire
list at once and just count all completed iterations?


Any help appreciated.


Many thanks,

Spyros


Re: [Tutor] bubble sort function

2014-11-15 Thread Spyros Charonis
Thank you Alan,

When I initiated the loop with the condition:

for i in range(len(unsorted)):


Python raised an IndexError saying I had gone out of bounds. Hence the
change to:

for i in range(0, size):


Yes, actually the loop only consists of:


while unsorted[i] > unsorted[i+1]:
    # Use a tuple assignment in order to swap the value of two variables
    unsorted[i], unsorted[i+1] = unsorted[i+1], unsorted[i]
    iterations += 1


Sorry about that. The *iterations* update and sorted_vec assignment are
outside of the loop body.


This is indeed just a learning exercise, I am aware that lists have sort()
and reverse() methods. I'm in the process of learning a bit about data
structures & algorithms using Python as my implementation language.




On Sat, Nov 15, 2014 at 7:02 PM, Alan Gauld <alan.ga...@btinternet.com>
wrote:

 On 15/11/14 16:46, Spyros Charonis wrote:

  def bubble_sort_ascending(unsorted):
      iterations = 0
      size = len(unsorted) - int(1)


 Don't convert 1 to an int - it already is.

  for i in range(0, size):


 This will result in 'i' going from zero to len()-2.
 Is that what you want?

   unsorted[i] = float(unsorted[i])


 Comparing ints to floats or even comparing two floats
 is notoriously error prone due to the imprecision of
 floating point representation. You probably don't want
 to do the conversion.

 And if you must do it, why do you only do it once,
 outside the while loop?

   while unsorted[i] > unsorted[i+1]:
       unsorted[i], unsorted[i+1] = unsorted[i+1], unsorted[i]
       iterations += 1


 I assume you intended to end the loop body here?
 But the following lines are indented so are included
 in the loop.

 Also because you never change 'i' the loop can only
 ever run once. So really you could use an if
 statement instead of the while loop?

 Finally, iterations is really counting swaps. Is that what you want it to
 count or is it actually loop iterations? If so which? The for loop or the
 while loop or the sum of both?

  sorted_vec = unsorted[:]
  print "\nIterations completed: %s\n" %(iterations)
  return sorted_vec


 Since you never alter sorted_vec there is no point in creating it.
 Just return unsorted - which is now sorted...


  and I have to call it again for the sorting operation to complete.
 Is there something I am missing in my code? Why does it not sort the
 entire list at once and just count all completed iterations?


 There are several things missing or broken, the few I've pointed
 out above will help but the algorithm seems suspect to me. You need
 to revisit the core algorithm I suspect.

 BTW I assume this is just a learning exercise since the default
 sorting algorithm will virtually always be better than bubble
 sort for any real work!

 --
 Alan G
 Author of the Learn to Program web site
 http://www.alan-g.me.uk/
 http://www.flickr.com/photos/alangauldphotos





[Tutor] Arbitrary-argument set function

2013-10-01 Thread Spyros Charonis
Dear Pythoners,


I am trying to extract from a set of about 20 sequences the characters
which are unique to each sequence. For simplicity, imagine I have only 3
sequences (words in this example) such as:


s1='spam'; s2='scam'; s3='slam'


I would like the character that is unique to each sequence, i.e. I need my
function to return the list ['p', 'c', 'l']. The function I am using is
as follows:


def uniq(*args):
    """ FIND UNIQUE ELEMENTS OF AN ARBITRARY NUMBER OF SEQUENCES """
    unique = []
    for i in args[0]:
        if i not in args[1:]:
            unique.append(i)
    return unique


and is returning the list [ 's', 'p', 'a', 'm' ]. Any help much appreciated,
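One likely reason for that output: `args[1:]` is a tuple of whole strings, so `i not in args[1:]` asks whether a single character equals one of those strings, which is almost always true. A sketch of one alternative using sets: count how many sequences contain each character, then keep, for each sequence, the characters that appear in exactly one sequence. It returns a list per sequence, because a sequence can have zero or several unique characters; the function name `unique_per_sequence` is made up for illustration.

```python
from collections import Counter


def unique_per_sequence(*seqs):
    """For each sequence, return the sorted characters found in no other sequence."""
    # Number of sequences that contain each character (each sequence counted once)
    seen_in = Counter()
    for seq in seqs:
        seen_in.update(set(seq))
    return [sorted(ch for ch in set(seq) if seen_in[ch] == 1) for seq in seqs]


print(unique_per_sequence("spam", "scam", "slam"))  # [['p'], ['c'], ['l']]
```

If every sequence is known to have exactly one unique character, `[u[0] for u in result]` flattens the result to `['p', 'c', 'l']`. The same call works unchanged for 20 sequences.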


Best,

Spyros


[Tutor] Text Processing Query

2013-03-14 Thread Spyros Charonis
Hello Pythoners,

I am trying to extract certain fields from a file whose text looks
like this:

COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;
COMPND   3 CHAIN: A, B;
COMPND  10 MOL_ID: 2;
COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;
COMPND  12 CHAIN: D, F;
COMPND  13 ENGINEERED: YES;
COMPND  14 MOL_ID: 3;
COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;
COMPND  16 CHAIN: E, G;

I would like the chain IDs, but only those following the text heading
ANTIBODY FAB FRAGMENT, i.e. I need to create a list with D,F,E,G which
excludes A,B, which have a non-antibody text heading. I am using the
following code:

with open(filename) as file:
    scanfile=file.readlines()
    for line in scanfile:
        if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: continue
        elif line[0:6]=='COMPND' and 'CHAIN' in line:
            print line

But this yields:

COMPND   3 CHAIN: A, B;
COMPND  12 CHAIN: D, F;
COMPND  16 CHAIN: E, G;

I would like to ignore the first line since A,B correspond to non-antibody
text headings, and instead want to extract only D,F & E,G whose text
headings are specified as antibody fragments.
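Because the MOLECULE heading and the CHAIN record sit on different lines, the program has to remember the most recent heading while it scans. The sketch below does that with a boolean flag and also splits each CHAIN record into individual IDs, so it returns the list directly instead of printing lines. It assumes each `CHAIN:` record fits on one line, as in the sample above; `fab_chain_ids` and its `keyword` parameter are illustrative names, not part of any library.

```python
def fab_chain_ids(lines, keyword="FAB FRAGMENT"):
    """Return chain IDs from COMPND CHAIN records whose most recent
    MOLECULE record mentions `keyword`."""
    chain_ids = []
    wanted = False  # True while the latest MOLECULE heading matched keyword
    for line in lines:
        if not line.startswith("COMPND"):
            continue
        if "MOLECULE:" in line:
            wanted = keyword in line
        elif "CHAIN:" in line and wanted:
            # "COMPND  12 CHAIN: D, F;" -> "D, F" -> ["D", "F"]
            ids = line.split("CHAIN:", 1)[1].strip().rstrip(";")
            chain_ids.extend(c.strip() for c in ids.split(","))
    return chain_ids


sample = [
    "COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;",
    "COMPND   3 CHAIN: A, B;",
    "COMPND  10 MOL_ID: 2;",
    "COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;",
    "COMPND  12 CHAIN: D, F;",
    "COMPND  13 ENGINEERED: YES;",
    "COMPND  14 MOL_ID: 3;",
    "COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;",
    "COMPND  16 CHAIN: E, G;",
]
print(fab_chain_ids(sample))  # ['D', 'F', 'E', 'G']
```

An open file can be passed straight in (`with open(filename) as f: ids = fab_chain_ids(f)`), since iterating over a file yields its lines and the trailing newline is removed by `strip()`.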

Many thanks,
Spyros


Re: [Tutor] Text Processing Query

2013-03-14 Thread Spyros Charonis
Yes, the elif line needs to have **flag_FAB == 1** as its condition instead
of **flag_FAB = 1**. So:


flag_FAB = 0
for line in scanfile:
    if line[0:6]=='COMPND' and 'FAB' in line: flag_FAB = 1
    elif line[0:6]=='COMPND' and 'CHAIN' in line and flag_FAB == 1:
        print line
        flag_FAB = 0


On Thu, Mar 14, 2013 at 4:33 PM, Mark Lawrence <breamore...@yahoo.co.uk> wrote:

 On 14/03/2013 11:28, taserian wrote:

 Top posting fixed


 On Thu, Mar 14, 2013 at 6:56 AM, Spyros Charonis <s.charo...@gmail.com> wrote:

 Hello Pythoners,

 I am trying to extract certain fields from a file that whose text
 looks like this:

 COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;
 COMPND   3 CHAIN: A, B;
 COMPND  10 MOL_ID: 2;
 COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;
 COMPND  12 CHAIN: D, F;
 COMPND  13 ENGINEERED: YES;
 COMPND  14 MOL_ID: 3;
 COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;
 COMPND  16 CHAIN: E, G;

 I would like the chain IDs, but only those following the text
 heading ANTIBODY FAB FRAGMENT, i.e. I need to create a list with
 D,F,E,G  which excludes A,B which have a non-antibody text heading.
 I am using the following syntax:

 with open(filename) as file:
     scanfile=file.readlines()
     for line in scanfile:
         if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: continue
         elif line[0:6]=='COMPND' and 'CHAIN' in line:
             print line


 But this yields:

 COMPND   3 CHAIN: A, B;
 COMPND  12 CHAIN: D, F;
 COMPND  16 CHAIN: E, G;

 I would like to ignore the first line since A,B correspond to
 non-antibody text headings, and instead want to extract only D,F &
 E,G whose text headings are specified as antibody fragments.

 Many thanks,
 Spyros

 Since the identifier and the item that you want to keep are on different
 lines, you'll need to set a flag.

 with open(filename) as file:
     scanfile=file.readlines()
     flag = 0
     for line in scanfile:
         if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: flag = 1
         elif line[0:6]=='COMPND' and 'CHAIN' in line and flag = 1:
             print line
             flag = 0


 Notice that the flag is set to 1 only on FAB FRAGMENT, and it's reset
 to 0 after the next CHAIN line that follows the FAB FRAGMENT line.


 AR



 Notice that this code won't run due to a syntax error.

 --
 Cheers.

 Mark Lawrence





Re: [Tutor] pickle.dump yielding awkward output

2013-02-04 Thread Spyros Charonis
Thank you Alan, Steven,

I don't care about the characters from the pickle operation per se, I just
want the list to be stored in its native format.

What I am trying to do is basically the Unix shell equivalent of:
Unix command > newfile.txt

I am trying to store the list that I get from my code in a separate file,
in human-readable format.
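Since the goal is a human-readable copy of the list, much like redirecting a command's output with `>`, writing the lines in text mode is enough and pickle is not needed. A minimal sketch, assuming each list item is one line of text that may already end in a line break (like the `\r\n` visible in the pickle dump); `write_lines` and `read_lines` are hypothetical names.

```python
def write_lines(lines, path):
    """Write one list item per line to a plain-text file."""
    with open(path, "w") as out:
        for line in lines:
            # Drop any existing line ending so every item gets exactly one "\n"
            out.write(line.rstrip("\r\n") + "\n")


def read_lines(path):
    """Read the file back into a list of strings without line endings."""
    with open(path) as src:
        return [line.rstrip("\n") for line in src]
```

Opening the file with `"w"` (text) rather than `"wb"` means the result opens cleanly in any text editor, and `read_lines` rebuilds the list when the program needs it again.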


On Mon, Feb 4, 2013 at 1:03 AM, Alan Gauld <alan.ga...@btinternet.com> wrote:

 On 03/02/13 19:26, Spyros Charonis wrote:

 I am experiencing a strange result with the pickle module when using it
 to write certain results to a separate file.


 The only strange results using pickle would be if the unpickle failed to
 bring back that which was pickled.
 Pickle is a storage format not a display format.


  In short, I have a program that reads a file, finds lines which satisfy
 some criteria, and extracts those lines, storing them in a list.


 Extracting them with pickle I hope? That's the only thing that should be
 used to unpickle a pickled file.


  The list of extracted lines looks like this:

 ATOM  1  N   GLN A   1  29.872  13.384  54.754  1.00 60.40
  N

 The output stored from the call to the pickle.dump method, however,
 looks like this:

 (lp0
 S'ATOM  1  N   GLN A   1  29.872  13.384  54.754  1.00 60.40
N  \r\n'


 Yep, I'm sure pickle can make sense of it.


  Does anyone know why the strings lp0, S', aS' are showing up?


 Because that's what pickle puts in there to help it unpickle it later.

 Why do you care? You shouldn't be looking at it (unless you want to
 understand how pickle works).

 pickle, as the name suggests, is intended for storing python objects
 for later use. This is often called object persistence in programming
 parlance. It is not designed for anything else.

 If you want cleanly formatted data in a file that you can read in a text
 editor or similar you need to do the formatting yourself or use another
 recognised format such as CSV or configparser (aka ini file).

 --
 Alan G
 Author of the Learn to Program web site
 http://www.alan-g.me.uk/





[Tutor] pickle.dump yielding awkward output

2013-02-03 Thread Spyros Charonis
Hello Pythoners,

I am experiencing a strange result with the pickle module when using it to
write certain results to a separate file.

In short, I have a program that reads a file, finds lines which satisfy
some criteria, and extracts those lines, storing them in a list. I am
trying to write this list to a separate file.

The list of extracted lines looks like this:

ATOM  1  N   GLN A   1  29.872  13.384  54.754  1.00 60.40  N
ATOM  2  CA  GLN A   1  29.809  11.972  54.274  1.00 58.51  C
ATOM  3  C   GLN A   1  28.376  11.536  54.029  1.00 55.13  C

The output stored from the call to the pickle.dump method, however, looks
like this:

(lp0
S'ATOM  1  N   GLN A   1  29.872  13.384  54.754  1.00 60.40  N  \r\n'
p1
aS'ATOM  2  CA  GLN A   1  29.809  11.972  54.274  1.00 58.51  C  \r\n'
p2
aS'ATOM  3  C   GLN A   1  28.376  11.536  54.029  1.00 55.13  C  \r\n'

The code I am using to write the output to an external file goes as follows:

def export_antibody_chains():
    ''' EXPORT LIST OF EXTRACTED CHAINS TO FILE '''
    chains_file = open(query + '_Chains', 'wb')
    pickle.dump(ab_chains, chains_file)  # ab_chains is global
    chains_file.close()
    return

Does anyone know why the strings lp0, S', aS' are showing up?


[Tutor] indexing a list

2012-10-18 Thread Spyros Charonis
Hello pythoners,

I have a string that I want to read in fixed-length windows.

In [68]: SEQ
Out[68]:
'MKAAVLTLAVLFLTGSQARHFWQQDEPPQSPWDRVKDLATVYVDVLKDSGRDYVSQFEGSALGKQLNLKLLDNWDSVTSTFSKLREQLGPVTQEFWDNLEKETEGLRQEMSKDLEEVKAKVQPYLDDFQKKWQEEMELYRQKVEPLRAELQEGARQKLHELQEKLSPLGEEMRDRARAHVDALRTHLAPYSDELRQRLAARLEALKENGGARLAEYHAKATEHLSTLSEKAKPALEDLRQ'

I would like a function that reads the above string, 21 characters at a
time, and checks for certain conditions, i.e. whether characters co-occur
in other lists I have made. For example:

x = 21   # WINDOW LENGTH

In [70]: SEQ[0:x]
Out[70]: 'MKAAVLTLAVLFLTGSQARHF'

In [71]: SEQ[x:2*x]
Out[71]: 'WQQDEPPQSPWDRVKDLATVY'

In [72]: SEQ[2*x:3*x]
Out[72]: 'VDVLKDSGRDYVSQFEGSALG'

How could I write a function to automate this so that it does this from
SEQ[0] through the entire sequence, i.e. until len(SEQ)?
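A short generator covers this: step through the string in increments of the window length and slice at each step. This is a sketch; `windows` and `width` are illustrative names, and when `len(SEQ)` is not a multiple of 21 the last window is simply shorter.

```python
def windows(seq, width=21):
    """Yield consecutive, non-overlapping slices of seq of length `width`.

    Equivalent to seq[0:width], seq[width:2*width], ... up to len(seq);
    the final slice may be shorter than `width`.
    """
    for start in range(0, len(seq), width):
        yield seq[start:start + width]


demo = "MKAAVLTLAVLFLTGSQARHF" + "WQQDEPPQSPWDRVKDLATVY" + "VDVLKDSGRDYVSQFEGSALG"
for window in windows(demo):
    print(window)
```

To test conditions on each window, loop over `windows(SEQ)` and apply them inside the loop body. Overlapping (sliding) windows of the same length would instead start at every index in `range(len(seq) - width + 1)`.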

Many thanks for your time,
Spyros


Re: [Tutor] Parsing data from a set of files iteratively

2012-05-30 Thread Spyros Charonis
FINAL SOLUTION:

### LOOP OVER DIRECTORY
location = '/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels'
zdata = []
for filename in os.listdir(location):
    filename = os.path.join(location, filename)
    try:
        zdata.extend(extract_zcoord(filename))
    except NameError:
        print "No such file!"
    except SyntaxError:
        print "Check Your Syntax!"
    except IOError:
        print "PDB file NOT FOUND!"
    else:
        continue

print 'Z-VALUES FOR ALL CHARGED RESIDUES'
print zdata #diagnostic

### WRITE Z-COORDINATE LIST TO A BINARY FILE
import pickle

f1 = open("z_coords1.dat", "wb")
pickle.dump(zdata, f1)
f1.close()

f2 = open("z_coords1.dat", "rb")
zdata1 = pickle.load(f2)
f2.close()

assert zdata == zdata1, "error in pickle/unpickle round trip!"

On Wed, May 30, 2012 at 1:09 AM, Steven D'Aprano <st...@pearwood.info> wrote:

 Steven D'Aprano wrote:

  location = '/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels/'
  zdata = []
  for filename in os.listdir(location):
      zdata.extend(get_zcoords(filename))


I only had the filename and not its path, that's why the system was not
able to locate the file, so
filename = os.path.join(location, filename) was used to solve that.

Many thanks to everyone for their time and efforts!

Spyros



 Hah, that can't work. listdir returns the name of the file, but not the
 file's path, which means that Python will only look in the current
 directory. You need something like this:


 location = '/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels/'
 zdata = []
 for filename in os.listdir(location):
     zdata.extend(get_zcoords(os.path.join(location, filename)))


 Sorry about that.




 --
 Steven



Re: [Tutor] Parsing data from a set of files iteratively

2012-05-30 Thread Spyros Charonis
On Wed, May 30, 2012 at 8:16 AM, Steven D'Aprano <st...@pearwood.info> wrote:

 On Wed, May 30, 2012 at 07:00:30AM +0100, Spyros Charonis wrote:
  FINAL SOLUTION:

 Not quite. You are making the mistake of many newbies to treat Python
 exceptions as a problem to be covered up and hidden, instead of as a
 useful source of information.

 To quote Chris Smith:

I find it amusing when novice programmers believe their main
job is preventing programs from crashing. ... More experienced
programmers realize that correct code is great, code that
crashes could use improvement, but incorrect code that doesn't
crash is a horrible nightmare.
-- http://cdsmith.wordpress.com/2011/01/09/an-old-article-i-wrote/
 Ok, so basically wrong code beats useless code.

 There is little as painful as a program which prints "An error occurred"
 and then *keeps working*. What does this mean? Can I trust that the
 program's final result is correct? How can it be correct if an error
 occurred? What error occurred? How do I fix it?

My understanding is that an except clause will catch a relevant error and
raise an exception if there is one, discontinuing program execution.


 Exceptions are your friend, not your enemy. An exception tells you that
 there is a problem with your program that needs to be fixed. Don't
 cover-up exceptions unless you absolutely have to.


 Sadly, your indentation is still being broken when you post. Please
 ensure you include indentation, and disable HTML or Rich Text posting.
 I have tried to guess the correct indentation below, and fix it in
 place, but apologies if I get it wrong.

Yes, that is the way my code looks in a python interpreter



  ### LOOP OVER DIRECTORY
  location = '/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels'
  zdata = []
  for filename in os.listdir(location):
      filename = os.path.join(location, filename)
      try:
          zdata.extend(extract_zcoord(filename))
      except NameError:
          print "No such file!"

 Incorrect. When a file is missing, you do not get NameError. This
 except-clause merely disguises programming errors in favour of a
 misleading and incorrect error message.

 If you get a NameError, your program has a bug. Don't just hide the bug,
 fix it.


      except SyntaxError:
          print "Check Your Syntax!"

 This except-clause is even more useless. SyntaxErrors happen when the
 code is compiled, not run, so by the time the for-loop is entered, the
 code has already been compiled and cannot possibly raise SyntaxError.

What I meant was: check the syntax of my pathname specification, i.e. check
that I did not make a typo when writing the path of the directory I want to
scan over. I realize "syntax" has a much more specific meaning in the context
of programming: code syntax!


 Even if it could, what is the point of this? Instead of a useful
 exception traceback, which tells you not only which line contains the
 error, but even highlights the point of the error with a ^ caret, you
 hide all the useful information and tease the user with a useless
 message "Check Your Syntax!".

Ok, I didn't realize I was being so reckless - thanks for pointing that
out.


 Again, if your program raises a SyntaxError, it has a bug. Don't hide
 the bug, fix it.


      except IOError:
          print "PDB file NOT FOUND!"

 This, at least, is somewhat less useless than the others. At least it is
 a valid exception, and if your intention is to skip missing files,
 catching IOError is a reasonable way to do it.

 But you don't just get IOError for *missing* files, but also for
 *unreadable* files, perhaps because you don't have permission to read
 them, or perhaps because the file is corrupt and can't be read.

Understood, but given that the files I am reading and processing are standard
ASCII text files, there is no good reason (which I can think of) that the files
would be *unreadable*. I verified that I had read/write permissions for all my
files, which are the default access privileges anyway (for the owner).


 In any case, as usual, imagine yourself as the recipient of this
 message: "PDB file NOT FOUND!" -- what do you expect to do about it?
 Which file is missing or unreadable? How can you tell? Is this a
 problem? Are your results still valid without that PDB file's data?
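One way to follow this advice, sketched with a hypothetical `extract` callable standing in for `extract_zcoord`: catch only the I/O error you actually intend to skip, keep the file name together with the error object so the reader can see what went wrong, and let everything else (a NameError from a bug, for instance) propagate. In Python 3, `IOError` is an alias of `OSError`, so listing both is redundant there but harmless and keeps the code working on Python 2. The name `load_all` is made up for illustration.

```python
def load_all(paths, extract):
    """Apply extract(path) to every path and merge the results.

    Files that cannot be opened or read are skipped, but each one is
    reported with its name and the original error. Any other exception
    (e.g. a NameError caused by a bug in `extract`) is not caught.
    """
    data = []
    skipped = []
    for path in paths:
        try:
            data.extend(extract(path))
        except (IOError, OSError) as err:
            skipped.append((path, err))
    for path, err in skipped:
        print("Skipped %s: %s" % (path, err))
    return data, skipped
```

Whether the merged results are still valid with some files skipped is a separate question, but returning `skipped` at least makes the answer visible instead of hiding it.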

Perhaps because I was writing the program I didn't think that this message
would be confusing to others, but it did help in making clear that there was a
different error (in this case, the absence of **filename = os.path.join(location,
filename)** to join a filename to its path). Without the PDB file's data there
would be no results, because the program operates on each file of a directory
successively (all files are .pdb files) and uses data in the file to build a
list. So, since I was working on a directory with only PDB files, this error
says it hasn't found them, which points to a more basic error (the one
mentioned above).



 If this can be ignored, IGNORE IT! Don't bother

Re: [Tutor] Parsing data from a set of files iteratively

2012-05-27 Thread Spyros Charonis
Returning to this original problem, I have modified my program from a single
long procedure to 3 functions which do the following:

serialize_pipeline_model(f): takes as input a file, reads it and parses
coordinate values (numerical entries in the file) into a list

write_to_binary(): writes the generated list to a binary file (pickles it)

read_binary(): unpickles the aggregate of merged lists that should be one
large list.

The code goes like so:

**
z_coords1 = []

def serialize_pipeline_model(f):

    .
    #  z_coords1 = [] has been declared global
    global z_coords1
    charged_groups = (lys_charged_group + arg_charged_group + his_charged_group
                      + asp_charged_group + glu_charged_group)
    for i in range(len(charged_groups)):
        z_coords1.append(float(charged_groups[i][48:54]))

    #print z_coords1
    return z_coords1

import pickle, shelve
print '\nPickling z-coordinates list'

def write_to_binary():
    """ iteratively write successively generated z_coords1 to a binary file """
    f = open("z_coords1.dat", "ab")
    pickle.dump(z_coords1, f)
    f.close()
    return

def read_binary():
    """ read the binary list """
    print '\nUnpickling z-coordinates list'
    f = open("z_coords1.dat", "rb")
    z_coords1 = pickle.load(f)
    print(z_coords1)
    f.close()
    return

### LOOP OVER DIRECTORY
for f in os.listdir('/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels/'):
    serialize_pipeline_model(f)
    write_to_binary()

read_binary()
print '\n Z-VALUES FOR ALL CHARGED RESIDUES'
print z_coords1
**

The problem is that the list (z_coords1) comes back as an empty list. I know
the code works in a procedural format (it is too large to post here), where
z_coords1 is generated correctly, so as a diagnostic I included a print
statement in the serialize function to see the list that is generated for
each of the 500 files.

Short of some intricacy with the scopes of the program that I may be missing,
I am not sure why this is happening. Does anybody have any ideas? Many thanks
for your time.
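A detail that may matter for read_binary(): each call to `pickle.dump` on a file opened with `"ab"` appends a separate pickle, and a single `pickle.load` reads back only the first one. To get every list back, keep calling `pickle.load` until it raises `EOFError`. A sketch with hypothetical function names:

```python
import pickle


def append_pickle(obj, path):
    """Append obj to path as one more pickle (append-binary mode)."""
    with open(path, "ab") as f:
        pickle.dump(obj, f)


def load_all_pickles(path):
    """Return every object appended to path, in the order written."""
    objects = []
    with open(path, "rb") as f:
        while True:
            try:
                objects.append(pickle.load(f))
            except EOFError:  # no more pickles in the file
                break
    return objects


def merged_list(path):
    """Concatenate all pickled lists in the file into one big list."""
    merged = []
    for chunk in load_all_pickles(path):
        merged.extend(chunk)
    return merged
```

A simpler alternative is to collect all values in one list in memory and call `pickle.dump` once at the end, with the file opened in `"wb"` mode; then a single `pickle.load` returns the whole list.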

Best regards,
Spyros


On Fri, May 18, 2012 at 7:23 PM, Spyros Charonis <s.charo...@gmail.com> wrote:

 Dear Python community,

 I have a set of ~500 files which I would like to run a script on. My
 script extracts certain information and
 generates several lists with items I need. For one of these lists, I need
 to combine the information from all
 500 files into one super-list. Is there a way in which I can iteratively
 execute my script over all 500 files
 and get them to write the list I need into a new file? Many thanks in
 advance for your time.

 Spyros



Re: [Tutor] Parsing data from a set of files iteratively

2012-05-19 Thread Spyros Charonis
I have tried the following two snippets, which both result in the same
error:

import os, glob
os.chdir('users/spyros/desktop/3NY8MODELSHUMAN/')
homology_models = glob.glob('*.pdb')
for i in range(len(homology_models)):
    python serialize_PIPELINE_models.py homology_models[i]

import os, sys
path = "/users/spyros/desktop/3NY8MODELSHUMAN/"
dirs = os.listdir(path)
for file in dirs:
    python serialize_PIPELINE_models.py

The errors for each snippet, respectively, read:

File "<stdin>", line 2
python serialize_PIPELINE_models.py homology_models[i]
   ^
SyntaxError: invalid syntax

 File "<stdin>", line 2
python serialize_PIPELINE_models.py
   ^
SyntaxError: invalid syntax

In the first snippet, the final line reads:
'python' (calling the interpreter) 'serialize_PIPELINE_models.py' (calling
my python program) 'homology_models[i]' (the file to run it on)

The glob.glob routine returns a list of files, so maybe Python does not
allow the syntax python (call interpreter) list entry?
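The SyntaxError arises because `python serialize_PIPELINE_models.py ...` is a shell command, not a Python statement, so the interpreter cannot parse it inside a for loop. If the goal is to run the existing script once per file from inside Python, the standard-library `subprocess` module can launch that command. The sketch below assumes the script reads the file path from `sys.argv[1]`; `run_script_on_files` is a made-up name.

```python
import glob
import os
import subprocess
import sys


def run_script_on_files(script, directory, pattern="*.pdb"):
    """Run `python script <file>` once for each file matching pattern.

    Returns a list of (path, exit_status) pairs. sys.executable is the
    interpreter currently running, so the script runs under the same Python.
    """
    results = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        status = subprocess.call([sys.executable, script, path])
        results.append((path, status))
    return results
```

Each run is a separate process, so lists built inside the script are not visible to the caller; when results must be merged into one list, calling a function in the same program is usually simpler.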

Many thanks.
Spyros
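The SyntaxError arises because "python serialize_PIPELINE_models.py homology_models[i]" is a shell command, not a Python statement: inside a Python loop the interpreter is already running. Following Alan's advice in the reply quoted below, one way is to move the per-file work into a function and call it once per file. This is a minimal sketch that assumes the per-file work can be expressed as a function. The names extract_items, collect_from_folder and write_list are hypothetical. extract_items merely collects ATOM lines as a placeholder for whatever serialize_PIPELINE_models.py really extracts.

```python
import glob
import os


def extract_items(path):
    """Hypothetical placeholder for the per-file logic of
    serialize_PIPELINE_models.py: here it collects the ATOM lines of one file."""
    items = []
    with open(path) as handle:
        for line in handle:
            if line.startswith('ATOM'):
                items.append(line.rstrip('\n'))
    return items


def collect_from_folder(folder, pattern='*.pdb'):
    """Run extract_items() on every matching file and merge the results."""
    super_list = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        super_list.extend(extract_items(path))   # add this file's items
    return super_list


def write_list(items, out_path):
    """Write one item per line to out_path."""
    with open(out_path, 'w') as out:
        for item in items:
            out.write(item + '\n')
```

Calling collect_from_folder('/users/spyros/desktop/3NY8MODELSHUMAN/') and passing the result to write_list() would produce the combined super-list file. If the script has to stay a separate program, subprocess.call(['python', 'serialize_PIPELINE_models.py', path]) runs it once per file from inside Python. Its results would then have to be gathered from whatever files it writes.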



On Fri, May 18, 2012 at 7:57 PM, Alan Gauld alan.ga...@btinternet.com wrote:

 On 18/05/12 19:23, Spyros Charonis wrote:

 Dear Python community,

 I have a set of ~500 files which I would like to run a script on.

  ...Is there a way in which I can iteratively execute my script
  over all 500 files

 Yes.
 You could use os.walk() or the glob module, depending on whether
 the files are in a folder hierarchy or a single folder.

 That will give you access to each file.
 Put your functionality into a function taking a single file
 as input and a list to which you append the new data.
 Call that function for each file in turn.

 Try that and if you get stuck come back with a more specific question, the
 code you used and the full error text.

 --
 Alan G
 Author of the Learn to Program web site
 http://www.alan-g.me.uk/





[Tutor] Parsing data from a set of files iteratively

2012-05-18 Thread Spyros Charonis
Dear Python community,

I have a set of ~500 files which I would like to run a script on. My script
extracts certain information and
generates several lists with items I need. For one of these lists, I need
to combine the information from all
500 files into one super-list. Is there a way in which I can iteratively
execute my script over all 500 files
and get them to write the list I need into a new file? Many thanks in
advance for your time.

Spyros


[Tutor] List Indexing Issue

2012-05-08 Thread Spyros Charonis
Hello python community,

I'm having a small issue with list indexing. I am extracting certain
information from a PDB (Protein Data Bank) file and need certain fields
of the file to be copied into a list. The entries look like this:

ATOM   1512  N   VAL A 222   8.544  -7.133  25.697  1.00 48.89  N
ATOM   1513  CA  VAL A 222   8.251  -6.190  24.619  1.00 48.64  C
ATOM   1514  C   VAL A 222   9.528  -5.762  23.898  1.00 48.32  C

I am using the following syntax to parse these lines into a list:

charged_res_coord = [] # store x,y,z of extracted charged residues
for line in pdb:
    if line.startswith('ATOM'):
        atom_coord.append(line)

for i in range(len(atom_coord)):
    for item in charged_res:
        if item in atom_coord[i]:
            charged_res_coord.append(atom_coord[i].split()[1:9])


The problem begins with entries such as the following.

ROW1)   ATOM   1572  NH2 ARG A 228   7.890 -13.328  16.363  1.00 59.63  N

ROW2)   ATOM   1617  N   GLU A1005  11.906  -2.722   7.994  1.00 44.02  N

Here, the code that I use to extract the third spatial coordinate (the last
of the three consecutive non-integer values) runs into a problem. Because
'A1005' (second row) is treated as a single list entry, while 'A' and '228'
(first row) are two list entries, a loop that reads element [7] extracts
'16.363' (the entry I want) for the first row but '1.00' (not the entry I
want) for the second row.

>>> charged_res_coord[1]
['1572', 'NH2', 'ARG', 'A', '228', '7.890', '-13.328', '16.363']

>>> charged_res_coord[10]
['1617', 'N', 'GLU', 'A1005', '11.906', '-2.722', '7.994', '1.00']


The loop I use goes like this:

for i in range(len(lys_charged_group)):
    lys_charged_group[i][7] = float(lys_charged_group[i][7])

The [7] index is the problem: for lines like ROW1 the code extracts the
correct value, but for lines like ROW2 it extracts the wrong one.
Unfortunately, the two row formats are interspersed, so I don't know whether
I can solve this with simple text processing routines. Would I have to use
regular expressions?

Many thanks for your help!

Spyros
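PDB ATOM records are fixed-width rather than whitespace-delimited. In the PDB format specification the chain identifier is column 22, the residue sequence number occupies columns 23-26, and the x, y and z coordinates occupy columns 31-38, 39-46 and 47-54. A four-digit residue number therefore runs straight into the chain letter. Slicing by column avoids split() entirely, so no regular expression is needed. This is a minimal sketch that assumes the file follows the standard column layout. parse_atom_line is a hypothetical helper. The two sample records are generated with a format string instead of being copied from above, because the archived lines have lost their original spacing.

```python
def parse_atom_line(line):
    """Parse one PDB ATOM record by fixed column positions.

    PDB columns are 1-based and inclusive; Python slices are 0-based and
    end-exclusive, hence line[30:38] for columns 31-38.
    """
    return {
        'serial': int(line[6:11]),          # columns 7-11
        'name': line[12:16].strip(),        # columns 13-16
        'res_name': line[17:20].strip(),    # columns 18-20
        'chain': line[21],                  # column 22
        'res_seq': int(line[22:26]),        # columns 23-26
        'x': float(line[30:38]),            # columns 31-38
        'y': float(line[38:46]),            # columns 39-46
        'z': float(line[46:54]),            # columns 47-54
    }


# Two records in the standard layout; in row2 the chain 'A' and residue
# number 1005 touch, which is why split() returns the single field 'A1005'.
FMT = '%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f'
row1 = FMT % ('ATOM', 1572, ' NH2', ' ', 'ARG', 'A', 228, ' ',
              7.890, -13.328, 16.363, 1.00, 59.63)
row2 = FMT % ('ATOM', 1617, ' N  ', ' ', 'GLU', 'A', 1005, ' ',
              11.906, -2.722, 7.994, 1.00, 44.02)

for row in (row1, row2):
    atom = parse_atom_line(row)
    print('%s %d %.3f' % (atom['res_name'], atom['res_seq'], atom['z']))
```

Because every field is read from its own columns, the z coordinate comes out correctly for both row formats, whatever the width of the residue number.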


Re: [Tutor] Concatenating multiple lines into one

2012-02-12 Thread Spyros Charonis
Thanks for all the help, Peter's and Hugo's methods worked well in
concatenating multiple lines into a single data structure!

S

On Fri, Feb 10, 2012 at 5:30 PM, Mark Lawrence breamore...@yahoo.co.uk wrote:

 On 10/02/2012 17:08, Peter Otten wrote:

 Spyros Charonis wrote:

  Dear python community,

 I have a file where I store sequences that each have a header. The
 structure of the file is as such:

  sp|(some code) =1st header

 AGGCGG
 MNKPLOI
 .
 .

  sp|(some code) =  2nd header

 AA
  ...
 .

 ..

 I am looking to implement a logical structure that would allow me to
 group
 each of the sequences (spread on multiple lines) into a single string. So
 instead of having the letters spread on multiple lines I would be able to
 have 'AGGCGGMNKP' as a single string that could be indexed.

 This snippet is good for isolating the sequences (=stripping headers and
 skipping blank lines) but how could I concatenate each sequence in order
 to get one string per sequence?

  for line in align_file:

 ... if line.startswith('sp'):
 ... continue
 ... elif not line.strip():
 ... continue
 ... else:
 ... print line

 (... is just OS X terminal notation, nothing programmatic)

 Many thanks in advance.


 Instead of printing the line directly, collect it in a list (without the
 trailing \n). When you encounter a line starting with "sp", check if that
 list is non-empty, and if so print "".join(parts), assuming the list is
 called parts, and start with a fresh list. Don't forget to print any
 leftover data in the list once the for loop has terminated.



 The advice from Peter is sound if the strings could grow very large but
 you can simply concatenate the parts if they are not.  For the indexing
 simply store your data in a dict.

 --
 Cheers.

 Mark Lawrence.





[Tutor] Concatenating multiple lines into one

2012-02-10 Thread Spyros Charonis
Dear python community,

I have a file where I store sequences that each have a header. The
structure of the file is as such:

sp|(some code) =1st header
AGGCGG
MNKPLOI
.
.

sp|(some code) = 2nd header
AA
 ...
.

..

I am looking to implement a logical structure that would allow me to group
each of the sequences (spread on multiple lines) into a single string. So
instead of having the letters spread on multiple lines I would be able to
have 'AGGCGGMNKP' as a single string that could be indexed.

This snippet is good for isolating the sequences (stripping headers and
skipping blank lines), but how could I concatenate each sequence in order to
get one string per sequence?

>>> for line in align_file:
...     if line.startswith('sp'):
...         continue
...     elif not line.strip():
...         continue
...     else:
...         print line

(... is just OS X terminal notation, nothing programmatic)

Many thanks in advance.

S.
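One way to do this is to combine Peter's suggestion with Mark's idea of keeping the results in a dict. Peter's suggestion, from the reply above, is to collect the pieces in a list, join them when the next header arrives, and flush the last record after the loop. This is a sketch that assumes every header line starts with 'sp', as in the snippet, and that sequence lines never do. read_sequences is a hypothetical name.

```python
def read_sequences(lines, header_prefix='sp'):
    """Group multi-line sequences under their headers.

    Returns a dict mapping each header line to its sequence joined into one
    string. Blank lines are skipped; lines before the first header are ignored.
    """
    sequences = {}
    header = None
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                                  # skip blank lines
        if line.startswith(header_prefix):
            if header is not None:
                sequences[header] = ''.join(parts)    # flush previous record
            header = line
            parts = []                                # start a fresh list
        else:
            parts.append(line)
    if header is not None:
        sequences[header] = ''.join(parts)            # flush the last record
    return sequences
```

Passing the open file object (align_file) returns a dict from header line to joined sequence. Each value is an ordinary string, so it can be indexed and sliced directly.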


[Tutor] Logical Structure of Snippet

2011-05-23 Thread Spyros Charonis
Hello List,

I'm trying to read some sequence files and modify them to a particular
format. These files are structured something like:

P1; ICA1_HUMAN
AAEVDTG. (A very long sequence of letters)
P1;ICA1_BOVIN
TRETG(A very long sequence of letters)
P1;ICA2_HUMAN
WKH.(another sequence)

I read a database file which has information that I need in order to modify
my sequence files. I must extract one of the data fields from the database
(done this) and place it in the sequence file (structure shown above). The
relevant database fields look like this:

tt; ICA1_HUMAN   Description
tt; ICA1_BOVIN Description
tt; ICA2_HUMAN   Description

What I would like is to extract the tt; fields (I already have code for
that) and then read through the sequence file and insert the tt; field
corresponding to each P1 header right underneath that header. Basically, I
need a new line every time P1 occurs in the sequence file, and I need to
paste its corresponding tt; field into that new line (for P1; ICA1_HUMAN,
that would be  ICA1_HUMAN   Description, etc.).

the pseudocode would go like this:

for line in sequence file:
    if line.startswith('P1; ICA'):
        make a newline
        go to list with extracted tt; fields*
        find the one with the same query (tt; ICA1 ...)*
        insert this field in the newline

The steps marked * are the ones I am not sure how to implement. What
logical structure would I need to make Python match a tt; field (I already
have the list of entries) whenever it finds a header with the same content?

Apologies for the verbosity, but I did want to be clear as it is quite
specific.

S.
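The steps marked * amount to a dictionary lookup. Build a dict from identifier to tt; line once. Then, while streaming through the sequence file, look up the identifier of every P1; header. This is a sketch that assumes two things, as in the samples above: the identifier is the first word after tt;, and it is the whole remainder of the line after P1;. build_descriptions and annotate are hypothetical names.

```python
def build_descriptions(db_lines):
    """Map each identifier to its full tt; line,
    e.g. 'ICA1_HUMAN' -> 'tt; ICA1_HUMAN   Description'."""
    descriptions = {}
    for line in db_lines:
        if line.startswith('tt;'):
            fields = line[len('tt;'):].split()
            if fields:
                descriptions[fields[0]] = line.rstrip('\n')
    return descriptions


def annotate(seq_lines, descriptions, header_prefix='P1;'):
    """Yield the sequence lines, inserting the matching tt; line directly
    under every header; headers without a match pass through unchanged."""
    for line in seq_lines:
        yield line.rstrip('\n')
        if line.startswith(header_prefix):
            identifier = line[len(header_prefix):].strip()
            if identifier in descriptions:
                yield descriptions[identifier]
```

Writing '\n'.join(annotate(seq_lines, build_descriptions(db_lines))) to a new file gives the sequence file with each tt; line placed under its header. Because the dict is built only once, each header costs a single lookup instead of a scan of the whole tt; list.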


[Tutor] STRING PROC

2011-05-20 Thread Spyros Charonis
Hello List,

A quick string processing query. If I have an entry in a list such as
['NAME\n'],
is there a way to split it into two separate lines:


NAME


[Tutor] Indexing a List of Strings

2011-05-17 Thread Spyros Charonis
Greetings Python List,

I have a motif sequence (a string of characters, e.g. 'EAWLGHEYLHAMKGLLC')
whose index I would like to return. The list contains 20 strings, each of
which is close to 1000 characters long, making it far too cumbersome to
display an example. I would like to know if there is a way to return a pair
of indices: one index where my sequence begins (at 'E' in the above case)
and one index where it ends (at 'C' in the above case). In short, if
'EAWLGHEYLHAMKGLLC' spans 17 characters, is it possible to get something
like 100 117, assuming it begins at the 100th position and goes up to the
117th character of my string? My loop goes as follows:

for item in finalmotifs:
    for line in my_list:
        if item in line:
            print line.index(item)

But this only returns a single number (e.g. 119), which is the index at
which my sequence begins.

Is it possible to get a pair of indices that indicate the beginning and end
of the substring?

Many thanks
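str.find (or str.index) gives only the start position, but the end follows from the motif's length: end = start + len(motif). This is a minimal sketch, and motif_span and all_motif_spans are hypothetical helpers. The indices are 0-based with an exclusive end, matching Python slicing. A motif starting at index 100 therefore gives (100, 117), and sequence[100:117] is the motif itself. In 1-based, inclusive terms that is positions 101 to 117.

```python
def motif_span(sequence, motif):
    """Return (start, end) of the first occurrence of motif, or None.

    0-based, end-exclusive, so sequence[start:end] == motif.
    """
    start = sequence.find(motif)
    if start == -1:
        return None
    return start, start + len(motif)


def all_motif_spans(sequence, motif):
    """Return (start, end) for every occurrence, overlapping ones included."""
    spans = []
    start = sequence.find(motif)
    while start != -1:
        spans.append((start, start + len(motif)))
        start = sequence.find(motif, start + 1)   # resume one past the last hit
    return spans
```

Inside the existing loop, print motif_span(line, item) would print the pair instead of the single start index.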


[Tutor] String Processing Query

2011-05-16 Thread Spyros Charonis
I have a file with the following contents:

from header1
abcdefghijkl
mnopqrs
tuvwxyz
*
from header2
poiuytrewq
lkjhgfdsa
mnbvcxz
*

My string processing code goes as follows:

file1=open('/myfolder/testfile.txt')
scan = file1.readlines()

string1 = ' '
for line in scan:
    if line.startswith('from'):
        continue
    if line.startswith('*'):
        continue
    string1.join(line.rstrip('\n'))

This code produces the following output:

'abcdefghijkl'
'mnopqrs'
'tuvwxyz'
'poiuytrewq'
'lkjhgfdsa'
'mnbvcxz'

I would like to know if there is a way to get the following
output instead:

'abcdefghijklmnopqrstuvwxyz'

'poiuytrewqlkjhgfdsamnbvcxz'

I'm basically trying to concatenate the strings in order to produce two
separate lines.
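The loop above discards its work. str.join does not append to string1. It returns a new string with string1 inserted between the items of its argument (here the individual characters of the line), and that result is never stored. Because strings are immutable, one way is to collect each record's lines in a list and join them when the '*' terminator is reached. This is a sketch that assumes every record starts with a 'from' header line and ends with a line starting with '*'. records_between_markers is a hypothetical name.

```python
def records_between_markers(lines, header_prefix='from', terminator='*'):
    """Concatenate the body lines of each record into one string.

    A record's body is everything between a header line and the next
    terminator line.
    """
    records = []
    parts = []
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith(header_prefix):
            parts = []                        # a new record begins
        elif line.startswith(terminator):
            records.append(''.join(parts))    # record complete: join once
            parts = []
        else:
            parts.append(line)
    return records


# For comparison, ' '.join('abc') returns 'a b c': join separates the
# characters of its argument, it does not add to the separator string.
```

records_between_markers(scan) then returns one string per record, which can be printed on its own line.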


[Tutor] Printing output from Python program to HTML

2011-05-10 Thread Spyros Charonis
Hello everyone,

I have a Python script that extracts some text from a database file and
annotates another file, writing the results to a new file. Because the files
I am annotating are ASCII, I am very restricted in how I can annotate the
text, so I would like to write the results to HTML instead, which would let
me annotate my file in more visually effective ways, e.g. by changing the
text color where appropriate. My program extracts text from a database,
reads a file that is to be annotated, and writes those annotations to a
newly created (.htm) file.
I include the following headers at the beginning of my program:

print Content-type:text/html\r\n\r\n
print 'html'
print 'body'

The part of the program that finds the entry I want and produces the
annotation is about 80 lines down and goes as follows:

file_rmode = open('/myfolder/alignfiles/query1', 'r')
file_amode = open('/myfolder/alignfiles/query2', 'a+')

file1 = motif_file.readlines() # file has been created in code not shown
file2 = file_rmode.readlines()

for line in seqalign:
    for item in finalmotifs:
        item = item.strip().upper()
        if item in line:
            newline = line.replace(item, p font color = red item /font /p) # compiler complains here about the word red
            # sys.stdout.write(newline)
            align_file_amode.write(line)

print '/body'
print '/html'

motif_file.close()
align_file_rmode.close()
align_file_amode.close()

The Python compiler complains about the line where I try to change the font
color, reporting invalid syntax. Perhaps I need to import the cgi module to
make this a full CGI program? (I have configured my Apache server.) Or
alternatively my HTML code is messed up, but I am pretty sure this is more
or less a simple task.

I am working in Python 2.6.5. Many thanks in advance

Spyros
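The invalid syntax comes from the markup itself. Everything passed to line.replace(item, ...) must be a Python expression, so the HTML has to sit inside a string literal. If the original used double quotes both around the string and around red, the second quote ends the string early. This sketch keeps the tags in single-quoted constants and concatenates the motif between them. It also HTML-escapes the text, which the original does not do, and it assumes the motifs are plain uppercase letters. highlight, OPEN_TAG and CLOSE_TAG are hypothetical names. A span with a CSS color replaces the deprecated font tag.

```python
try:
    from html import escape      # Python 3.2+
except ImportError:
    from cgi import escape       # Python 2 (cgi.escape was removed in Python 3.8)

# Single-quoted literals, so the double quotes inside need no escaping.
OPEN_TAG = '<span style="color:red">'
CLOSE_TAG = '</span>'


def highlight(line, motifs):
    """Return line as HTML-escaped text with each motif wrapped in a red span."""
    html_line = escape(line.rstrip('\n'))
    for motif in motifs:
        motif = motif.strip().upper()
        if motif and motif in line:
            safe = escape(motif)
            html_line = html_line.replace(safe, OPEN_TAG + safe + CLOSE_TAG)
    return html_line
```

For example, highlight('XXEAWLYY\n', ['eawl\n']) returns XX<span style="color:red">EAWL</span>YY. The replacements are applied one after another, so a motif that overlaps one already highlighted will not be found again.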


Re: [Tutor] Printing output from Python program to HTML

2011-05-10 Thread Spyros Charonis
Thanks, very simple, but I missed that because it was supposed to be in HTML
code!



[Tutor] Problem with printing Python output to HTML Correctly

2011-05-10 Thread Spyros Charonis
Hello,

I know I posted the exact same topic a few hours ago and I do apologize for
this, but my script had a careless error, and my real issue is somewhat
different.
I have a Python script that extracts some text from a database file and
annotates another file, writing the results to a new file. Because the files
I am annotating are ASCII, I am very restricted in how I can annotate the
text, so I would like to write the results to HTML instead, which would let
me annotate my file in more visually effective ways, e.g. by changing the
text color where appropriate. My program extracts text from a database,
reads a file that is to be annotated, and writes those annotations to a
newly created (.htm) file.

finalmotifs = motif_file.readlines()
seqalign = align_file_rmode.readlines()

# These two files have been created in code that I don't show here because it is not relevant to the issue

align_file_appmode.write('html')
align_file_appmode.write('head')

align_file_appmode.write('title \'query_\' Multiple Sequence Alignment /title')

align_file_appmode.write('/head')
align_file_appmode.write('body')

for line in seqalign:
    align_file_appmode.write('p \'line\' /p')
    for item in finalmotifs:
        item = item.strip().upper()
        if item in line:
            newline = line.replace(item, 'p font color = red \'item\' /font/p')
            align_file_appmode.write(newline)

align_file_appmode.write('/body')
align_file_appmode.write('/html')

motif_file.close()
align_file_rmode.close()
align_file_appmode.close()

The .htm file that is created is not what I intended: it has the word item
printed every couple of lines, which I assume is because I'm not passing the
string sequence that I want to output correctly.

QUESTION
Basically, HTML (or the way I wrote my code) does not understand that with
the escape character '\item\' I am trying to print a string and not the word
item. Is there some way to correct that, or would I have to use something
like XML to create a markup system that specifically describes my data?

I am aware Python supports multiline strings (using the format ''' text ''')
but I do want my HTML (or XML?) to be correctly rendered before I consider
making this into a CGI program. Built in Python 2.6.5.
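Inside a string literal, \'item\' is just an escaped quote followed by the four letters i, t, e, m. The output therefore literally contains the word item, because Python never looks up a variable whose name happens to appear inside a string. The value has to be inserted from outside the literal, by concatenation or % formatting. This sketch writes a complete page that way. build_page is a hypothetical name, the title text is illustrative, and the HTML escaping is an addition not present in the original.

```python
try:
    from html import escape      # Python 3.2+
except ImportError:
    from cgi import escape       # Python 2 (cgi.escape was removed in Python 3.8)


def build_page(title, lines):
    """Return a complete HTML document with one <p> element per input line."""
    parts = ['<html>', '<head>']
    # % inserts the *value* of title; a literal like '\'title\'' would not.
    parts.append('<title>%s</title>' % escape(title))
    parts.extend(['</head>', '<body>'])
    for line in lines:
        parts.append('<p>%s</p>' % escape(line.rstrip('\n')))
    parts.extend(['</body>', '</html>'])
    return '\n'.join(parts) + '\n'
```

For instance, align_file_appmode.write(build_page('query_ Multiple Sequence Alignment', seqalign)) would write the whole document in one call. Opening the output with 'w' instead of 'a+' avoids appending a second html document to an existing file on every run.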


Re: [Tutor] Problem with printing Python output to HTML Correctly

2011-05-10 Thread Spyros Charonis
Hi all,

No need to post answers, I figured out where my mistake was.

Spyros



Re: [Tutor] Problem with printing Python output to HTML Correctly

2011-05-10 Thread Spyros Charonis
A SOLUTION TO THE PROBLEM I POSTED:

align_file_rmode = open('/Users/spyros/folder1/python/printsmotifs/alignfiles/' + query1, 'r')
align_file_appmode = open('/Users/spyros/folder1/python/printsmotifs/alignfiles/' + query2, 'a+')

finalmotifs = motif_file.readlines()
seqalign = align_file_rmode.readlines()

for line in seqalign:
    #align_file_appmode.write('p \'line\' /p')
    for item in finalmotifs:
        item = item.strip().upper()
        annotation = '<span style="color:red">' + item + '</span>'
        if item in line:
            newline = line.replace(item, annotation)
            # sys.stdout.write(newline)
            align_file_appmode.write(newline)

motif_file.close()
align_file_rmode.close()
align_file_appmode.close()

The line

annotation = '<span style="color:red">' + item + '</span>'

adds a span element and sets its color with CSS.



[Tutor] triple-nested for loop not working

2011-05-04 Thread Spyros Charonis
Hello everyone,

I have written a program, as part of a bioinformatics project, that extracts
motif sequences (programmatically just strings of letters) from a database
and writes them to a file.
I have written another script to annotate the database file (in plaintext
ASCII format) by replacing every match of a motif with a sequence of tildes
(~).  Primitive I know, but not much more can be done with ASCII files.  The
code goes as follows:


# final motifs_11SGLOBULIN contains the output of my first program
motif_file = open('myfolder/pythonfiles/final motifs_11SGLOBULIN', 'r')
# 11sglobulin.seqs is the ASCII sequence alignment file which I want to annotate (modify)
align_file = open('myfolder/pythonfiles/11sglobulin.seqs', 'a+')

finalmotif_seqs = []
finalmotif_length = []  # store length of each motif
finalmotif_annot = []

for line in finalmotifs:
    finalmotif_seqs.append(line)
    mot_length = len(line)
    finalmotif_length.append(mot_length)

for item in finalmotif_length:
    annotation = '~' * item
    finalmotif_annot.append(annotation)

finalmotifs = motif_file.readlines()
seqalign = align_file.readlines()

for line in seqalign:
    for i in len(finalmotif_seqs):  # for item in finalmotif_seqs:
        for i in len(finalmotif_annot): # for item in finalmotif_annot:
            if finalmotif_seqs[i] in line:  # if item in line:
                newline = line.replace(finalmotif_seqs[i], finalmotif_annot[i])
                #sys.stdout.write(newline)   # = print the lines out on the shell
                align_file.writelines(newline)

motif_file.close()
align_file.close()


My problem is that although the script runs, there is a logic error
somewhere in the triple-nested for loop: when I check the file I'm
supposedly modifying, there is no change. All three lists are built
correctly (I've confirmed this in the Python shell). Any help would be much
appreciated!
I am running Python 2.6.5.
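Judging only from the code shown, several things probably combine here:

- The first loop reads finalmotifs before finalmotifs = motif_file.readlines() runs.
- readlines() keeps the trailing '\n', so a motif such as 'ABC\n' rarely occurs inside an alignment line, and every tilde mask is one character too long.
- for i in len(...) would raise TypeError, because an int is not iterable. Since no error appears, the loop body is probably never executed.
- A file opened with 'a+' may be positioned at its end (in Python 3 it always is). In that case readlines() returns an empty list and nothing is processed. Writes in 'a+' mode also always go to the end of the file, so the file cannot be modified in place that way.

This sketch avoids these issues. It pairs each motif with its mask via zip, applies all replacements to a line before storing it, and leaves file handling to the caller. annotate_with_tildes is a hypothetical name.

```python
def annotate_with_tildes(motifs, alignment_lines):
    """Replace every occurrence of each motif with a run of '~' of equal length.

    Returns a new list of lines; the caller writes them to a separate file
    instead of appending to the file being read.
    """
    clean = [m.strip() for m in motifs if m.strip()]   # drop '\n' and blank lines
    masks = ['~' * len(m) for m in clean]               # one mask per motif
    annotated = []
    for line in alignment_lines:
        for motif, mask in zip(clean, masks):           # pair motif i with mask i
            line = line.replace(motif, mask)            # apply all motifs to this line
        annotated.append(line)
    return annotated
```

Reading both input files, calling annotate_with_tildes(motifs, lines), and passing the result to writelines() on a new file opened with 'w' leaves the original alignment file intact.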


[Tutor] Filtering out unique list elements

2011-05-03 Thread Spyros Charonis
Dear All,

I have built a list with multiple occurrences of a string after some text
processing that goes something like this:

[cat, dog, cat, cat, cat, dog, dog, tree, tree, tree, bird, bird, woods,
woods]

I am wondering how to filter this list so that I only print out the unique
elements, i.e. the same list but with one occurrence per element:

[cat, dog, tree, bird, woods]

Any help much appreciated!

Regards,
Spyros
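set(...) removes duplicates but does not preserve the original order. To keep the first occurrence of each element in its original position, one approach is to track what has been seen in a set while building a new list. unique_in_order is a hypothetical name.

```python
def unique_in_order(items):
    """Return items with duplicates removed, keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:       # set membership test is fast
            seen.add(item)
            result.append(item)
    return result
```

On Python 2.7 and later, list(collections.OrderedDict.fromkeys(items)) gives the same result in one line.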


[Tutor] Deleting strings from a line

2011-04-26 Thread Spyros Charonis
Hello,

I've written a script that scans a biological database and extracts some
information. A sample of output from my script is as follows:

LYLGILLSHAN  AA3R_SHEEP26331

 LYMGILLSHAN  AA3R_HUMAN26431

 MCLGILLSHANAA3R_RAT26631

 LLVGILLSHAN  AA3R_RABIT26531

The leftmost strings are the ones I want to keep, while I would like to get
rid of the ones to the right (AA3R_SHEEP, 263 61), which just indicate where
the sequence came from and its genomic coordinates. Is there any way to do
this with a string processing command? The loop which builds my list goes
like this:

for line in query_lines:
    if line.startswith('fd;'):  # find motif sequences
        #print "Found an FD for your query!", line.rstrip().lstrip('fd;')
        print line.lstrip('fd;')
        motif.append(line.rstrip().lstrip('fd;'))

Is there a del command I can use to preserve only the actual sequences
themselves? Many thanks in advance!
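If the sequence and the label are separated by whitespace, no del is needed. str.split() with no arguments splits on runs of whitespace, so the first field is the sequence. Note also that lstrip('fd;') removes any leading run of the characters f, d and ;, not the exact prefix 'fd;'. Slicing after startswith() removes exactly the prefix. motif_only is a hypothetical name. If some lines really have no separator (as the third sample line appears to), split() cannot separate them, and a fixed-length or pattern-based approach would be needed instead.

```python
def motif_only(line, prefix='fd;'):
    """Return the first whitespace-separated field, after removing an
    optional 'fd;' prefix. Returns '' for a line with no fields."""
    if line.startswith(prefix):
        line = line[len(prefix):]   # remove exactly the prefix, nothing more
    fields = line.split()           # split on any run of whitespace
    return fields[0] if fields else ''
```

In the loop above, motif.append(motif_only(line)) would then store just the sequence.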


[Tutor] Script for Parsing string sequences from a file

2011-04-15 Thread Spyros Charonis
Hello,

I'm doing a biomedical degree and am taking a course on bioinformatics. We
were given a raw version of a public database in a file (the file is in
simple ASCII) and need to extract only certain lines containing important
information. I've made a script that does not work and I am having trouble
understanding why.

When I run it in the Python shell, it prompts for a protein name but then
reports that there is no such entry. The first while loop, nested inside a
for loop, is intended to pick up all lines beginning with gc;, chop off the
gc; part and keep only the text after it (which is a protein name). Then it
scans the file, collects all such lines, chops off the gc; and stores them
in a tuple. This tuple is not built correctly: when the program is run, it
reports that it cannot find my query in the tuple I created, even though it
is certainly in the database. Can you detect what the mistake is? Thank you
in advance!

Spyros


myParser.py
Description: Binary data
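The script itself was sent as an attachment and is not reproduced here, so the following is only a guess at common causes when a name extracted from gc; lines cannot be found:

- The stored names still carry the trailing '\n' or a leading space.
- The query differs in case.
- The file object is iterated a second time. A file can only be read through once unless you call seek(0) or reopen it.

This sketch works under those assumptions. protein_names and lookup are hypothetical names.

```python
def protein_names(lines, prefix='gc;'):
    """Collect the text after every 'gc;' prefix, minus surrounding whitespace."""
    names = []
    for line in lines:
        if line.startswith(prefix):
            names.append(line[len(prefix):].strip())   # drop '\n' and spaces
    return tuple(names)


def lookup(names, query):
    """Case-insensitive membership test for a user-supplied protein name."""
    query = query.strip().upper()
    return any(name.upper() == query for name in names)
```

Building the tuple once from a single pass over the file and then calling lookup(names, raw_input('Protein name: ')) avoids both the whitespace and the exhausted-file pitfalls.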