[Tutor] Understanding a linear runtime implementation of anagram detection
Dear All,

I am learning about analysis of algorithms (Python 2.7.6). I am reading a book (Problem Solving with Algorithms and Data Structures) in which Python is the language used for implementations. The author introduces algorithm analysis in a clear and understandable way, and uses an anagram detection program as a template to compare implementations with different runtimes (quadratic, log-linear, linear). In the linear, and most efficient, implementation the code is as follows (comments added by me):

    def anagram_test2(s1, s2):
        """ Checks if two strings are anagrams of each other
            Runs with O(n) linear complexity """
        if (not s1) or (not s2):
            raise TypeError, "Invalid input: input must be string"
            return None
        # Initialize two lists of counters
        c1 = [0] * 26
        c2 = [0] * 26
        # Iterate over each string
        # When a char is encountered,
        # increment the counter at
        # its corresponding position
        for i in range(len(s1)):
            pos = ord(s1[i]) - ord("a")
            c1[pos] += 1
        for i in range(len(s2)):
            pos = ord(s2[i]) - ord("a")
            c2[pos] += 1
        j = 0
        hit = True
        while j < 26 and hit:
            if c1[j] == c2[j]:
                j += 1
            else:
                hit = False
        return hit

My questions are:

1) Is it computationally more, less, or equally efficient to use an explicit while loop compared with simply doing "return c1 == c2" (replacing the final code block following the two for loops)? I realize that this single line of code performs an implicit loop over each index to test for equality. My guess is that, because other languages may not allow this simple test, the author wanted to present an example that could be adapted for other languages, unless the explicit while loop is computationally less expensive.

2) How could I go about adapting this algorithm for multiple strings (say I had 20 strings and wanted to check whether they are anagrams of one another)?

    def are_anagrams(*args):
        """ Accepts a tuple of strings and checks if they are anagrams of each other """
        # Check that none of the strings are null
        for i in args:
            if not i:
                raise TypeError, "Invalid input"
                return None
        # Initialize a list of counters for each string
        c = ( [] for i in range(len(args) ) ???

Many thanks in advance!

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
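
One way the counting approach might be extended to any number of strings is sketched below. It is written for Python 3 (the thread uses Python 2.7), the helper name letter_counts is hypothetical, and it assumes lowercase a-z input as in the book's version. It also relies on plain list equality, which compares element by element much like the explicit while loop and stops at the first mismatch.

```python
def letter_counts(word):
    """Return a list of 26 counters, one per lowercase letter a-z."""
    counts = [0] * 26
    for ch in word:
        counts[ord(ch) - ord("a")] += 1
    return counts


def are_anagrams(*words):
    """Return True if every word has the same letter counts as the first."""
    if len(words) < 2 or not all(words):
        raise ValueError("need at least two non-empty strings")
    reference = letter_counts(words[0])
    # List equality checks the 26 counters pairwise, like the while loop.
    return all(letter_counts(word) == reference for word in words[1:])


print(are_anagrams("listen", "silent", "enlist"))            # True
print(are_anagrams("listen", "silent", "tinsel", "listed"))  # False
```

Each string is scanned once and compared against a single reference list, so the work stays linear in the total number of characters.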
Re: [Tutor] bubble sort function
Many thanks for the link as well as for the pseudocode. I see what I did wrong now. Here's the final version that works:

    def bubbleSort_ascending(unsorted):
        """ Sorts a list of numbers in ascending order """
        n = len(unsorted)
        count = swaps = 0
        swapped = True
        ## Prompt user to choose if they want to see each sorting step
        option = raw_input("Show sorting steps? (Y/N):\n")
        while swapped:
            count += 1
            swapped = False
            ## Use a tuple assignment in order to swap the value of two variables
            for i in range(1, n):
                if unsorted[i-1] > unsorted[i]:
                    unsorted[i-1], unsorted[i] = unsorted[i], unsorted[i-1]
                    swapped = True
            ## Catch user input and either show or hide sorting steps accordingly
            if option in ("Y", "y"):
                print "\nIteration %d, %d swaps; list: %r\n" % (count, swaps, unsorted)
            elif option in ("N", "n"):
                pass
            else:
                print "\nYour input was invalid, type either Y/y or N/n"
        return unsorted

On Sun, Nov 16, 2014 at 4:50 AM, Steven D'Aprano st...@pearwood.info wrote:

> On Sat, Nov 15, 2014 at 04:46:26PM +, Spyros Charonis wrote:
>> Dear group,
>> I'm having a bit of trouble with understanding why my bubble sort
>> implementation doesn't work. I've got the following function to perform a
>> bubble sort operation on a list of numbers:
>
> It doesn't work because it is completely wrong. Sorry to be harsh, but
> sometimes it is easier to throw broken code away and start again than it
> is to try to diagnose the problems with it.
>
> Let's start with the unoptimized version of bubblesort given by Wikipedia:
>
> https://en.wikipedia.org/wiki/Bubble_sort#Implementation
>
>     procedure bubbleSort( A : list of sortable items )
>         n = length(A)
>         repeat
>             swapped = false
>             for i = 1 to n-1 inclusive do
>                 /* if this pair is out of order */
>                 if A[i-1] > A[i] then
>                     /* swap them and remember something changed */
>                     swap( A[i-1], A[i] )
>                     swapped = true
>                 end if
>             end for
>         until not swapped
>     end procedure
>
> Let's translate that to Python:
>
>     def bubbleSort(alist):
>         n = len(alist)
>         swapped = True
>         while swapped:
>             swapped = False
>             for i in range(1, n):
>                 # if this pair is out of order
>                 if alist[i-1] > alist[i]:
>                     # swap them and remember something changed
>                     alist[i-1], alist[i] = alist[i], alist[i-1]
>                     swapped = True
>
> Let's add something to print the partially sorted list each time we go
> through the loop:
>
>     def bubbleSort(alist):
>         print("Unsorted: %r" % alist)
>         n = len(alist)
>         swapped = True
>         count = swaps = 0
>         while swapped:
>             count += 1
>             swapped = False
>             for i in range(1, n):
>                 # if this pair is out of order
>                 if alist[i-1] > alist[i]:
>                     # swap them and remember something changed
>                     swaps += 1
>                     alist[i-1], alist[i] = alist[i], alist[i-1]
>                     swapped = True
>             print("Iteration %d, %d swaps; list: %r" % (count, swaps, alist))
>
> And now let's try it:
>
>     py> mylist = [2, 4, 6, 8, 1, 3, 5, 7, 9, 0]
>     py> bubbleSort(mylist)
>     Unsorted: [2, 4, 6, 8, 1, 3, 5, 7, 9, 0]
>     Iteration 1, 5 swaps; list: [2, 4, 6, 1, 3, 5, 7, 8, 0, 9]
>     Iteration 2, 9 swaps; list: [2, 4, 1, 3, 5, 6, 7, 0, 8, 9]
>     Iteration 3, 12 swaps; list: [2, 1, 3, 4, 5, 6, 0, 7, 8, 9]
>     Iteration 4, 14 swaps; list: [1, 2, 3, 4, 5, 0, 6, 7, 8, 9]
>     Iteration 5, 15 swaps; list: [1, 2, 3, 4, 0, 5, 6, 7, 8, 9]
>     Iteration 6, 16 swaps; list: [1, 2, 3, 0, 4, 5, 6, 7, 8, 9]
>     Iteration 7, 17 swaps; list: [1, 2, 0, 3, 4, 5, 6, 7, 8, 9]
>     Iteration 8, 18 swaps; list: [1, 0, 2, 3, 4, 5, 6, 7, 8, 9]
>     Iteration 9, 19 swaps; list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>     Iteration 10, 19 swaps; list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>
> Now you can inspect the working code and compare it to the non-working
> code below and see what is different:
>
>>     def bubble_sort_ascending(unsorted):
>>         """ Sorts a list of numbers into ascending order """
>>         iterations = 0
>>         size = len(unsorted) - int(1)
>>         for i in range(0, size):
>>             unsorted[i] = float(unsorted[i])
>>             while unsorted[i] > unsorted[i+1]:
>>                 # Use a tuple assignment in order to swap the value of two variables
>>                 unsorted[i], unsorted[i+1] = unsorted[i+1], unsorted[i]
>>                 iterations += 1
>>         sorted_vec = unsorted[:]  # copy unsorted which is now sorted
>>         print "\nIterations completed: %s\n" % (iterations)
>>         return sorted_vec
>
> --
> Steven
[Tutor] bubble sort function
Dear group,

I'm having a bit of trouble understanding why my bubble sort implementation doesn't work. I've got the following function to perform a bubble sort operation on a list of numbers:

    def bubble_sort_ascending(unsorted):
        """ Sorts a list of numbers into ascending order """
        iterations = 0
        size = len(unsorted) - int(1)
        for i in range(0, size):
            unsorted[i] = float(unsorted[i])
            while unsorted[i] > unsorted[i+1]:
                # Use a tuple assignment in order to swap the value of two variables
                unsorted[i], unsorted[i+1] = unsorted[i+1], unsorted[i]
                iterations += 1
        sorted_vec = unsorted[:]  # copy unsorted which is now sorted
        print "\nIterations completed: %s\n" % (iterations)
        return sorted_vec

Example:

    mylist = [4, 1, 7, 19, 13, 22, 17, 14, 23, 21]

When I call it as bubble_sort_ascending(mylist), it returns the list only partially sorted, with 5 iterations reported, i.e.

    [1, 4.0, 7.0, 13, 19.0, 17, 14, 22.0, 21, 23.0]

and I have to call it again for the sorting operation to complete. Is there something I am missing in my code? Why does it not sort the entire list at once and just count all completed iterations?

Any help appreciated.

Many thanks,
Spyros
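
The partial result above looks like what a single left-to-right pass of bubble sort produces. The sketch below, in Python 3 and with the hypothetical names bubble_pass and bubble_sort, separates one pass from the outer "repeat until nothing was swapped" loop to make that visible; it leaves out the float conversion.

```python
def bubble_pass(items):
    """Do one left-to-right pass in place; return the number of swaps made."""
    swaps = 0
    for i in range(1, len(items)):
        if items[i - 1] > items[i]:
            items[i - 1], items[i] = items[i], items[i - 1]
            swaps += 1
    return swaps


def bubble_sort(items):
    """Repeat passes until one makes no swaps; return the total swap count."""
    total = 0
    while True:
        swaps = bubble_pass(items)
        if swaps == 0:
            return total
        total += swaps


data = [4, 1, 7, 19, 13, 22, 17, 14, 23, 21]
print(bubble_pass(data), data)  # 5 [1, 4, 7, 13, 19, 17, 14, 22, 21, 23]
bubble_sort(data)
print(data)                     # fully sorted after the remaining passes
```

One pass makes the same 5 swaps and leaves the list in the same order as the reported output, which suggests the missing piece is the outer loop rather than the comparison itself.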
Re: [Tutor] bubble sort function
Thank you Alan,

When I initiated the loop with the condition:

    for i in range(len(unsorted)):

Python raised an IndexError saying I had gone out of bounds. Hence the change to:

    for i in range(0, size)

Yes, actually the loop only consists of:

    while unsorted[i] > unsorted[i+1]:
        # Use a tuple assignment in order to swap the value of two variables
        unsorted[i], unsorted[i+1] = unsorted[i+1], unsorted[i]
        iterations += 1

Sorry about that: the *iterations* update and the sorted_vec assignment are outside of the loop body.

This is indeed just a learning exercise; I am aware that lists have sort() and reverse() methods. I'm in the process of learning a bit about data structures and algorithms using Python as my implementation language.

On Sat, Nov 15, 2014 at 7:02 PM, Alan Gauld alan.ga...@btinternet.com wrote:

> On 15/11/14 16:46, Spyros Charonis wrote:
>
>> def bubble_sort_ascending(unsorted):
>>     iterations = 0
>>     size = len(unsorted) - int(1)
>
> Don't convert 1 to an int - it already is.
>
>>     for i in range(0, size):
>
> This will result in 'i' going from zero to len()-2. Is that what you want?
>
>>         unsorted[i] = float(unsorted[i])
>
> Comparing ints to floats or even comparing two floats is notoriously
> error prone due to the imprecision of floating point representation. You
> probably don't want to do the conversion. And if you must do it, why do
> you only do it once, outside the while loop?
>
>>         while unsorted[i] > unsorted[i+1]:
>>             unsorted[i], unsorted[i+1] = unsorted[i+1], unsorted[i]
>>             iterations += 1
>
> I assume you intended to end the loop body here? But the following lines
> are indented so are included in the loop.
>
> Also because you never change 'i' the loop can only ever run once. So
> really you could use an if statement instead of the while loop?
>
> Finally, iterations is really counting swaps. Is that what you want it to
> count or is it actually loop iterations? If so which? The for loop or the
> while loop or the sum of both?
>
>>     sorted_vec = unsorted[:]
>>     print "\nIterations completed: %s\n" % (iterations)
>>     return sorted_vec
>
> Since you never alter sorted_vec there is no point in creating it. Just
> return unsorted - which is now sorted...
>
>> and I have to call it again for the sorting operation to complete. Is
>> there something I am missing in my code? Why does it not sort the entire
>> list at once and just count all completed iterations?
>
> There are several things missing or broken, the few I've pointed out
> above will help but the algorithm seems suspect to me. You need to
> revisit the core algorithm I suspect.
>
> BTW I assume this is just a learning exercise since the default sorting
> algorithm will virtually always be better than bubble sort for any real
> work!
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.flickr.com/photos/alangauldphotos
[Tutor] Arbitrary-argument set function
Dear Pythoners,

I am trying to extract, from a set of about 20 sequences, the characters which are unique to each sequence. For simplicity, imagine I have only 3 sequences (words in this example) such as:

    s1='spam'; s2='scam'; s3='slam'

I would like the character that is unique to each sequence, i.e. I need my function to return the list [ 'p', 'c', 'l' ]. The function I am using is as follows:

    def uniq(*args):
        """ FIND UNIQUE ELEMENTS OF AN ARBITRARY NUMBER OF SEQUENCES """
        unique = []
        for i in args[0]:
            if i not in args[1:]:
                unique.append(i)
        return unique

and it is returning the list [ 's', 'p', 'a', 'm' ].

Any help much appreciated,

Best,
Spyros
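
A possible reading of why the function returns every character: args[1:] is a tuple of whole strings, so "i not in args[1:]" asks whether the single character equals one of those strings, which it never does. The sketch below (Python 3, hypothetical name unique_chars) instead compares each sequence against the set of characters found in all the other sequences.

```python
def unique_chars(*seqs):
    """For each sequence, list the characters found in no other sequence."""
    result = []
    for i, seq in enumerate(seqs):
        # Gather every character that occurs in the *other* sequences.
        others = set()
        for j, other in enumerate(seqs):
            if j != i:
                others.update(other)
        result.append([ch for ch in seq if ch not in others])
    return result


print(unique_chars("spam", "scam", "slam"))  # [['p'], ['c'], ['l']]
```

It returns one list per input sequence, since a sequence may have several unique characters or none; flattening the result gives ['p', 'c', 'l'] for the example.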
[Tutor] Text Processing Query
Hello Pythoners,

I am trying to extract certain fields from a file whose text looks like this:

    COMPND 2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;
    COMPND 3 CHAIN: A, B;
    COMPND 10 MOL_ID: 2;
    COMPND 11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;
    COMPND 12 CHAIN: D, F;
    COMPND 13 ENGINEERED: YES;
    COMPND 14 MOL_ID: 3;
    COMPND 15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;
    COMPND 16 CHAIN: E, G;

I would like the chain IDs, but only those following the text heading ANTIBODY FAB FRAGMENT, i.e. I need to create a list with D,F,E,G which excludes A,B, which have a non-antibody text heading. I am using the following syntax:

    with open(filename) as file:
        scanfile=file.readlines()
        for line in scanfile:
            if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line:
                continue
            elif line[0:6]=='COMPND' and 'CHAIN' in line:
                print line

But this yields:

    COMPND 3 CHAIN: A, B;
    COMPND 12 CHAIN: D, F;
    COMPND 16 CHAIN: E, G;

I would like to ignore the first line, since A,B correspond to non-antibody text headings, and instead extract only D,F and E,G, whose text headings are specified as antibody fragments.

Many thanks,
Spyros
Re: [Tutor] Text Processing Query
Yes, the elif line needs to have **flag_FAB == 1** as its condition instead of **flag_FAB = 1**. So:

    for line in scanfile:
        if line[0:6]=='COMPND' and 'FAB' in line:
            flag_FAB = 1
        elif line[0:6]=='COMPND' and 'CHAIN' in line and flag_FAB == 1:
            print line
            flag_FAB = 0

On Thu, Mar 14, 2013 at 4:33 PM, Mark Lawrence breamore...@yahoo.co.uk wrote:

> On 14/03/2013 11:28, taserian wrote:
>
> Top posting fixed
>
>>> On Thu, Mar 14, 2013 at 6:56 AM, Spyros Charonis s.charo...@gmail.com wrote:
>>>
>>> Hello Pythoners,
>>>
>>> I am trying to extract certain fields from a file whose text looks
>>> like this:
>>>
>>>     COMPND 2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;
>>>     COMPND 3 CHAIN: A, B;
>>>     COMPND 10 MOL_ID: 2;
>>>     COMPND 11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;
>>>     COMPND 12 CHAIN: D, F;
>>>     COMPND 13 ENGINEERED: YES;
>>>     COMPND 14 MOL_ID: 3;
>>>     COMPND 15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;
>>>     COMPND 16 CHAIN: E, G;
>>>
>>> I would like the chain IDs, but only those following the text heading
>>> ANTIBODY FAB FRAGMENT, i.e. I need to create a list with D,F,E,G which
>>> excludes A,B, which have a non-antibody text heading. I am using the
>>> following syntax:
>>>
>>>     with open(filename) as file:
>>>         scanfile=file.readlines()
>>>         for line in scanfile:
>>>             if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line:
>>>                 continue
>>>             elif line[0:6]=='COMPND' and 'CHAIN' in line:
>>>                 print line
>>>
>>> But this yields:
>>>
>>>     COMPND 3 CHAIN: A, B;
>>>     COMPND 12 CHAIN: D, F;
>>>     COMPND 16 CHAIN: E, G;
>>>
>>> I would like to ignore the first line, since A,B correspond to
>>> non-antibody text headings, and instead extract only D,F and E,G, whose
>>> text headings are specified as antibody fragments.
>>>
>>> Many thanks,
>>> Spyros
>>
>> Since the identifier and the item that you want to keep are on different
>> lines, you'll need to set a flag.
>>
>>     with open(filename) as file:
>>         scanfile=file.readlines()
>>         flag = 0
>>         for line in scanfile:
>>             if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line:
>>                 flag = 1
>>             elif line[0:6]=='COMPND' and 'CHAIN' in line and flag = 1:
>>                 print line
>>                 flag = 0
>>
>> Notice that the flag is set to 1 only on FAB FRAGMENT, and it's reset to
>> 0 after the next CHAIN line that follows the FAB FRAGMENT line.
>>
>> AR
>
> Notice that this code won't run due to a syntax error.
>
> --
> Cheers.
> Mark Lawrence
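
The flag idea discussed in this thread can be sketched as a small function that returns the chain IDs as a list instead of printing lines. This is written for Python 3; the function name antibody_chain_ids and its marker parameter are hypothetical, and it assumes each MOLECULE line precedes its CHAIN line, as in the sample.

```python
def antibody_chain_ids(lines, marker="FAB FRAGMENT"):
    """Collect chain IDs from CHAIN lines that follow a matching MOLECULE line."""
    chain_ids = []
    in_antibody = False
    for line in lines:
        if not line.startswith("COMPND"):
            continue
        if "MOLECULE:" in line:
            # Each MOLECULE line sets or clears the flag for its own block.
            in_antibody = marker in line
        elif "CHAIN:" in line and in_antibody:
            field = line.split("CHAIN:", 1)[1].strip().rstrip(";")
            chain_ids.extend(p.strip() for p in field.split(",") if p.strip())
    return chain_ids


sample = [
    "COMPND 2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;",
    "COMPND 3 CHAIN: A, B;",
    "COMPND 10 MOL_ID: 2;",
    "COMPND 11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;",
    "COMPND 12 CHAIN: D, F;",
    "COMPND 13 ENGINEERED: YES;",
    "COMPND 14 MOL_ID: 3;",
    "COMPND 15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;",
    "COMPND 16 CHAIN: E, G;",
]
print(antibody_chain_ids(sample))  # ['D', 'F', 'E', 'G']
```

Matching on "MOLECULE:" and "CHAIN:" with the colon avoids tripping over headings such as "LIGHT CHAIN", which also contain the word CHAIN.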
Re: [Tutor] pickle.dump yielding awkward output
Thank you Alan, Steven,

I don't care about the characters from the pickle operation per se, I just want the list to be stored in its native format. What I am trying to do is basically the Unix shell equivalent of:

    Unix command > newfile.txt

I am trying to store the list that I get from my code in a separate file, in human-readable format.

On Mon, Feb 4, 2013 at 1:03 AM, Alan Gauld alan.ga...@btinternet.com wrote:

> On 03/02/13 19:26, Spyros Charonis wrote:
>
>> I am experiencing a strange result with the pickle module when using it
>> to write certain results to a separate file.
>
> The only strange results using pickle would be if the unpickle failed to
> bring back that which was pickled. Pickle is a storage format not a
> display format.
>
>> In short, I have a program that reads a file, finds lines which satisfy
>> some criteria, and extracts those lines, storing them in a list.
>
> Extracting them with pickle I hope? That's the only thing that should be
> used to unpickle a pickled file.
>
>> The list of extracted lines looks like this:
>>
>>     ATOM 1 N GLN A 1 29.872 13.384 54.754 1.00 60.40 N
>>
>> The output stored from the call to the pickle.dump method, however,
>> looks like this:
>>
>>     (lp0
>>     S'ATOM 1 N GLN A 1 29.872 13.384 54.754 1.00 60.40 N \r\n'
>
> Yep, I'm sure pickle can make sense of it.
>
>> Does anyone know why the strings lp0, S', aS' are showing up?
>
> Because that's what pickle puts in there to help it unpickle it later.
>
> Why do you care? You shouldn't be looking at it (unless you want to
> understand how pickle works).
>
> pickle, as the name suggests, is intended for storing python objects for
> later use. This is often called object persistence in programming
> parlance. It is not designed for anything else.
>
> If you want cleanly formatted data in a file that you can read in a text
> editor or similar you need to do the formatting yourself or use another
> recognised format such as CSV or configparser (aka ini file).
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
[Tutor] pickle.dump yielding awkward output
Hello Pythoners,

I am experiencing a strange result with the pickle module when using it to write certain results to a separate file.

In short, I have a program that reads a file, finds lines which satisfy some criteria, and extracts those lines, storing them in a list. I am trying to write this list to a separate file.

The list of extracted lines looks like this:

    ATOM 1 N GLN A 1 29.872 13.384 54.754 1.00 60.40 N
    ATOM 2 CA GLN A 1 29.809 11.972 54.274 1.00 58.51 C
    ATOM 3 C GLN A 1 28.376 11.536 54.029 1.00 55.13 C

The output stored from the call to the pickle.dump method, however, looks like this:

    (lp0
    S'ATOM 1 N GLN A 1 29.872 13.384 54.754 1.00 60.40 N \r\n'
    p1
    aS'ATOM 2 CA GLN A 1 29.809 11.972 54.274 1.00 58.51 C \r\n'
    p2
    aS'ATOM 3 C GLN A 1 28.376 11.536 54.029 1.00 55.13 C \r\n'

The code I am using to write the output to an external file goes as follows:

    def export_antibody_chains():
        ''' EXPORT LIST OF EXTRACTED CHAINS TO FILE '''
        chains_file = open(query + '_Chains', 'wb')
        pickle.dump(ab_chains, chains_file)  # ab_chains is global
        chains_file.close()
        return

Does anyone know why the strings lp0, S', aS' are showing up?
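
If the goal is a human-readable file rather than object persistence, one option is to write the strings as plain text. The sketch below is Python 3 and uses hypothetical helper names (write_lines, read_lines) and a temporary file path chosen only for the demonstration.

```python
import os
import tempfile


def write_lines(path, lines):
    """Write each string to path as text, adding a newline where missing."""
    with open(path, "w") as handle:
        for line in lines:
            handle.write(line if line.endswith("\n") else line + "\n")


def read_lines(path):
    """Read the file back as a list of strings without trailing newlines."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle]


chains = [
    "ATOM 1 N GLN A 1 29.872 13.384 54.754 1.00 60.40 N",
    "ATOM 2 CA GLN A 1 29.809 11.972 54.274 1.00 58.51 C",
]
demo_path = os.path.join(tempfile.mkdtemp(), "demo_Chains.txt")
write_lines(demo_path, chains)
print(read_lines(demo_path) == chains)  # True
```

The resulting file contains exactly the extracted lines, one per row, so it can be opened in a text editor, whereas pickle stays the better fit when the list has to be restored as a Python object later.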
[Tutor] indexing a list
Hello pythoners,

I have a string that I want to read in fixed-length windows.

    In [68]: SEQ
    Out[68]: 'MKAAVLTLAVLFLTGSQARHFWQQDEPPQSPWDRVKDLATVYVDVLKDSGRDYVSQFEGSALGKQLNLKLLDNWDSVTSTFSKLREQLGPVTQEFWDNLEKETEGLRQEMSKDLEEVKAKVQPYLDDFQKKWQEEMELYRQKVEPLRAELQEGARQKLHELQEKLSPLGEEMRDRARAHVDALRTHLAPYSDELRQRLAARLEALKENGGARLAEYHAKATEHLSTLSEKAKPALEDLRQ'

I would like a function that reads the above string, 21 characters at a time, and checks for certain conditions, i.e. whether characters co-occur in other lists I have made. For example:

    x = 21  # WINDOW LENGTH

    In [70]: SEQ[0:x]
    Out[70]: 'MKAAVLTLAVLFLTGSQARHF'

    In [71]: SEQ[x:2*x]
    Out[71]: 'WQQDEPPQSPWDRVKDLATVY'

    In [72]: SEQ[2*x:3*x]
    Out[72]: 'VDVLKDSGRDYVSQFEGSALG'

How could I write a function to automate this so that it does this from SEQ[0] throughout the entire sequence, i.e. until len(SEQ)?

Many thanks for your time,
Spyros
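
A minimal sketch of the windowing, assuming non-overlapping windows and that a shorter final window is acceptable; the function name windows is hypothetical and the code is Python 3. It steps the start index by the window length and slices, exactly as the manual SEQ[0:x], SEQ[x:2*x] examples do.

```python
def windows(seq, size):
    """Split seq into consecutive slices of length size (last may be shorter)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [seq[start:start + size] for start in range(0, len(seq), size)]


# First 42 characters of SEQ from the post, i.e. the first two windows.
prefix = "MKAAVLTLAVLFLTGSQARHF" + "WQQDEPPQSPWDRVKDLATVY"
for chunk in windows(prefix, 21):
    print(chunk)
```

Each returned chunk can then be passed to whatever per-window check is needed; slicing past the end of a string is safe in Python, so no special handling of the last window is required.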
Re: [Tutor] Parsing data from a set of files iteratively
FINAL SOLUTION:

    ### LOOP OVER DIRECTORY
    location = '/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels'
    zdata = []
    for filename in os.listdir(location):
        filename = os.path.join(location, filename)
        try:
            zdata.extend(extract_zcoord(filename))
        except NameError:
            print "No such file!"
        except SyntaxError:
            print "Check Your Syntax!"
        except IOError:
            print "PDB file NOT FOUND!"
        else:
            continue

    print 'Z-VALUES FOR ALL CHARGED RESIDUES'
    print zdata  # diagnostic

    ### WRITE Z-COORDINATE LIST TO A BINARY FILE
    import pickle

    f1 = open("z_coords1.dat", "wb")
    pickle.dump(zdata, f1)
    f1.close()

    f2 = open("z_coords1.dat", "rb")
    zdata1 = pickle.load(f2)
    f2.close()

    assert zdata == zdata1, "error in pickle/unpickle round trip!"

On Wed, May 30, 2012 at 1:09 AM, Steven D'Aprano st...@pearwood.info wrote:

> Steven D'Aprano wrote:
>
>>     location = '/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels/'
>>     zdata = []
>>     for filename in os.listdir(location):
>>         zdata.extend(get_zcoords(filename))

I only had the filename and not its path, which is why the system was not able to locate the file, so

    filename = os.path.join(location, filename)

was used to solve that. Many thanks to everyone for their time and efforts!

Spyros

> Hah, that can't work. listdir returns the name of the file, but not the
> file's path, which means that Python will only look in the current
> directory. You need something like this:
>
>     location = '/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels/'
>     zdata = []
>     for filename in os.listdir(location):
>         zdata.extend(get_zcoords(os.path.join(location, filename)))
>
> Sorry about that.
>
> --
> Steven
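
The directory loop can be sketched as a reusable function, shown here in Python 3. The names collect_from_directory and count_chars are hypothetical; count_chars merely stands in for the thread's extract_zcoord, whose body is not shown. Exceptions are deliberately left uncaught in this sketch so that a missing or unreadable file produces a full traceback naming the problem.

```python
import os
import tempfile


def collect_from_directory(location, extract):
    """Call extract(path) on every regular file in location; merge the lists."""
    combined = []
    for name in sorted(os.listdir(location)):
        # os.listdir returns bare names, so join them to the directory.
        path = os.path.join(location, name)
        if os.path.isfile(path):
            combined.extend(extract(path))
    return combined


def count_chars(path):
    """Stand-in extractor for the demo: one number per line of the file."""
    with open(path) as handle:
        return [float(len(line.strip())) for line in handle]


demo_dir = tempfile.mkdtemp()
for name, text in [("a.pdb", "xx\nyyy\n"), ("b.pdb", "z\n")]:
    with open(os.path.join(demo_dir, name), "w") as handle:
        handle.write(text)
print(collect_from_directory(demo_dir, count_chars))  # [2.0, 3.0, 1.0]
```

Passing the extractor in as an argument keeps the directory walking separate from the parsing, so either part can be tested on its own.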
Re: [Tutor] Parsing data from a set of files iteratively
On Wed, May 30, 2012 at 8:16 AM, Steven D'Aprano st...@pearwood.info wrote:

> On Wed, May 30, 2012 at 07:00:30AM +0100, Spyros Charonis wrote:
>> FINAL SOLUTION:
>
> Not quite. You are making the mistake of many newbies to treat Python
> exceptions as a problem to be covered up and hidden, instead of as a
> useful source of information.
>
> To quote Chris Smith:
>
>     I find it amusing when novice programmers believe their main job is
>     preventing programs from crashing. ... More experienced programmers
>     realize that correct code is great, code that crashes could use
>     improvement, but incorrect code that doesn't crash is a horrible
>     nightmare.
>     -- http://cdsmith.wordpress.com/2011/01/09/an-old-article-i-wrote/

Ok, so basically wrong code beats useless code.

> There is little as painful as a program which prints "An error occurred"
> and then *keeps working*. What does this mean? Can I trust that the
> program's final result is correct? How can it be correct if an error
> occurred? What error occurred? How do I fix it?

My understanding is that an except clause will catch a relevant error and raise an exception if there is one, discontinuing program execution.

> Exceptions are your friend, not your enemy. An exception tells you that
> there is a problem with your program that needs to be fixed. Don't
> cover-up exceptions unless you absolutely have to.
>
> Sadly, your indentation is still being broken when you post. Please
> ensure you include indentation, and disable HTML or Rich Text posting. I
> have tried to guess the correct indentation below, and fix it in place,
> but apologies if I get it wrong.

Yes, that is the way my code looks in a Python interpreter.

>> ### LOOP OVER DIRECTORY
>> location = '/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels'
>> zdata = []
>> for filename in os.listdir(location):
>>     filename = os.path.join(location, filename)
>>     try:
>>         zdata.extend(extract_zcoord(filename))
>>     except NameError:
>>         print "No such file!"
>
> Incorrect. When a file is missing, you do not get NameError. This
> except-clause merely disguises programming errors in favour of a
> misleading and incorrect error message.
>
> If you get a NameError, your program has a bug. Don't just hide the bug,
> fix it.
>
>>     except SyntaxError:
>>         print "Check Your Syntax!"
>
> This except-clause is even more useless. SyntaxErrors happen when the
> code is compiled, not run, so by the time the for-loop is entered, the
> code has already been compiled and cannot possibly raise SyntaxError.

What I meant was, check the syntax of my pathname specification, i.e. check that I did not make a typo when writing the path of the directory I want to scan over. I realize syntax has a much more specific meaning in the context of programming - code syntax!

> Even if it could, what is the point of this? Instead of a useful
> exception traceback, which tells you not only which line contains the
> error, but even highlights the point of the error with a ^ caret, you
> hide all the useful information and tease the user with a useless
> message "Check Your Syntax!".

Ok, I didn't realize I was being so reckless - thanks for pointing that out.

> Again, if your program raises a SyntaxError, it has a bug. Don't hide
> the bug, fix it.
>
>>     except IOError:
>>         print "PDB file NOT FOUND!"
>
> This, at least, is somewhat less useless than the others. At least it is
> a valid exception, and if your intention is to skip missing files,
> catching IOError is a reasonable way to do it.
>
> But you don't just get IOError for *missing* files, but also for
> *unreadable* files, perhaps because you don't have permission to read
> them, or perhaps because the file is corrupt and can't be read.

Understood, but given that the files I am reading and processing are standard ASCII text files, there is no good reason (that I can think of) why the files would be *unreadable*. I verified that I had read/write permissions for all my files, which are the default access privileges anyway (for the owner).

> In any case, as usual, imagine yourself as the recipient of this
> message: "PDB file NOT FOUND!" -- what do you expect to do about it?
> Which file is missing or unreadable? How can you tell? Is this a
> problem? Are your results still valid without that PDB file's data?

Perhaps because I was writing the program I didn't think that this message would be confusing to others, but it did help in making clear that there was a different error (in this case, the absence of **filename = os.path.join(location, filename)** to join a filename to its path). Without the PDB file's data, there would be no results, because the program operates on each file of a directory successively (all files are .pdb files) and uses data in the file to build a list. So, since I was working on a directory with only PDB files, this error says it hasn't found them, which points to a more basic error (the one mentioned above).

> If this can be ignored, IGNORE IT! Don't bother
Re: [Tutor] Parsing data from a set of files iteratively
Returning to this original problem, I have modified my program from a single long procedure to 3 functions which do the following:

serialize_pipeline_model(f): takes a file as input, reads it and parses coordinate values (numerical entries in the file) into a list

write_to_binary(): writes the generated list to a binary file (pickles it)

read_binary(): unpickles the aggregate of merged lists, which should be one large list

The code goes like so:

    z_coords1 = []

    def serialize_pipeline_model(f):
        ...
        # z_coords1 = [] has been declared global
        global z_coords1
        charged_groups = lys_charged_group + arg_charged_group + his_charged_group + asp_charged_group + glu_charged_group
        for i in range(len(charged_groups)):
            z_coords1.append(float(charged_groups[i][48:54]))
        #print z_coords1
        return z_coords1

    import pickle, shelve
    print '\nPickling z-coordinates list'

    def write_to_binary():
        """ iteratively write successively generated z_coords1 to a binary file """
        f = open("z_coords1.dat", "ab")
        pickle.dump(z_coords1, f)
        f.close()
        return

    def read_binary():
        """ read the binary list """
        print '\nUnpickling z-coordinates list'
        f = open("z_coords1.dat", "rb")
        z_coords1=pickle.load(f)
        print(z_coords1)
        f.close()
        return

    ### LOOP OVER DIRECTORY
    for f in os.listdir('/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels/'):
        serialize_pipeline_model(f)
        write_to_binary()

    read_binary()
    print '\n Z-VALUES FOR ALL CHARGED RESIDUES'
    print z_coords1

The problem is that the list (z_coords1) comes back as an empty list. I know the code (too large to post here) works in a procedural format (z_coords1 can be generated correctly), so as a diagnostic I included a print statement in the serialize function to see the list that is generated for each of the 500 files. Short of some intricacy with the scopes of the program that I may be missing, I am not sure why this is happening. Does anybody have any ideas?

Many thanks for your time.

Best regards,
Spyros

On Fri, May 18, 2012 at 7:23 PM, Spyros Charonis s.charo...@gmail.com wrote:

> Dear Python community,
>
> I have a set of ~500 files which I would like to run a script on. My
> script extracts certain information and generates several lists with
> items I need. For one of these lists, I need to combine the information
> from all 500 files into one super-list. Is there a way in which I can
> iteratively execute my script over all 500 files and get them to write
> the list I need into a new file?
>
> Many thanks in advance for your time.
>
> Spyros
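
One detail that may be worth checking in code like the above: opening the file in append mode and calling pickle.dump repeatedly stores several separate pickles back to back, while a single pickle.load returns only the first of them. The Python 3 sketch below illustrates that behaviour; load_all is a hypothetical helper and the temporary file path exists only for the demonstration. It is not a diagnosis of the empty list itself.

```python
import os
import pickle
import tempfile


def load_all(path):
    """Return every object pickled into path, in the order it was dumped."""
    objects = []
    with open(path, "rb") as handle:
        while True:
            try:
                objects.append(pickle.load(handle))
            except EOFError:
                # Raised once there is no further pickle to read.
                break
    return objects


demo_file = os.path.join(tempfile.mkdtemp(), "z_demo.dat")
growing = []
for value in (1.5, 2.5):
    growing.append(value)
    with open(demo_file, "ab") as handle:  # append mode, one dump per pass
        pickle.dump(growing, handle)

with open(demo_file, "rb") as handle:
    first_only = pickle.load(handle)

print(first_only)           # [1.5]
print(load_all(demo_file))  # [[1.5], [1.5, 2.5]]
```

Dumping the finished list once, after the loop and with mode "wb", avoids the issue altogether.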
Re: [Tutor] Parsing data from a set of files iteratively
I have tried the following two snippets, which both result in the same error:

    import os, glob
    os.chdir('users/spyros/desktop/3NY8MODELSHUMAN/')
    homology_models = glob.glob('*.pdb')
    for i in range(len(homology_models)):
        python serialize_PIPELINE_models.py homology_models[i]

    import os, sys
    path = "/users/spyros/desktop/3NY8MODELSHUMAN/"
    dirs = os.listdir(path)
    for file in dirs:
        python serialize_PIPELINE_models.py

The errors, respectively for each snippet, read:

    File "<stdin>", line 2
        python serialize_PIPELINE_models.py homology_models[i]
                                       ^
    SyntaxError: invalid syntax

    File "<stdin>", line 2
        python serialize_PIPELINE_models.py
                                       ^
    SyntaxError: invalid syntax

In the first snippet, the final line reads: 'python' (calling the interpreter) 'serialize_PIPELINE_models.py' (calling my python program) 'homology_models[i]' (the file to run it on).

The glob.glob routine returns a list of files, so maybe Python does not allow the syntax "python (call interpreter) list entry"?

Many thanks.
Spyros

On Fri, May 18, 2012 at 7:57 PM, Alan Gauld alan.ga...@btinternet.com wrote:

> On 18/05/12 19:23, Spyros Charonis wrote:
>> Dear Python community,
>>
>> I have a set of ~500 files which I would like to run a script on.
>> ...Is there a way in which I can iteratively execute my script over all
>> 500 files
>
> Yes.
> You could use os.walk() or the glob module depending on whether the
> files are in a folder hierarchy or a single folder.
>
> That will give you access to each file.
> Put your functionality into a function taking a single file as input
> and a list to which you append the new data.
> Call that function for each file in turn.
>
> Try that and if you get stuck come back with a more specific question,
> the code you used and the full error text.
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
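
The SyntaxError arises because "python script.py file" is a shell command line, not a Python statement. If the script really has to be launched once per file instead of being imported as a function, the standard subprocess module can do it. The Python 3 sketch below uses a hypothetical function name (run_script_on_files) and a throwaway demo script created on the fly in place of serialize_PIPELINE_models.py.

```python
import os
import subprocess
import sys
import tempfile


def run_script_on_files(script, paths):
    """Run `python script path` for each path and collect each run's stdout."""
    outputs = []
    for path in paths:
        completed = subprocess.run(
            [sys.executable, script, path],  # same interpreter as this process
            capture_output=True, text=True, check=True)
        outputs.append(completed.stdout)
    return outputs


# Throwaway script for the demo: prints its first argument in upper case.
demo_script = os.path.join(tempfile.mkdtemp(), "demo_script.py")
with open(demo_script, "w") as handle:
    handle.write("import sys\nprint(sys.argv[1].upper())\n")

print(run_script_on_files(demo_script, ["a.pdb", "b.pdb"]))
```

With check=True a failing run raises CalledProcessError instead of passing silently. Putting the work into a function and calling it per file, as suggested in the reply above, is usually simpler and avoids starting 500 interpreters.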
[Tutor] Parsing data from a set of files iteratively
Dear Python community,

I have a set of ~500 files which I would like to run a script on. My script extracts certain information and generates several lists with items I need. For one of these lists, I need to combine the information from all 500 files into one super-list. Is there a way in which I can iteratively execute my script over all 500 files and get them to write the list I need into a new file?

Many thanks in advance for your time.

Spyros
[Tutor] List Indexing Issue
Hello python community,

I'm having a small issue with list indexing. I am extracting certain information from a PDB (protein information) file and need certain fields of the file to be copied into a list. The entries look like this:

    ATOM 1512 N VAL A 222 8.544 -7.133 25.697 1.00 48.89 N
    ATOM 1513 CA VAL A 222 8.251 -6.190 24.619 1.00 48.64 C
    ATOM 1514 C VAL A 222 9.528 -5.762 23.898 1.00 48.32 C

I am using the following code to parse these lines into a list:

    charged_res_coord = []  # store x,y,z of extracted charged residues
    for line in pdb:
        if line.startswith('ATOM'):
            atom_coord.append(line)
    for i in range(len(atom_coord)):
        for item in charged_res:
            if item in atom_coord[i]:
                charged_res_coord.append(atom_coord[i].split()[1:9])

The problem begins with entries such as the following:

    ROW1) ATOM 1572 NH2 ARG A 228 7.890 -13.328 16.363 1.00 59.63 N
    ROW2) ATOM 1617 N GLU A1005 11.906 -2.722 7.994 1.00 44.02 N

Here, the code that I use to extract the third spatial coordinate (the last of the three consecutive non-integer values) produces a problem. Because 'A1005' (second row) is treated as a single list entry, while 'A' and '228' (first row) are two list entries, indexing element [7] in a loop extracts '16.363' (the entry I want) for the first row and '1.00' (not the entry I want) for the second row.

    charged_res_coord[1]
    ['1572', 'NH2', 'ARG', 'A', '228', '7.890', '-13.328', '16.363']
    charged_res_coord[10]
    ['1617', 'N', 'GLU', 'A1005', '11.906', '-2.722', '7.994', '1.00']

The loop I use goes like this:

    for i in range(len(lys_charged_group)):
        lys_charged_group[i][7] = float(lys_charged_group[i][7])

The [7] is the problem: in lines like ROW1 the code extracts the correct value, but in lines like ROW2 it extracts the wrong value. Unfortunately, the different row formats are interspersed, so I don't know if I can solve this using text processing routines. Would I have to use regular expressions?

Many thanks for your help! Spyros
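Because ATOM records in the PDB format are fixed-width, slicing by column position sidesteps the 'A1005' problem without regular expressions: the x, y and z coordinates occupy columns 31-38, 39-46 and 47-54. The sketch below is illustrative only. The helper `make_atom_line` is hypothetical and exists just to produce correctly aligned sample records, since the whitespace of the rows quoted above has been collapsed.

```python
def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a fixed-width ATOM record for the demo (hypothetical helper)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, res_name, chain, res_seq, x, y, z)


def atom_coordinates(line):
    """Return (x, y, z) as floats, read from PDB columns 31-54."""
    return (float(line[30:38]), float(line[38:46]), float(line[46:54]))


row1 = make_atom_line(1572, "NH2", "ARG", "A", 228, 7.890, -13.328, 16.363)
row2 = make_atom_line(1617, "N", "GLU", "A", 1005, 11.906, -2.722, 7.994)

# split() yields a different number of fields for the two rows ...
print(len(row1.split()), len(row2.split()))
# ... while column slicing reads the same positions in both.
print(atom_coordinates(row1))
print(atom_coordinates(row2))
```

The same slicing applies to lines read straight from the file, provided they have not been stripped or re-joined beforehand.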
Re: [Tutor] Concatenating multiple lines into one
Thanks for all the help, Peter's and Hugo's methods worked well in concatenating multiple lines into a single data structure! S

On Fri, Feb 10, 2012 at 5:30 PM, Mark Lawrence breamore...@yahoo.co.uk wrote:

On 10/02/2012 17:08, Peter Otten wrote:

Instead of printing the line directly, collect it in a list (without the trailing \n). When you encounter a line starting with "sp", check if that list is non-empty, and if so print "".join(parts), assuming the list is called parts, and start with a fresh list. Don't forget to print any leftover data in the list once the for loop has terminated.

The advice from Peter is sound if the strings could grow very large, but you can simply concatenate the parts if they are not. For the indexing, simply store your data in a dict.

-- Cheers. Mark Lawrence.
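Peter's list-and-join advice and Mark's suggestion to store the result in a dict can be combined roughly as follows. This is a sketch under the assumption that every header line starts with "sp" and that no sequence line does; the function name `read_sequences` is made up for the example.

```python
def read_sequences(lines):
    """Map each 'sp' header to its sequence, joined into one string."""
    sequences = {}
    header = None
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith("sp"):
            if header is not None:
                sequences[header] = "".join(parts)
            header = line                 # start collecting a new record
            parts = []
        else:
            parts.append(line)
    if header is not None:
        sequences[header] = "".join(parts)  # leftover data after the loop
    return sequences
```

Passing an open file object works as well as a list of lines, and each joined sequence can then be indexed like any other string, for example `sequences[header][0:10]`.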
[Tutor] Concatenating multiple lines into one
Dear python community,

I have a file where I store sequences that each have a header. The structure of the file is as such:

    sp|(some code) = 1st header
    AGGCGG
    MNKPLOI
    .
    .
    sp|(some code) = 2nd header
    AA ...
    .
    ..

I am looking to implement a logical structure that would allow me to group each of the sequences (spread on multiple lines) into a single string. So instead of having the letters spread on multiple lines, I would be able to have 'AGGCGGMNKP' as a single string that could be indexed. This snippet is good for isolating the sequences (stripping headers and skipping blank lines), but how could I concatenate each sequence in order to get one string per sequence?

    for line in align_file:
    ...     if line.startswith('sp'):
    ...         continue
    ...     elif not line.strip():
    ...         continue
    ...     else:
    ...         print line

(... is just OS X terminal notation, nothing programmatic)

Many thanks in advance. S.
[Tutor] Logical Structure of Snippet
Hello List,

I'm trying to read some sequence files and modify them to a particular format. These files are structured something like:

    P1; ICA1_HUMAN
    AAEVDTG. (A very long sequence of letters)
    P1;ICA1_BOVIN
    TRETG(A very long sequence of letters)
    P1;ICA2_HUMAN
    WKH.(another sequence)

I read a database file which has information that I need to modify my sequence files. I must extract one of the data fields from the database (done this) and place it in the sequence file (structure shown above). The relevant database fields go like:

    tt; ICA1_HUMAN Description
    tt; ICA1_BOVIN Description
    tt; ICA2_HUMAN Description

What I would like is to extract the tt; fields (I already have code for that) and then to read through the sequence file and insert the tt; field corresponding to each P1 header right underneath that header. Basically, I need a new line every time P1 occurs in the sequence file, and I need to paste its corresponding tt; field in that new line (for P1; ICA1_HUMAN, that would be ICA1_HUMAN Description, etc). The pseudocode would go like this:

    for line in sequence file:
        if line.startswith('P1; ICA'):
            make a newline
            go to list with extracted tt; fields *
            find the one with the same query (tt; ICA1 ...) *
            insert this field in the newline

The steps marked * are the ones I am not sure how to implement. What logical structure would I need to make Python match a tt; field (I already have the list of entries) whenever it finds a header with the same content? Apologies for the verbosity, but I did want to be clear as it is quite specific. S.
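One possible structure for the two starred steps is a dictionary keyed on the entry name, so that each P1 header can look up its description directly instead of searching the list. The sketch below assumes every database entry starts with "tt;" followed by the name and then the description; the function name `annotate` is made up for the example.

```python
def annotate(sequence_lines, tt_lines):
    """Return the sequence lines with the matching tt; text under each P1 header."""
    descriptions = {}
    for entry in tt_lines:
        text = entry.split(";", 1)[1].strip()    # 'ICA1_HUMAN Description'
        if text:
            descriptions[text.split()[0]] = text  # key: 'ICA1_HUMAN'

    output = []
    for line in sequence_lines:
        output.append(line.rstrip("\n"))
        if line.startswith("P1;"):
            name = line.split(";", 1)[1].strip()
            if name in descriptions:
                output.append(descriptions[name])
    return output
```

Writing the result back out is then a matter of `out.write("\n".join(annotate(seq_lines, tt_lines)) + "\n")`. Headers with no matching tt; entry are left unchanged.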
[Tutor] STRING PROC
Hello List, A quick string processing query. If I have an entry in a list such as ['NAME\n'], is there a way to split it into two separate lines: NAME
[Tutor] Indexing a List of Strings
Greetings Python List,

I have a motif sequence (a list of characters e.g. 'EAWLGHEYLHAMKGLLC') whose index I would like to return. The list contains 20 strings, each of which is close to 1000 characters long, making it far too cumbersome to display an example. I would like to know if there is a way to return a pair of indices, one index where my sequence begins (at 'E' in the above case) and one index where my sequence ends (at 'C' in the above case). In short, if 'EAWLGHEYLHAMKGLLC' spans 17 characters, is it possible to get something like 100 117, assuming it begins at the 100th position and goes up until the 117th character of my string? My loop goes as follows:

    for item in finalmotifs:
        for line in my_list:
            if item in line:
                print line.index(item)

But this only returns a single number (e.g. 119), which is the index at which my sequence begins. Is it possible to get a pair of indices that indicate the beginning and end of the substring? Many thanks
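Since the motif's length is known, the end index follows directly from the start index. A minimal sketch, with `find_span` as a made-up name and the end index taken as exclusive so that it can be used directly in a slice:

```python
def find_span(sequence, motif):
    """Return (start, end) of the first match, end exclusive, or None."""
    start = sequence.find(motif)
    if start == -1:
        return None
    return start, start + len(motif)


sequence = "X" * 100 + "EAWLGHEYLHAMKGLLC" + "YY"
span = find_span(sequence, "EAWLGHEYLHAMKGLLC")
print(span)
```

With an exclusive end, `sequence[start:end]` gives back the motif itself. Subtract 1 from the end if the index of the last character is wanted instead.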
[Tutor] String Processing Query
I have a file with the following contents:

    from header1
    abcdefghijkl
    mnopqrs
    tuvwxyz
    *
    from header2
    poiuytrewq
    lkjhgfdsa
    mnbvcxz
    *

My string processing code goes as follows:

    file1 = open('/myfolder/testfile.txt')
    scan = file1.readlines()
    string1 = ' '
    for line in scan:
        if line.startswith('from'):
            continue
        if line.startswith('*'):
            continue
        string1.join(line.rstrip('\n'))

This code produces the following output:

    'abcdefghijkl'
    'mnopqrs'
    'tuvwxyz'
    'poiuytrewq'
    'lkjhgfdsa'
    'mnbvcxz'

I would like to know if there is a way to get the following output instead:

    'abcdefghijklmnopqrstuvwxyz'
    'poiuytrewqlkjhgfdsamnbvcxz'

I'm basically trying to concatenate the strings in order to produce 2 separate lines.
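Note that `str.join` returns a new string rather than modifying `string1`, so the loop above never accumulates anything. A sketch of one way to build one string per record, treating each "from" line as the start of a record and each "*" line as its end (the function name `join_records` is an assumption for the example):

```python
def join_records(lines):
    """Return one concatenated string per 'from ... *' record."""
    records = []
    parts = []
    for line in lines:
        line = line.strip()
        if line.startswith("from"):
            parts = []                        # a new record begins
        elif line.startswith("*"):
            records.append("".join(parts))    # the record is complete
            parts = []
        elif line:
            parts.append(line)
    return records


sample = ["from header1\n", "abcdefghijkl\n", "mnopqrs\n", "tuvwxyz\n", "*\n",
          "from header2\n", "poiuytrewq\n", "lkjhgfdsa\n", "mnbvcxz\n", "*\n"]
print(join_records(sample))
```

Collecting the pieces in a list and joining once at the end is the usual idiom, and it also avoids repeated string concatenation on long records.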
[Tutor] Printing output from Python program to HTML
Hello everyone,

I have a Python script that extracts some text from a database file and annotates another file, writing the results to a new file. Because the files I am annotating are ASCII, I am very restricted as to how I can annotate the text, and I would like to instead write the results to HTML so that I can annotate my file in more visually effective ways, e.g. by changing text color where appropriate. My program extracts text from a database, reads a file that is to be annotated, and writes those annotations to a newly created (.htm) file. I include the following headers at the beginning of my program:

    print "Content-type:text/html\r\n\r\n"
    print '<html>'
    print '<body>'

The part of the program that finds the entry I want and produces the annotation is about 80 lines down and goes as follows:

    file_rmode = open('/myfolder/alignfiles/query1, 'r')
    file_amode = open('/myfolder/alignfiles/query2, 'a+')
    file1 = motif_file.readlines()  # file has been created in code not shown
    file2 = file_rmode.readlines()
    for line in seqalign:
        for item in finalmotifs:
            item = item.strip().upper()
            if item in line:
                newline = line.replace(item, "<p> <font color = "red"> item </font> </p>")  # compiler complains here about the word red
                # sys.stdout.write(newline)
                align_file_amode.write(line)
    print '</body>'
    print '</html>'
    motif_file.close()
    align_file_rmode.close()
    align_file_amode.close()

The Python compiler complains on the line where I try to change the font color, saying invalid syntax. Perhaps I need to import the cgi module to make this a full CGI program? (I have configured my Apache server.) Or alternatively, my HTML code is messed up, but I am pretty sure this is more or less a simple task. I am working in Python 2.6.5.

Many thanks in advance. Spyros
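A likely cause of the "invalid syntax" report, assuming the replacement text was written with double quotes both around the whole string and around red, is that the inner quote ends the string early, leaving the word red outside any string. Delimiting the Python string with single quotes avoids this, and concatenation inserts the value of `item` rather than the literal word. A minimal sketch, with `highlight` as a made-up helper name:

```python
def highlight(item):
    """Wrap item in a red <font> tag."""
    # Single quotes delimit the Python string, so the double quotes
    # around "red" are ordinary characters inside it.
    return '<font color="red">' + item + '</font>'


print(highlight("EAWLGHEY"))
```

No cgi import is needed for this part: the error is raised by Python before any HTML is produced.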
Re: [Tutor] Printing output from Python program to HTML
Thanks, very simple but I missed that because it was supposed to be in HTML code!
[Tutor] Problem with printing Python output to HTML Correctly
Hello,

I know I posted the exact same topic a few hours ago and I do apologize for this, but my script had a careless error, and my real issue is somewhat different. I have a Python script that extracts some text from a database file and annotates another file, writing the results to a new file. Because the files I am annotating are ASCII, I am very restricted as to how I can annotate the text, and I would like to instead write the results to HTML so that I can annotate my file in more visually effective ways, e.g. by changing text color where appropriate. My program extracts text from a database, reads a file that is to be annotated, and writes those annotations to a newly created (.htm) file.

    finalmotifs = motif_file.readlines()
    seqalign = align_file_rmode.readlines()
    # These two files have been created in code that I don't show here
    # because it is not relevant to the issue
    align_file_appmode.write('<html>')
    align_file_appmode.write('<head>')
    align_file_appmode.write('<title> \'query_\' Multiple Sequence Alignment </title>')
    align_file_appmode.write('</head>')
    align_file_appmode.write('<body>')
    for line in seqalign:
        align_file_appmode.write('<p> \'line\' </p>')
        for item in finalmotifs:
            item = item.strip().upper()
            if item in line:
                newline = line.replace(item, '<p> <font color = red> \'item\' </font></p>')
                align_file_appmode.write(newline)
    align_file_appmode.write('</body>')
    align_file_appmode.write('</html>')
    motif_file.close()
    align_file_rmode.close()
    align_file_appmode.close()

The .htm file that is created is not what I intend it to be: it has the word item printed every couple of lines, I assume because I'm not correctly passing the string sequence that I want to output.

QUESTION: Basically, HTML (or the way I wrote my code) does not understand that with the escape character '\item\' I am trying to print a string and not the word item. Is there some way to correct that, or would I have to use something like XML to create a markup system that specifically describes my data? I am aware Python supports multiline strings (using the format ''' text ''') but I do want my HTML (or XML?) to be correctly rendered before I consider making this into a CGI program. Built in Python 2.6.5
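The word item shows up in the output because `\'` only places a quote character inside the string literal; the text between the quotes is still literal text, not a variable reference. The value has to be combined with the markup explicitly, for example with `str.format`. A small sketch (the name `paragraph` is made up for the example):

```python
def paragraph(text):
    """Return text wrapped in <p> tags, using the value of the variable."""
    return "<p>{0}</p>".format(text)


line = "MKVLAAGIVG"
literal = '<p> \'line\' </p>'    # contains the characters 'line', not the value
print(literal)
print(paragraph(line))
```

This is plain Python string handling, so switching to XML would not change the behaviour.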
Re: [Tutor] Problem with printing Python output to HTML Correctly
Hi all, No need to post answers, I figured out where my mistake was. Spyros
Re: [Tutor] Problem with printing Python output to HTML Correctly
A SOLUTION TO THE PROBLEM I POSTED:

    align_file_rmode = open('/Users/spyros/folder1/python/printsmotifs/alignfiles/' + query1, 'r')
    align_file_appmode = open('/Users/spyros/folder1/python/printsmotifs/alignfiles/' + query2, 'a+')
    finalmotifs = motif_file.readlines()
    seqalign = align_file_rmode.readlines()
    for line in seqalign:
        #align_file_appmode.write('<p> \'line\' </p>')
        for item in finalmotifs:
            item = item.strip().upper()
            annotation = "<span style=\"color:red\">" + item + "</span>"
            if item in line:
                newline = line.replace(item, annotation)
                # sys.stdout.write(newline)
                align_file_appmode.write(newline)
    motif_file.close()
    align_file_rmode.close()
    align_file_appmode.close()

The line

    annotation = "<span style=\"color:red\">" + item + "</span>"

added a span and set the color in CSS.
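The loop in this solution writes a line only when it contains a motif, and writes it again for each further motif it contains. A variation that emits every alignment line exactly once, with all motifs highlighted, might look like the sketch below. It is written for Python 3 (the `html` module does not exist in Python 2.6), and `annotate_line` and `build_page` are made-up names for the example.

```python
import html


def annotate_line(line, motifs):
    """Return line with every motif wrapped in a red <span>."""
    out = html.escape(line.rstrip("\n"))
    for motif in motifs:
        motif = motif.strip().upper()
        if motif:
            out = out.replace(
                motif, '<span style="color:red">' + motif + '</span>')
    return out


def build_page(lines, motifs, title="Multiple Sequence Alignment"):
    """Return a complete HTML document as a single string."""
    body = "\n".join(annotate_line(line, motifs) for line in lines)
    return ("<html><head><title>{0}</title></head>"
            "<body><pre>\n{1}\n</pre></body></html>").format(
                html.escape(title), body)
```

The `<pre>` element keeps the alignment columns lined up in the browser, which matters for sequence alignments.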
[Tutor] triple-nested for loop not working
Hello everyone,

I have written a program, as part of a bioinformatics project, that extracts motif sequences (programmatically just strings of letters) from a database and writes them to a file. I have written another script to annotate the database file (in plaintext ASCII format) by replacing every match of a motif with a sequence of tildes (~). Primitive I know, but not much more can be done with ASCII files. The code goes as follows:

    motif_file = open('myfolder/pythonfiles/final motifs_11SGLOBULIN', 'r')
    # = final motifs_11sglobulin contains the output of my first program
    align_file = open('myfolder/pythonfiles/11sglobulin.seqs', 'a+')
    # = 11sglobulin.seqs is the ASCII sequence alignment file which I want to annotate (modify)
    finalmotif_seqs = []
    finalmotif_length = []  # store length of each motif
    finalmotif_annot = []
    for line in finalmotifs:
        finalmotif_seqs.append(line)
        mot_length = len(line)
        finalmotif_length.append(mot_length)
    for item in finalmotif_length:
        annotation = '~' * item
        finalmotif_annot.append(annotation)
    finalmotifs = motif_file.readlines()
    seqalign = align_file.readlines()
    for line in seqalign:
        for i in len(finalmotif_seqs):  # for item in finalmotif_seqs:
            for i in len(finalmotif_annot):  # for item in finalmotif_annot:
                if finalmotif_seqs[i] in line:  # if item in line:
                    newline = line.replace(finalmotif_seqs[i], finalmotif_annot[i])
                    #sys.stdout.write(newline)  # = print the lines out on the shell
                    align_file.writelines(newline)
    motif_file.close()
    align_file.close()

My coding issue is that although the script runs, there is a logic error somewhere in the triple-nested for loop, as when I check the file I'm supposedly modifying there is no change. All three lists are built correctly (I've confirmed this on the Python shell). Any help would be much appreciated! I am running Python 2.6.5
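A few things in the posted code may explain the lack of change, though this is a guess without the real files: `for i in len(...)` tries to iterate over an integer, the motif lines still carry their trailing newline (so they can only match at the end of an alignment line), and a file opened in 'a+' mode may start with its read position at the end, in which case `readlines()` returns nothing. Since each annotation is just tildes of the motif's length, one loop over the motifs is enough. A sketch, with `mask_motifs` as a made-up name:

```python
def mask_motifs(lines, motifs):
    """Return the lines with every motif replaced by tildes of equal length."""
    masked = []
    for line in lines:
        for motif in motifs:
            motif = motif.strip()          # drop the trailing newline
            if motif:
                line = line.replace(motif, "~" * len(motif))
        masked.append(line)
    return masked


print(mask_motifs(["AAEAWLGG\n", "CCCC\n"], ["EAWL\n"]))
```

Reading the alignment from a file opened with 'r' and writing the masked lines to a separate output file avoids depending on where 'a+' places the file position.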
[Tutor] Filtering out unique list elements
Dear All, I have built a list with multiple occurrences of a string after some text processing that goes something like this:

    ["cat", "dog", "cat", "cat", "cat", "dog", "dog", "tree", "tree", "tree", "bird", "bird", "woods", "woods"]

I am wondering how to truncate this list so that I only print out the unique elements, i.e. the same list but with one occurrence per element:

    ["cat", "dog", "tree", "bird", "woods"]

Any help much appreciated! Regards, Spyros
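Converting the list to a `set` removes duplicates but does not keep the original order. If the order of first appearance matters, as in the desired output above, one common approach is to track what has already been seen. A minimal sketch (the name `unique` is chosen for the example):

```python
def unique(items):
    """Return items without duplicates, keeping first-occurrence order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


words = ["cat", "dog", "cat", "cat", "cat", "dog", "dog", "tree", "tree",
         "tree", "bird", "bird", "woods", "woods"]
print(unique(words))
```

The set makes each membership test fast, so the whole pass stays linear in the length of the list.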
[Tutor] Deleting strings from a line
Hello,

I've written a script that scans a biological database and extracts some information. A sample of output from my script is as follows:

    LYLGILLSHAN AA3R_SHEEP26331
    LYMGILLSHAN AA3R_HUMAN26431
    MCLGILLSHANAA3R_RAT26631
    LLVGILLSHAN AA3R_RABIT26531

The leftmost strings are the ones I want to keep, while I would like to get rid of the ones to the right (AA3R_SHEEP, 263 61), which are just indicators of where the sequence came from and genomic coordinates. Is there any way to do this with a string processing command? The loop which builds my list goes like this:

    for line in query_lines:
        if line.startswith('fd;'):  # find motif sequences
            #print "Found an FD for your query!", line.rstrip().lstrip('fd;')
            print line.lstrip('fd;')
            motif.append(line.rstrip().lstrip('fd;'))

Is there a del command I can use to preserve only the actual sequences themselves? Many thanks in advance!
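If the fields on each fd; line are separated by whitespace (an assumption, since the spacing in the sample above did not survive), `split()` gives the sequence as the first field and no deletion is needed. Note also that `lstrip('fd;')` strips any leading run of the characters f, d and ;, not the prefix as a unit, so slicing off the prefix is safer. A sketch with a made-up function name:

```python
def motif_from_line(line, prefix="fd;"):
    """Return the first whitespace-separated field after the prefix, or None."""
    if not line.startswith(prefix):
        return None
    fields = line[len(prefix):].split()
    return fields[0] if fields else None


print(motif_from_line("fd; LYLGILLSHAN AA3R_SHEEP 263 31\n"))
# lstrip treats its argument as a set of characters:
print("fd;ddx".lstrip("fd;"))
```

Inside the original loop this would read `motif.append(motif_from_line(line))`.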
[Tutor] Script for Parsing string sequences from a file
Hello,

I'm doing a biomedical degree and am taking a course on bioinformatics. We were given a raw version of a public database in a file (the file is in simple ASCII) and need to extract only certain lines containing important information. I've made a script that does not work, and I am having trouble understanding why. When I run it in the Python shell, it prompts for a protein name but then reports that there is no such entry.

The first while loop, nested inside a for loop, is intended to pick up all lines beginning with gc;, chop off the gc; part and keep only the text after that (which is a protein name). Then it scans the file, collects all such lines, chops off the gc; and stores them in a tuple. This tuple is not built correctly because, as I said, when the program is run it reports that it cannot find my query in the tuple I created, and it is certainly in the database. Can you detect what the mistake is? Thank you in advance! Spyros

myParser.py Description: Binary data
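The attached script is not reproduced in the archive, so the following is only a guess at the kind of structure that would work, not a diagnosis. One frequent reason for a lookup like this to fail is that the stored names still carry leading spaces or a trailing newline, so they never compare equal to what the user typed. The sketch strips both sides before comparing; `protein_names` and `has_entry` are made-up names.

```python
def protein_names(lines, prefix="gc;"):
    """Return the text after the prefix for every line that starts with it."""
    return [line[len(prefix):].strip()
            for line in lines if line.startswith(prefix)]


def has_entry(names, query):
    """Case-insensitive membership test on whitespace-trimmed names."""
    wanted = query.strip().upper()
    return any(name.upper() == wanted for name in names)


names = protein_names(["gc; OPSIN\n", "gx; other\n", "gc; RHODOPSIN\n"])
print(names, has_entry(names, "opsin "))
```

A list comprehension over the file's lines replaces the nested while and for loops described above. The result can be converted with `tuple(names)` if a tuple is really required.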