Re: [Tutor] Problem When Iterating Over Large Test Files
On Wed, Jul 18, 2012 at 04:33:20PM -0700, Ryan Waples wrote: I've included 20 consecutive lines of input and output. Each of these 5 'records' should have been selected and printed to the output file. I count only 19 lines. The first group has only three lines. See below. There is a blank line, which I take as NOT part of the input but just a spacer. Then: 1) Line starting with @ 2) Line of bases CGCGT ... 3) Plus sign 4) Line starting with @@@ 5) Line starting with @ 6) Line of bases TTCTA ... 7) Plus sign and so on. There are TWO lines before the first +, and three before each of the others. __EXAMPLE RAW DATA FILE REGION__ @HWI-ST0747:167:B02DEACXX:8:1101:3182:167088 1:N:0: CGCGTGTGCAGGTTTATAGAACCAGCTGCAGATTAGTAGCAGCGCACGGAGAGGTGTGTCTGTTTATTGTCCTCAGCAGGCAGACATGTTTGTGGTC + @@@DDADDHB9+2A??:?G9+C)???G@DB@@DGFB0*?FF?0F:@/54'-;;?B;;6(5@CDAC(5(5:5,(8?88?BC@# @HWI-ST0747:167:B02DEACXX:8:1101:3134:167090 1:N:0: TTCTAGTGCAGGGCGACAGCGTTGCGGAGCCGGTCCGAGTCTGCTGGGTCAGTCATGGCTAGTTGGTACTATAACGACACAGGGCGAGACCCAGATGCAAA + @CCFFFDFHJJIJHHIIIJHGHIJI@GFFDDDFDDCEEEDCCBDCCCCCB@C(4@ADCA?BBBDDABB055-?AB1:@ACC: @HWI-ST0747:167:B02DEACXX:8:1101:3002:167092 1:N:0: CTTTGCTGCAGGCTCATCCTGACATGACCCTCCAGCATGACAATGCCACCAGCCATACTGCTCGTTCTGTGTGTGATTTCCAGCAAGTAAATATGTA + CCCFHIJIEHIH@AHFAGHIGIIGGEIJGIJIIIGIIIGEHGEHIIJIEHH@FHGH@=ACEHHFBFFCE@AACCACDB;;B?C3AADBA @HWI-ST0747:167:B02DEACXX:8:1101:3022:167094 1:N:0: ATTCCGTGCAGGCCAACTCCCGACGGACATCCTTGCTCAGACTGCAGCGATAGTGGTCGATCAGGGCCCTGTTGTTCCATCCCACTCCGGCGACCAGGTTC + CCCFHIDHJIIHIIIJIJIIGGIIFHJIIIIEIFHFFCBAECBDDDC:??B=AAACD?8@:C@?8CBDDD@D99B@3884A @HWI-ST0747:167:B02DEACXX:8:1101:3095:167100 1:N:0: CGTGATTGCAGGGACGTTACAGAGACGTTACAGGGATGTTACAGGGACGTTACAGAGACGTTAAAGAGATGTTACAGGGATGTTACAGACAGAGACGTTAC + Your code says that the first line in each group should start with an @ sign. That is clearly not the case for the last two groups. I suggest that your data files have been corrupted. 
__PYTHON CODE __

I have re-written your code slightly, to be a little closer to best practice, or at least modern practice. If there is anything you don't understand, please feel free to ask. I haven't tested this code, but it should run fine on Python 2.7. It will be interesting to see if you get different results with this.

    import glob

    def four_lines(file_object):
        """Yield lines from file_object grouped into batches of four.

        If the file has fewer than four lines remaining, pad the batch
        with 1-3 empty strings.

        Lines are stripped of leading and trailing whitespace.
        """
        while True:
            # Get the first line. If there is no first line, we are at EOF
            # and we raise StopIteration to indicate we are done.
            line1 = next(file_object).strip()
            # Get the next three lines, padding if needed.
            line2 = next(file_object, '').strip()
            line3 = next(file_object, '').strip()
            line4 = next(file_object, '').strip()
            yield (line1, line2, line3, line4)

    my_in_files = glob.glob ('E:/PINK/Paired_End/raw/gzip/*.fastq')
    for each in my_in_files:
        out = each.replace('/gzip', '/rem_clusters2' )
        print ("Reading File: " + each)
        print ("Writing File: " + out)
        INFILE = open (each, 'r')
        OUTFILE = open (out , 'w')
        writes = 0
        for reads, lines in four_lines( INFILE ):
            ID_Line_1, Seq_Line, ID_Line_2, Quality_Line = lines
            # Check that ID_Line_1 starts with @
            if not ID_Line_1.startswith('@'):
                print ("**ERROR**")
                print ("expected ID_Line to start with @")
                print (lines)
                print ("Read Number " + str(reads))
                break
            elif ID_Line_2 != '+':
                print ("**ERROR**")
                print ("expected ID_Line_2 = +")
                print (lines)
                print ("Read Number " + str(reads))
                break
            # Select Reads that I want to keep
            ID = ID_Line_1.partition(' ')
            if (ID[2] == "1:N:0:" or ID[2] == "2:N:0:"):
                # Write to file, maintaining group of 4
                OUTFILE.write(ID_Line_1 + "\n")
                OUTFILE.write(Seq_Line + "\n")
                OUTFILE.write(ID_Line_2 + "\n")
                OUTFILE.write(Quality_Line + "\n")
                writes += 1
        # End of file reached, print update
        print ("Saw", reads, "groups of four lines")
        print ("Wrote", writes, "groups of four lines")
        INFILE.close()
        OUTFILE.close()

--
Steven
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
Re: [Tutor] string to binary and back... Python 3
On 19/07/2012 06:41, wolfrage8...@gmail.com wrote: On Thu, Jul 19, 2012 at 12:16 AM, Dave Angel d...@davea.name wrote: On 07/18/2012 05:07 PM, Jordan wrote: OK so I have been trying for a couple days now and I am throwing in the towel, Python 3 wins this one. I want to convert a string to binary and back again like in this question: Stack Overflow: Convert Binary to ASCII and vice versa (Python) http://stackoverflow.com/questions/7396849/convert-binary-to-ascii-and-vice-versa-python But in Python 3 I consistently get some sort of error relating to the fact that nothing but bytes and bytearrays support the buffer interface or I get an overflow error because something is too large to be converted to bytes. Please help me and then explian what I am not getting that is new in Python 3. I would like to point out I realize that binary, hex, and encodings are all a very complex subject and so I do not expect to master it but I do hope that I can gain a deeper insight. Thank you all. test_script.py: import binascii test_int = 109 test_int = int(str(test_int) + '45670') data = 'Testing XOR Again!' while sys.getsizeof(data) test_int.bit_length(): test_int = int(str(test_int) + str(int.from_bytes(os.urandom(1), 'big'))) print('Bit Length: ' + str(test_int.bit_length())) key = test_int # Yes I know this is an unnecessary step... data = bin(int(binascii.hexlify(bytes(data, 'UTF-8')), 16)) print(data) data = int(data, 2) print(data) data = binascii.unhexlify('%x' % data) I don't get the same error you did. I get: File jordan.py, line 13 test_int = int(str(test_int) + str(int.from_bytes(os.urandom(1), 'big'))) ^ test_int = int(str(test_int) + str(int.from_bytes(os.urandom(1), \ 'big'))) # That was probably just do to the copy and paste. IndentationError: expected an indented block Please post it again, with correct indentation. if you used tabs, then expand them to spaces before pasting it into your test-mode mail editor. 
I only use spaces and this program did not require any indentation until it was pasted and the one line above became split across two line. Really though that was a trivial error to correct. Really? Are you using a forked version of Python that doesn't need indentation after a while loop, or are you speaking with a forked tongue? :) Strangely I believe the latter, so please take note of what Dave Angel has told you and post with the correct indentation. I'd also recommend you remove a lot of the irrelevant details there. if you have a problem with hexlfy and/or unhexlify, then give a simple byte string that doesn't work for you, and somebody can probably identify why not. And if you want people to run your code, include the imports as well. My problem is not specific to hexlify and unhexlify, my problem is trying to convert from string to binary and back. That is why all of the details, to show I have tried on my own. Sorry that I forgot to include sys and os for imports. As it is, you're apparently looping, comparing the byte memory size of a string (which is typically 4 bytes per character) with the number of significant bits in an unrelated number. I suspect what you want is something resembling (untested): mybytes = bytes( %x % data, ascii) newdata = binascii.unexlify(mybytes) I was comparing them but I think I understand how to compare them well, now I want to convert them both to binary so that I can XOR them together. Thank you for your time and help Dave, now I need to reply to Ramit. -- DaveA ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor -- Cheers. Mark Lawrence. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
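For readers following the thread, here is a minimal sketch of the string-to-binary round trip Jordan asks about, assuming Python 3 and UTF-8 text. The helper names text_to_bits and bits_to_text are made up for illustration and do not appear in the thread; this is one possible approach, not the one any poster settled on.

```python
def text_to_bits(text, encoding='utf-8'):
    """Return a string of '0'/'1' characters, eight per encoded byte."""
    raw = text.encode(encoding)          # str -> bytes
    return ''.join(format(byte, '08b') for byte in raw)


def bits_to_text(bits, encoding='utf-8'):
    """Inverse of text_to_bits: rebuild the bytes 8 bits at a time, then decode."""
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return raw.decode(encoding)          # bytes -> str


bits = text_to_bits('Testing XOR Again!')
print(bits[:16])            # the first two characters, 'T' and 'e'
print(bits_to_text(bits))   # back to the original text
```

Padding every byte to eight digits with '08b' is what makes the reverse direction unambiguous; bin() on one large integer drops leading zeros, which is one way the original script could lose information.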
Re: [Tutor] Problem When Iterating Over Large Test Files
>>> If you copy those files to a different device (one that has just been scrubbed and reformatted), then copy them back and get different results with your application, you've found your problem. -Bill

>> Thanks for the insistence, I'll check this out. If you have any guidance on how to do so let me know. I knew my system wasn't particularly well suited to the task at hand, but I haven't seen how it would actually cause problems. -Ryan

> The last two lines in my MSG pretty much would be the test. Get another flash drive, format it as FAT-32 (I assume that's what you are using), then copy a couple of files to it. Then copy them back to your current device and run your program again. If you get DIFFERENT, but still wrong results, you've found the problem.
>
> The largest positive integer a 32-bit binary number can represent is 2^32 - 1, which is just under 4Gig. I'm no expert on Windows files, but I'd be very surprised if when the FAT-32 file system was being designed, anyone considered the case where a single file could be that large. -Bill

The hard-drive is formatted as NTFS, because as you say I'm up against the file size limit of FAT32. Do you think this could still be the issue?
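One way to carry out the copy-and-compare test described above without relying on the filtering program itself is to compare checksums of the original file and the copy. This is only a sketch: the function names file_digest and same_contents are hypothetical, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file.

    The file is read in chunks so that a multi-gigabyte FASTQ file
    never has to fit in memory at once.
    """
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True when both files hash to the same digest."""
    return file_digest(path_a) == file_digest(path_b)
```

If same_contents(original, copy_from_other_drive) ever returns False, the storage or the copy step is changing the data, independent of any Python filtering code.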
Re: [Tutor] Problem When Iterating Over Large Test Files
I count only 19 lines. yep, you are right. My bad, I think I missing copy/pasting line 20. The first group has only three lines. See below. Not so, the first group is actually the first four lines listed below. Lines 1-4 serve as one group. For what it is worth, line four should have 1 character for each char in line 1, and the first line is much shorter, contains a space, and for this file always ends in either 1:N:0: (keep) 1Y0: (remove). The EXAMPLE data is correctly formatted as it should be, but I'm missing line 20. There is a blank line, which I take as NOT part of the input but just a spacer. Then: 1) Line starting with @ 2) Line of bases CGCGT ... 3) Plus sign 4) Line starting with @@@ 5) Line starting with @ 6) Line of bases TTCTA ... 7) Plus sign and so on. There are TWO lines before the first +, and three before each of the others. I think you are just reading one frame shifted, its not a well designed format because the required start character @, can appear other places as well __EXAMPLE RAW DATA FILE REGION__ @HWI-ST0747:167:B02DEACXX:8:1101:3182:167088 1:N:0: CGCGTGTGCAGGTTTATAGAACCAGCTGCAGATTAGTAGCAGCGCACGGAGAGGTGTGTCTGTTTATTGTCCTCAGCAGGCAGACATGTTTGTGGTC + @@@DDADDHB9+2A??:?G9+C)???G@DB@@DGFB0*?FF?0F:@/54'-;;?B;;6(5@CDAC(5(5:5,(8?88?BC@# @HWI-ST0747:167:B02DEACXX:8:1101:3134:167090 1:N:0: TTCTAGTGCAGGGCGACAGCGTTGCGGAGCCGGTCCGAGTCTGCTGGGTCAGTCATGGCTAGTTGGTACTATAACGACACAGGGCGAGACCCAGATGCAAA + @CCFFFDFHJJIJHHIIIJHGHIJI@GFFDDDFDDCEEEDCCBDCCCCCB@C(4@ADCA?BBBDDABB055-?AB1:@ACC: @HWI-ST0747:167:B02DEACXX:8:1101:3002:167092 1:N:0: CTTTGCTGCAGGCTCATCCTGACATGACCCTCCAGCATGACAATGCCACCAGCCATACTGCTCGTTCTGTGTGTGATTTCCAGCAAGTAAATATGTA + CCCFHIJIEHIH@AHFAGHIGIIGGEIJGIJIIIGIIIGEHGEHIIJIEHH@FHGH@=ACEHHFBFFCE@AACCACDB;;B?C3AADBA @HWI-ST0747:167:B02DEACXX:8:1101:3022:167094 1:N:0: ATTCCGTGCAGGCCAACTCCCGACGGACATCCTTGCTCAGACTGCAGCGATAGTGGTCGATCAGGGCCCTGTTGTTCCATCCCACTCCGGCGACCAGGTTC + CCCFHIDHJIIHIIIJIJIIGGIIFHJIIIIEIFHFFCBAECBDDDC:??B=AAACD?8@:C@?8CBDDD@D99B@3884A 
@HWI-ST0747:167:B02DEACXX:8:1101:3095:167100 1:N:0: CGTGATTGCAGGGACGTTACAGAGACGTTACAGGGATGTTACAGGGACGTTACAGAGACGTTAAAGAGATGTTACAGGGATGTTACAGACAGAGACGTTAC + Your code says that the first line in each group should start with an @ sign. That is clearly not the case for the last two groups. I suggest that your data files have been corrupted. I'm pretty sure that my raw IN files are all good, its hard to be sure with such a large file, but the very picky downstream analysis program takes every single raw file just fine (30 of them), and gaks on my filtered files, at regions that don't conform to the correct formatting. __PYTHON CODE __ I have re-written your code slightly, to be a little closer to best practice, or at least modern practice. If there is anything you don't understand, please feel free to ask. I haven't tested this code, but it should run fine on Python 2.7. It will be interesting to see if you get different results with this. --CODE REMOVED-- Thanks, for the suggestions. I've never really felt super comfortable using objects at all, but its what I want to learn next. This will be helpful, and useful. for reads, lines in four_lines( INFILE ): ID_Line_1, Seq_Line, ID_Line_2, Quality_Line = lines Can you explain what is going on here, or point me In the right direction? I see that the parts of 'lines' get assigned, but I'm missing how the file gets iterated over and how reads gets incremented. Do you have a reason why this approach might give a 'better' output? Thanks again. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Problem When Iterating Over Large Test Files
On 19/07/12 07:00, Steven D'Aprano wrote:

> def four_lines(file_object):

snipping

>         line1 = next(file_object).strip()
>         # Get the next three lines, padding if needed.
>         line2 = next(file_object, '').strip()
>         line3 = next(file_object, '').strip()
>         line4 = next(file_object, '').strip()
>         yield (line1, line2, line3, line4)

snipping...

> for reads, lines in four_lines( INFILE ):
>     ID_Line_1, Seq_Line, ID_Line_2, Quality_Line = lines

Shouldn't that be

    for reads, lines in enumerate( four_lines(INFILE) ):
        ID_Line_1, Seq_Line, ID_Line_2, Quality_Line = lines

?

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
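Putting the generator and the enumerate() suggestion together gives something like the sketch below. It is written for Python 3 and differs from the posted code in one respect: the outer loop is a for loop rather than `while True` with a bare next(), because in Python 3.7 and later a StopIteration escaping from inside a generator is turned into a RuntimeError. The sample records are invented, and io.StringIO stands in for a real file.

```python
import io


def four_lines(file_object):
    """Yield stripped lines in tuples of four.

    A short final group is padded with empty strings.
    """
    for line1 in file_object:
        # Pull the next three lines from the same iterator, padding at EOF.
        line2 = next(file_object, '')
        line3 = next(file_object, '')
        line4 = next(file_object, '')
        yield tuple(line.strip() for line in (line1, line2, line3, line4))


# Two made-up FASTQ records standing in for a real input file.
sample = io.StringIO(
    "@read1 1:N:0:\nACGT\n+\nIIII\n"
    "@read2 1:Y:0:\nTTGA\n+\nHHHH\n"
)

# enumerate() supplies the running count; four_lines() supplies the tuple.
for reads, lines in enumerate(four_lines(sample), start=1):
    id_line_1, seq_line, id_line_2, quality_line = lines
    print(reads, id_line_1, seq_line)
```

enumerate() wraps any iterable and yields (count, item) pairs, which is why the two-name unpacking in the for statement only works once it is added.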
[Tutor] suggestion for an editor
Friends,

At present I write programs using the vi editor. I am interested in changing to something else. My specific need is that I want to select a small segment of my program (e.g. a nested loop) and then monitor the processing time it takes for that portion while I run the program. By this I hope to find the segment that takes time and modify it to achieve better speed. Can someone please share their experience?

Thanks,
Bala

--
C. Balasubramanian
Re: [Tutor] suggestion for an editor
Try Sublime,

On Thu, Jul 19, 2012 at 1:39 PM, Bala subramanian bala.biophys...@gmail.com wrote:

> Friends, At present i write programs using vi editor. I am interested to change to something else. My specific need is that i want to select a portion/small segment of my program (for eg. a nested loop) and then monitor processing time it takes for that portion while i run the program. By this i hope to find the segment that takes time and modify to achieve better speed. Can someone please share their experience. Thanks, Bala
> --
> C. Balasubramanian

--
Cheers,
Ranjith Kumar K,
Chennai.
http://ranjithtenz.wordpress.com
Re: [Tutor] string to binary and back... Python 3
On 07/19/2012 01:41 AM, wolfrage8...@gmail.com wrote: On Thu, Jul 19, 2012 at 12:16 AM, Dave Angel d...@davea.name wrote: SNIP I don't get the same error you did. I get: File jordan.py, line 13 test_int = int(str(test_int) + str(int.from_bytes(os.urandom(1), 'big'))) ^ test_int = int(str(test_int) + str(int.from_bytes(os.urandom(1), \ 'big'))) # That was probably just do to the copy and paste. That was just the first line that was not indented. If I thought you had a one-line while loop, I certainly would have just indented it. But I'm sure you have some unknown number of additional lines that were indented in your original. Please post in text form. SNIP I'd also recommend you remove a lot of the irrelevant details there. if you have a problem with hexlfy and/or unhexlify, then give a simple byte string that doesn't work for you, and somebody can probably identify why not. And if you want people to run your code, include the imports as well. My problem is not specific to hexlify and unhexlify, my problem is trying to convert from string to binary and back. That is why all of the details, to show I have tried on my own. Sorry that I forgot to include sys and os for imports. Lots of details that have nothing to do with it. For example, that whole thing about adding random digits together. You could replace the whole thing with a simple assignment of a value that doesn't work for you. As it is, you're apparently looping, comparing the byte memory size of a string (which is typically 4 bytes per character) with the number of significant bits in an unrelated number. I suspect what you want is something resembling (untested): mybytes = bytes( %x % data, ascii) newdata = binascii.unexlify(mybytes) I was comparing them but I think I understand how to compare them well, now I want to convert them both to binary so that I can XOR them together. Thank you for your time and help Dave, now I need to reply to Ramit. Ah, so you don't actually want binary at all!!! 
Why not state the real problem up front? You can XOR two integers, without bothering to convert to a string of ones and zeroes. Use the caret operator.

    print(40 ^ 12)

I suspect there's an equivalent for strings or byte-strings. But if not, it's a simple loop.

--
DaveA
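The "simple loop" for byte strings can be sketched as follows, assuming Python 3, where iterating over a bytes object yields ints. The function name xor_bytes is made up for this example, and the random key only loosely mirrors the os.urandom call in the original script.

```python
import os


def xor_bytes(data, key):
    """XOR two equal-length byte strings, one byte at a time."""
    if len(data) != len(key):
        raise ValueError('data and key must be the same length')
    # Each a and b is an int in range(256), so ^ applies directly.
    return bytes(a ^ b for a, b in zip(data, key))


data = 'Testing XOR Again!'.encode('utf-8')
key = os.urandom(len(data))       # random key of matching length
cipher = xor_bytes(data, key)

# XOR-ing with the same key a second time restores the original bytes.
print(xor_bytes(cipher, key).decode('utf-8'))
print(40 ^ 12)                    # 36
```

No string of ones and zeroes is ever built here; the work is done on integers, and a binary string is only needed if the bits are to be displayed.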
Re: [Tutor] string to binary and back... Python 3
I'll preface my response by saying that I know/understand fairly little about it, but since I've recently been smacked by this same issue when converting stuff to Python3, I'll see if I can explain it in a way that makes sense.

On Wed, 18 Jul 2012, Jordan wrote:

> OK so I have been trying for a couple days now and I am throwing in the towel, Python 3 wins this one. I want to convert a string to binary and back again like in this question: Stack Overflow: Convert Binary to ASCII and vice versa (Python) http://stackoverflow.com/questions/7396849/convert-binary-to-ascii-and-vice-versa-python But in Python 3 I consistently get some sort of error relating to the fact that nothing but bytes and bytearrays support the buffer interface or I get an overflow error because something is too large to be converted to bytes. Please help me and then explain what I am not getting that is new in Python 3. I would like to point out I realize that binary, hex, and encodings are all a very complex subject and so I do not expect to master it but I do hope that I can gain a deeper insight. Thank you all.

The way I've read it - stop thinking about strings as if they are text. The biggest reason that all this has changed is because Python has grown up and entered the world where Unicode actually matters. To us poor shmucks in the English speaking countries of the world it's all very confusing because it's nothing we have to deal with. 26 letters is perfectly fine for us - and if we want uppercase we'll just throw in another 26. Add a few dozen punctuation marks and 256 is a perfectly fine amount of characters.

To make a slightly relevant side trip, when you were a kid did you ever send secret messages to a friend with a code like this?

    A = 1
    B = 2
    . . .
    Z = 26

Well, that's basically what is going on when it comes to bytes/text/whatever. When you input some text, Python3 believes that whatever you wrote was encoded with Unicode. The nice thing for us 26-letter folks is that the ASCII alphabet we're so used to just so happens to map quite well to Unicode encodings - so 'A' in ASCII is the same number as 'A' in utf-8.

Now, here's the part that I had to (and still need to) wrap my mind around - if the string is just bytes then it doesn't really matter what the string is supposed to represent. It could represent the LATIN-1 character set. Or UTF-8, -16, or some other weird encoding. And all the operations that are supposed to modify these strings of bytes (e.g. removing spaces, splitting on a certain character, etc.) still work. Because if I have this string:

    9 45 12 9 13 19 18 9 12 99 102

and I tell you to split on the 9's, it doesn't matter if that's some weird ASCII character, or some equally weird UTF character, or something else entirely. And I don't have to worry about things getting munged up when I try to stick Unicode and ASCII values together - because they're converted to bytes first.

So the question is, of course, if it's all bytes, then why does it look like text when I print it out? Well, that's because Python converts that byte stream to Unicode text when it's printed. Or ASCII, if you tell it to. But Python3 has converted all(?) of those functions that used to operate on text and made them operate on byte streams instead. Except for the ones that operate on text ;)

Well, I hope that's of some use and isn't too much of a lie - like I said, I'm still trying to wrap my head around things and I've found that explaining (or trying to explain) to someone else is often the best way to work out the idea in your own head. If I've gone too far astray I'm sure the other helpful folks here will correct me :)

HTH,
Wayne
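A concrete way to see the text/bytes split discussed above, assuming Python 3: a str holds text (Unicode code points), a bytes object holds raw 8-bit values, and encode() and decode() convert between the two. The sample word is arbitrary and chosen only because it contains one non-ASCII character.

```python
text = 'caf\u00e9'                 # four characters, the last one non-ASCII
raw = text.encode('utf-8')         # str -> bytes

print(type(text).__name__, len(text))   # str 4
print(type(raw).__name__, len(raw))     # bytes 5

# Decoding with the same encoding recovers the original text.
print(raw.decode('utf-8') == text)      # True

# For plain ASCII characters, ASCII and UTF-8 produce the same bytes.
print('A'.encode('ascii') == 'A'.encode('utf-8'))   # True

# bytes has its own split(); the separator must also be bytes.
print(raw.split(b'a'))
```

Mixing the two types in one operation, for example raw.split('a'), raises TypeError in Python 3, which is the family of errors the original question runs into.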
Re: [Tutor] Problem When Iterating Over Large Test Files
Just a few notes... On Wed, 18 Jul 2012, Ryan Waples wrote: snip import glob my_in_files = glob.glob ('E:/PINK/Paired_End/raw/gzip/*.fastq') for each in my_in_files: #print(each) out = each.replace('/gzip', '/rem_clusters2' ) #print (out) INFILE = open (each, 'r') OUTFILE = open (out , 'w') It's slightly confusing to see your comments left-aligned instead of with the code they refer to. At first glance it looked as though your block ended here, when it does, in fact, continue. # Tracking Variables Reads = 0 Writes = 0 Check_For_End_Of_File = 0 #Updates print (Reading File: + each) print (Writing File: + out) # Read FASTQ File by group of four lines while Check_For_End_Of_File == 0: This is Python, not C - checking for EOF is probably silly (unless you're really checking for end of data) - you can just do: for line in INFILE: ID_Line_1 = line Seq_line = next(INFILE) # Replace with INFILE.next() for Python2 ID_Line_2 = next(INFILE) Quality_Line = next(INFILE) # Read the next four lines from the FASTQ file ID_Line_1 = INFILE.readline() Seq_Line= INFILE.readline() ID_Line_2 = INFILE.readline() Quality_Line= INFILE.readline() # Strip off leading and trailing whitespace characters ID_Line_1 = ID_Line_1.strip() Seq_Line= Seq_Line.strip() ID_Line_2 = ID_Line_2.strip() Quality_Line= Quality_Line.strip() Also, it's just extra clutter to call strip like this when you can just tack it on to your original statement: for line in INFILE: ID_Line_1 = line.strip() Seq_line = next(INFILE).strip() # Replace with INFILE.next() for Python2 ID_Line_2 = next(INFILE).strip() Quality_Line = next(INFILE).strip() Reads = Reads + 1 #Check that I have not reached the end of file if Quality_Line == : #End of file reached, print update print (Saw + str(Reads) + reads) print (Wrote + str(Writes) + reads) Check_For_End_Of_File = 1 break This break is superfluous - it will actually remove you from the while loop - no further lines of code will be evaluated, including the original `while` 
comparison. You can also just test the Quality_Line for truthiness directly, since empty string evaluate to false. I would actually just say: if Quality_Line: #Do the rest of your stuff here #Check that ID_Line_1 starts with @ if not ID_Line_1.startswith('@'): print (**ERROR**) print (each) print (Read Number + str(Reads)) print ID_Line_1 + ' does not start with @' break #ends the while loop # Select Reads that I want to keep ID = ID_Line_1.partition(' ') if (ID[2] == 1:N:0: or ID[2] == 2:N:0:): # Write to file, maintaining group of 4 OUTFILE.write(ID_Line_1 + \n) OUTFILE.write(Seq_Line + \n) OUTFILE.write(ID_Line_2 + \n) OUTFILE.write(Quality_Line + \n) Writes = Writes +1 INFILE.close() OUTFILE.close() You could (as long as you're on 2.6 or greater) just use the `with` block for reading the files then you don't need to worry about closing - the block takes care of that, even on errors: for each in my_in_files: out = each.replace('/gzip', '/rem_clusters2' ) with open (each, 'r') as INFILE, open (out, 'w') as OUTFILE: for line in INFILE: # Do your work here... A few stylistic points: ALL_CAPS are usually reserved for constants - infile and outfile are perfectly legitimate names. Caps_In_Variable_Names are usually discouraged. Class names should be CamelCase (e.g. SimpleHTTPServer), while variable names should be lowercase with underscores if needed, so id_line_1 instead of ID_Line_1. If you're using Python3 or from __future__ import print_function, rather than doing OUTFILE.write(value + '\n') you can do: print(value, file=OUTFILE) Then you get the \n for free. You could also just do: print(val1, val2, val3, sep='\n', end='\n', file=OUTFILE) The end parameter is there for example only, since the default value for end is '\n' HTH, Wayne ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
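The `with` block and print(..., file=...) suggestions above can be combined as in the sketch below, written for Python 3. The function names keep_record and filter_fastq and the two sample records are invented for illustration; io.StringIO objects stand in for the real input and output files.

```python
import io


def keep_record(id_line):
    """True when the text after the first space is 1:N:0: or 2:N:0:."""
    return id_line.partition(' ')[2] in ('1:N:0:', '2:N:0:')


def filter_fastq(infile, outfile):
    """Copy the four-line records whose ID line passes keep_record.

    Returns the number of records written.
    """
    writes = 0
    for line in infile:
        # First line from the for loop, the next three from next().
        record = [line.strip()] + [next(infile, '').strip() for _ in range(3)]
        if keep_record(record[0]):
            # sep='\n' joins the four lines; end defaults to '\n'.
            print(*record, sep='\n', file=outfile)
            writes += 1
    return writes


source = io.StringIO(
    '@r1 1:N:0:\nACGT\n+\nIIII\n'
    '@r2 1:Y:0:\nTTGA\n+\nHHHH\n'
)
# Both objects are closed when the with block ends, even on errors.
with source as infile, io.StringIO() as outfile:
    kept = filter_fastq(infile, outfile)
    result = outfile.getvalue()

print(kept)
print(result, end='')
```

With real files the with line would open them instead, along the lines of `with open(each, 'r') as infile, open(out, 'w') as outfile:`, and the rest stays the same.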
Re: [Tutor] string to binary and back... Python 3
On Thu, Jul 19, 2012 at 1:41 AM, wolfrage8...@gmail.com wolfrage8...@gmail.com wrote:

> I was comparing them but I think I understand how to compare them well, now I want to convert them both to binary so that I can XOR them together. Thank you for your time and help Dave, now I need to reply to Ramit.

A bytes object is a container of 8-bit numbers (i.e. range 0 to 255). If you index it, you'll get an int that supports the XOR operation:

    >>> b1 = b'a'
    >>> b2 = b'b'
    >>> b1[0]
    97
    >>> b2[0]
    98
    >>> bin(b1[0])
    '0b1100001'
    >>> bin(b2[0])
    '0b1100010'
    >>> bin(b1[0] ^ b2[0])
    '0b11'

You can use the int method from_bytes to XOR two bitstrings stored as Python bytes:

    >>> b3 = b'aaaa'
    >>> b4 = b'bbbb'
    >>> bin(int.from_bytes(b3, 'big') ^ int.from_bytes(b4, 'big'))
    '0b11000000110000001100000011'

The computation is done between int objects, not strings. Creating a string using bin is just for presentation.

P.S.: Instead of bin you can use the format function to have more control, such as for zero padding. The integer format code 'b' is for a binary representation. Preceding it by a number starting with zero will pad with zeros to the given number of characters (e.g. 032 will prepend zeros to make the result at least 32 characters long):

    >>> r = int.from_bytes(b3, 'big') ^ int.from_bytes(b4, 'big')
    >>> format(r, '032b')
    '00000011000000110000001100000011'

Instead of hard coding the length (e.g. 032), you can use the length of the input bitstrings to calculate the size of the result:

    >>> size = 8 * max(len(b3), len(b4))
    >>> format(r, '0%db' % size)
    '00000011000000110000001100000011'
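To complete the round trip, the XOR result can be turned back into a bytes object with int.to_bytes. The sketch below assumes Python 3; the function name xor_via_int and the two four-byte inputs are chosen only for illustration.

```python
def xor_via_int(left, right):
    """XOR two byte strings by way of big-endian integers.

    The result is as long as the longer input; a shorter input is
    effectively right-aligned, i.e. padded with zero bytes on the left.
    """
    size = max(len(left), len(right))
    value = int.from_bytes(left, 'big') ^ int.from_bytes(right, 'big')
    return value.to_bytes(size, 'big')


result = xor_via_int(b'aaaa', b'bbbb')
print(result)

# Zero-padded binary view of the same result, 8 digits per byte.
width = 8 * len(result)
print(format(int.from_bytes(result, 'big'), '0{}b'.format(width)))
```

Going through one large int is convenient for display; for long inputs a byte-by-byte XOR avoids building a very large integer.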
Re: [Tutor] suggestion for an editor
On Thu, 19 Jul 2012, Bala subramanian wrote:

> Friends, At present i write programs using vi editor. I am interested to change to something else. My specific need is that i want to select a portion/small segment of my program (for eg. a nested loop) and then monitor processing time it takes for that portion while i run the program. By this i hope to find the segment that takes time and modify to achieve better speed. Can someone please share their experience.

I'm not sure how vi has anything to do with the speed of your program(!)

For performance measurements you should look into the timeit module. How long does it take your program to run currently? After all, premature optimisation is the root of all evil...

-Wayne
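As a rough sketch of what timing one segment with timeit can look like: the nested_loop function below is an invented stand-in for the part of the program being measured, and the repeat count of 20 is arbitrary.

```python
import timeit


def nested_loop(n=200):
    """Stand-in for the segment whose run time is of interest."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total


# timeit.timeit accepts a callable and runs it `number` times,
# returning the total elapsed time in seconds.
elapsed = timeit.timeit(nested_loop, number=20)
print('20 runs took %.4f seconds' % elapsed)
```

Moving the suspect segment into its own function, as done here, is usually the easiest way to time it separately from the rest of the program.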
[Tutor] Pragmatic Unicode, or, How do I stop the pain?
https://www.youtube.com/watch?v=sgHbC6udIqc

This is a very good talk on Unicode which was done at PyCon US 2012. It helped me a lot to understand the pain.

Greets
Sander
[Tutor] Flatten a list in tuples and remove doubles
Hi all, I would get a new list as: [(0, '3eA', 'Dupont', 'Juliette', '11.0/10.0', '4.0/5.0', '17.5/30.0', '3.0/5.0', '4.5/10.0', '35.5/60.0'), (1, '3eA', 'Pop', 'Iggy', '12.0/10.0', '3.5/5.0', '11.5/30.0', '4.0/5.0', '5.5/10.0', '7.5/10.0', '40.5/60.0')] ... from this one: [(0, '3eA', 'Dupont', 'Juliette', 0, 11.0, 10.0), (0, '3eA', 'Dupont', 'Juliette', 1, 4.0, 5.0), (0, '3eA', 'Dupont', 'Juliette', 2, 17.5, 30.0), (0, '3eA', 'Dupont', 'Juliette', 3, 3.0, 5.0), (0, '3eA', 'Dupont', 'Juliette', 4, 4.5, 10.0), (0, '3eA', 'Dupont', 'Juliette', 5, 35.5, 60.0), (1, '3eA', 'Pop', 'Iggy', 0, 12.0, 10.0), (1, '3eA', 'Pop', 'Iggy', 1, 3.5, 5.0), (1, '3eA', 'Pop', 'Iggy', 2, 11.5, 30.0), (1, '3eA', 'Pop', 'Iggy', 3, 4.0, 5.0), (1, '3eA', 'Pop', 'Iggy', 4, 5.5, 10.0), (1, '3eA', 'Pop', 'Iggy', 5, 40.5, 60.0)] How to make that ? I'm looking for but for now I can't do it. Thanks in advance. a+ -- http://ekd.tuxfamily.org http://ekdm.wordpress.com http://glouk.legtux.org/guiescputil http://lcs.dunois.clg14.ac-caen.fr/~alama/blog http://lprod.org/wiki/doku.php/video:encodage:avchd_converter ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Flatten a list in tuples and remove doubles
Oh I forgot to mention, with Python 2 (2.7).

--
http://ekd.tuxfamily.org
http://ekdm.wordpress.com
http://glouk.legtux.org/guiescputil
http://lcs.dunois.clg14.ac-caen.fr/~alama/blog
http://lprod.org/wiki/doku.php/video:encodage:avchd_converter
Re: [Tutor] suggestion for an editor
On 19/07/12 09:09, Bala subramanian wrote:

> Friends, At present i write programs using vi editor. I am interested to change to something else. My specific need is that i want to select a portion/small segment of my program (for eg. a nested loop) and then monitor processing time it takes for that portion while i run the program.

I suspect it's not a new editor you need but the profile module... Take a look at its documentation.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
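A minimal profiling sketch along these lines, using cProfile (the C implementation that offers the same interface as profile) together with pstats. The functions slow_part, fast_part and main are invented placeholders for a real program.

```python
import cProfile
import io
import pstats


def slow_part(n=300):
    """Placeholder for a nested loop that dominates the run time."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total


def fast_part(n=300):
    """Placeholder for a cheap step."""
    return sum(range(n))


def main():
    slow_part()
    fast_part()


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats('cumulative').print_stats(10)
print(report.getvalue())
```

For a whole script, running `python -m cProfile yourscript.py` from the command line gives a similar table without any changes to the code.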
Re: [Tutor] Flatten a list in tuples and remove doubles
You may use 'set'. e.g.

===
>>> x
[(1, 2, 3), (1, 1), (2, 2), (1, 1), (2, 2)]
>>> set(x)
set([(2, 2), (1, 1), (1, 2, 3)])
===

On 19-Jul-2012, at 11:03 PM, PyProg PyProg wrote:

> Hi all, I would get a new list as:
>
> [(0, '3eA', 'Dupont', 'Juliette', '11.0/10.0', '4.0/5.0', '17.5/30.0', '3.0/5.0', '4.5/10.0', '35.5/60.0'), (1, '3eA', 'Pop', 'Iggy', '12.0/10.0', '3.5/5.0', '11.5/30.0', '4.0/5.0', '5.5/10.0', '7.5/10.0', '40.5/60.0')]
>
> ... from this one:
>
> [(0, '3eA', 'Dupont', 'Juliette', 0, 11.0, 10.0), (0, '3eA', 'Dupont', 'Juliette', 1, 4.0, 5.0), (0, '3eA', 'Dupont', 'Juliette', 2, 17.5, 30.0), (0, '3eA', 'Dupont', 'Juliette', 3, 3.0, 5.0), (0, '3eA', 'Dupont', 'Juliette', 4, 4.5, 10.0), (0, '3eA', 'Dupont', 'Juliette', 5, 35.5, 60.0), (1, '3eA', 'Pop', 'Iggy', 0, 12.0, 10.0), (1, '3eA', 'Pop', 'Iggy', 1, 3.5, 5.0), (1, '3eA', 'Pop', 'Iggy', 2, 11.5, 30.0), (1, '3eA', 'Pop', 'Iggy', 3, 4.0, 5.0), (1, '3eA', 'Pop', 'Iggy', 4, 5.5, 10.0), (1, '3eA', 'Pop', 'Iggy', 5, 40.5, 60.0)]
>
> How to make that ? I'm looking for but for now I can't do it. Thanks in advance. a+
Re: [Tutor] Flatten a list in tuples and remove doubles
> I would like to get a new list like this:
>
> [(0, '3eA', 'Dupont', 'Juliette', '11.0/10.0', '4.0/5.0', '17.5/30.0', '3.0/5.0', '4.5/10.0', '35.5/60.0'), (1, '3eA', 'Pop', 'Iggy', '12.0/10.0', '3.5/5.0', '11.5/30.0', '4.0/5.0', '5.5/10.0', '7.5/10.0', '40.5/60.0')]
>
> ... from this one:
>
> [(0, '3eA', 'Dupont', 'Juliette', 0, 11.0, 10.0), (0, '3eA', 'Dupont', 'Juliette', 1, 4.0, 5.0), (0, '3eA', 'Dupont', 'Juliette', 2, 17.5, 30.0), (0, '3eA', 'Dupont', 'Juliette', 3, 3.0, 5.0), (0, '3eA', 'Dupont', 'Juliette', 4, 4.5, 10.0), (0, '3eA', 'Dupont', 'Juliette', 5, 35.5, 60.0), (1, '3eA', 'Pop', 'Iggy', 0, 12.0, 10.0), (1, '3eA', 'Pop', 'Iggy', 1, 3.5, 5.0), (1, '3eA', 'Pop', 'Iggy', 2, 11.5, 30.0), (1, '3eA', 'Pop', 'Iggy', 3, 4.0, 5.0), (1, '3eA', 'Pop', 'Iggy', 4, 5.5, 10.0), (1, '3eA', 'Pop', 'Iggy', 5, 40.5, 60.0)]
>
> How can I do that? I have been looking, but so far I can't work it out.

Well, the first thing to do would be to describe the logic behind what you are doing. Without knowing that, it is difficult to come up with the correct solution. I am guessing you want one row for each distinct combination of fields 1, 2, 3 (counting the first element as field 0), followed by a string of `field 5 + '/' + field 6` for each of the original rows. But that does not tell me what field 0 should be in the new format.

This is a pretty crude sample that should work for you.

    lookup = {}
    for row in old_list:
        key = (row[1], row[2], row[3])
        field_0, ratios = lookup.setdefault(key, (row[0], []))
        ratios.append('{0}/{1}'.format(row[5], row[6]))

    new_list = []
    for key, value in lookup.items():
        row = [value[0], key[0], key[1], key[2]]
        row.extend(value[1])
        new_list.append(tuple(row))
    # Might need to sort to get the exact same results

Ramit

This email is confidential and subject to important disclaimers and conditions including on offers for the purchase or sale of securities, accuracy and completeness of information, viruses, confidentiality, legal privilege, and legal entity disclaimers, available at http://www.jpmorgan.com/pages/disclosures/email.
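For anyone wanting to try the grouping idea on real data, here is a self-contained sketch. It rests on two assumptions that the original question does not confirm: that field 0 belongs to the grouping key together with fields 1 to 3, and that the input rows already arrive in the desired order. The name merge_rows is made up for this example, and the sample data is shortened.

```python
def merge_rows(rows):
    """Group rows on their first four fields and collect 'mark/max' strings."""
    grouped = {}
    order = []  # first-seen order of keys, so no sorting is needed afterwards
    for row in rows:
        key = tuple(row[:4])
        if key not in grouped:
            grouped[key] = []
            order.append(key)
        grouped[key].append('{0}/{1}'.format(row[5], row[6]))
    return [key + tuple(grouped[key]) for key in order]


old_list = [
    (0, '3eA', 'Dupont', 'Juliette', 0, 11.0, 10.0),
    (0, '3eA', 'Dupont', 'Juliette', 1, 4.0, 5.0),
    (1, '3eA', 'Pop', 'Iggy', 0, 12.0, 10.0),
    (1, '3eA', 'Pop', 'Iggy', 1, 3.5, 5.0),
]
new_list = merge_rows(old_list)
print(new_list)
```

Keeping a separate list of keys preserves the order of first appearance on any Python version, which avoids the sorting step mentioned in the comment above.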
Re: [Tutor] string to binary and back... Python 3
On 07/19/2012 12:15 AM, Prasad, Ramit wrote:
> SNIP
> I think your basic problem is too much conversion because you do not understand the types. A string is represented by a series of bytes, which are binary numbers. Do you understand the concept behind ASCII? Each letter has a numeric representation, and the letters are numbered sequentially. So the string 'ABCD' is equivalent to the series of bytes 65, 66, 67, 68 (lowercase 'abcd' would be 97, 98, 99, 100). It is not equivalent to 65666768 or 65+66+67+68. So your first task is to convert each character to the numeric equivalent and store them in a list. Once you have them converted to a list of integers, you can create another list that is a list of characters.

Sorry for the long delay in getting back to you, I got called to the field. Thank you, I agree I do feel like I am doing too much conversion. I do understand the concept behind ASCII, at least enough to know about ord(), although I did forget about chr(), which is ord()'s reverse function. I had tried to break them down to the ordinal value, but I really do want to get the integer and the data down to binary, as it provides an advantage for the overall program that I am writing. Thank you for your time.

> Look at the functions chr and ord here ( http://docs.python.org/py3k/library/functions.html )
>
> Ramit
>
> Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
> 712 Main Street | Houston, TX 77002
> work phone: 713 - 216 - 5423
Re: [Tutor] string to binary and back... Python 3
On 07/19/2012 08:14 AM, Mark Lawrence wrote:
> On 19/07/2012 06:41, wolfrage8...@gmail.com wrote:
>> On Thu, Jul 19, 2012 at 12:16 AM, Dave Angel d...@davea.name wrote:
>> SNIP
>
> Really? Are you using a forked version of Python that doesn't need indentation after a while loop, or are you speaking with a forked tongue? :) Strangely I believe the latter, so please take note of what Dave Angel has told you and post with the correct indentation. http://www.101emailetiquettetips.com/ Number 101 is for you.

Good day.

>>> I'd also recommend you remove a lot of the irrelevant details there. If you have a problem with hexlify and/or unhexlify, then give a simple byte string that doesn't work for you, and somebody can probably identify why not. And if you want people to run your code, include the imports as well.
>>
>> My problem is not specific to hexlify and unhexlify, my problem is trying to convert from string to binary and back. That is why all of the details, to show I have tried on my own. Sorry that I forgot to include sys and os for imports.
>>
>>> As it is, you're apparently looping, comparing the byte memory size of a string (which is typically 4 bytes per character) with the number of significant bits in an unrelated number. I suspect what you want is something resembling (untested):
>>>
>>>     mybytes = bytes("%x" % data, "ascii")
>>>     newdata = binascii.unhexlify(mybytes)
>>
>> I was comparing them but I think I understand how to compare them well, now I want to convert them both to binary so that I can XOR them together. Thank you for your time and help Dave, now I need to reply to Ramit.
>>
>>> -- DaveA
Re: [Tutor] Fwd: string to binary and back... Python 3
On 07/19/2012 10:29 AM, Walter Prins wrote:
> Hi,
>
> Just to show you your original message contained no indentation whatsoever. You might want to check your mail client settings and do some experiments to make sure that indentation spaces are let through unmolested and not stripped anywhere, otherwise the current little brouhaha about formatting will result. You have to admit, it's not easy to read the code below with zero indentation present... :)

Thank you for pointing that out, I did not realize it as I had copied and pasted it from the python file I was working on. I guess Thunderbird edited the email on me, even though I had put it into plain text mode. Next time perhaps I will just attach the file if that is acceptable rather than getting attacked for what my mail editor did.

> Regards
> Walter
>
> -- Forwarded message --
> From: Jordan wolfrage8...@gmail.com
> Date: 18 July 2012 22:07
> Subject: [Tutor] string to binary and back... Python 3
> To: tutor@python.org
>
> OK so I have been trying for a couple days now and I am throwing in the towel, Python 3 wins this one. I want to convert a string to binary and back again like in this question:
> Stack Overflow: Convert Binary to ASCII and vice versa (Python)
> http://stackoverflow.com/questions/7396849/convert-binary-to-ascii-and-vice-versa-python
> But in Python 3 I consistently get some sort of error relating to the fact that nothing but bytes and bytearrays support the buffer interface or I get an overflow error because something is too large to be converted to bytes. Please help me and then explain what I am not getting that is new in Python 3. I would like to point out I realize that binary, hex, and encodings are all a very complex subject and so I do not expect to master it but I do hope that I can gain a deeper insight. Thank you all.
>
> test_script.py:
>
> import binascii
> test_int = 109
> test_int = int(str(test_int) + '45670')
> data = 'Testing XOR Again!'
> while sys.getsizeof(data) > test_int.bit_length():
> test_int = int(str(test_int) + str(int.from_bytes(os.urandom(1), 'big')))
> print('Bit Length: ' + str(test_int.bit_length()))
> key = test_int # Yes I know this is an unnecessary step...
> data = bin(int(binascii.hexlify(bytes(data, 'UTF-8')), 16))
> print(data)
> data = int(data, 2)
> print(data)
> data = binascii.unhexlify('%x' % data)
>
> wolfrage@lm12-laptop02 ~/Projects $ python3 test_script.py
> Bit Length: 134
> 0b1010100011001010111001101110100011010010110111001100111001001011100010100100010010101100111011101101001011011100011
> 7351954002991226380810260999848996570230305
> Traceback (most recent call last):
>   File "test_script.py", line 24, in <module>
>     data = binascii.unhexlify('%x' % data)
> TypeError: 'str' does not support the buffer interface
>
> test_script2.py:
>
> import binascii
> test_int = 109
> test_int = int(str(test_int) + '45670')
> data = 'Testing XOR Again!'
> while sys.getsizeof(data) > test_int.bit_length():
> test_int = int(str(test_int) + str(int.from_bytes(os.urandom(1), 'big')))
> print('Bit Length: ' + str(test_int.bit_length()))
> key = test_int # Yes I know this is an unnecessary step...
> data = bin(int(binascii.hexlify(bytes(data, 'UTF-8')), 16))
> print(data)
> data = int(data, 2)
> print(data)
> data = binascii.unhexlify(bytes(data, 'utf8'))
>
> wolfrage@lm12-laptop02 ~/Projects $ python3 test_script2.py
> Bit Length: 140
> 0b1010100011001010111001101110100011010010110111001100111001001011100010100100010010101100111011101101001011011100011
> 7351954002991226380810260999848996570230305
> Traceback (most recent call last):
>   File "test_script.py", line 24, in <module>
>     data = binascii.unhexlify(bytes(data, 'utf8'))
> OverflowError: cannot fit 'int' into an index-sized integer
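One possible way around both tracebacks is sketched below. It assumes the goal is simply text to integer and back; the helper names text_to_int and int_to_text are invented for this example. The first traceback goes away because the hex digits are encoded to bytes before unhexlify sees them. The second goes away because the integer is formatted as hex digits first instead of being passed straight to bytes().

```python
import binascii


def text_to_int(text):
    """Encode text as UTF-8 and read the bytes as one big integer."""
    return int(binascii.hexlify(text.encode('utf-8')), 16)


def int_to_text(number):
    """Reverse text_to_int: integer -> hex digits -> bytes -> text."""
    hex_digits = '%x' % number
    if len(hex_digits) % 2:
        # unhexlify needs an even number of hex digits
        hex_digits = '0' + hex_digits
    return binascii.unhexlify(hex_digits.encode('ascii')).decode('utf-8')


data = 'Testing XOR Again!'
as_int = text_to_int(data)
as_bits = bin(as_int)                    # string of ones and zeros, for display
restored = int_to_text(int(as_bits, 2))
print(as_bits)
print(restored)
```

One caveat of going through a single integer: leading zero bytes in the data are lost, so this sketch only suits data that does not start with NUL bytes.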
Re: [Tutor] Fwd: string to binary and back... Python 3
Just to show you your original message contained no indentation whatsoever. You might want to check your mail client settings and do some experiments to make sure that indentation spaces are let through unmolested and not stripped anywhere, otherwise the current little brouhaha about formatting will result. You have to admit, it's not easy to read the code below with zero indentation present... :) Thank you for pointing that out, I did not realize it as I had copied and pasted it from the python file I was working on. I guess Thunderbird edited the email on me, even though I had put it into plain text mode. Next time perhaps I will just attach the file if that is acceptable rather than getting attacked for what my mail editor did. A fair amount of the list does not get attachments as this is a gateway to newsgroups. Copy/paste works for short code fragments if you are sure you are posting in plain text. Otherwise, you can post to services like pastebin and link to that. Ramit This email is confidential and subject to important disclaimers and conditions including on offers for the purchase or sale of securities, accuracy and completeness of information, viruses, confidentiality, legal privilege, and legal entity disclaimers, available at http://www.jpmorgan.com/pages/disclosures/email. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] string to binary and back... Python 3
On 07/19/2012 12:46 PM, Dave Angel wrote:
> On 07/19/2012 01:41 AM, wolfrage8...@gmail.com wrote:
>> On Thu, Jul 19, 2012 at 12:16 AM, Dave Angel d...@davea.name wrote:
>> SNIP
>
> That was just the first line that was not indented. If I thought you had a one-line while loop, I certainly would have just indented it. But I'm sure you have some unknown number of additional lines that were indented in your original. Please post in text form.

I see now, I am sorry I did not know that Thunderbird had eliminated all of my indentation. I had set it to plain text, but I guess it is not to be trusted. I simply looked at the reply email and saw the line that you pointed out and at that time figured Thunderbird had just word wrapped that line on me. Would it be acceptable to add an attached Python file?

> SNIP
> Lots of details that have nothing to do with it. For example, that whole thing about adding random digits together. You could replace the whole thing with a simple assignment of a value that doesn't work for you.

OK I will, that was a test script and I was testing multiple things, I did try to get rid of most of the cruft but I will attempt to do better in the future.

> SNIP
>> now I want to convert them both to binary so that I can XOR them together. Thank you for your time and help Dave, now I need to reply to Ramit.
>
> Ah, so you don't actually want binary at all!!! Why not state the real problem up front? You can XOR two integers, without bothering to convert to a string of ones and zeroes. Use the caret operator.
>
>     print(40 ^ 12)
>
> I suspect there's an equivalent for strings or byte-strings. But if not, it's a simple loop.

Actually I do want binary, as it serves as an advantage for the overall program that I am building.
Re: [Tutor] string to binary and back... Python 3
>> I think your basic problem is too much conversion because you do not understand the types. A string is represented by a series of bytes, which are binary numbers. Do you understand the concept behind ASCII? Each letter has a numeric representation, and the letters are numbered sequentially. So the string 'ABCD' is equivalent to the series of bytes 65, 66, 67, 68 (lowercase 'abcd' would be 97, 98, 99, 100). It is not equivalent to 65666768 or 65+66+67+68. So your first task is to convert each character to the numeric equivalent and store them in a list. Once you have them converted to a list of integers, you can create another list that is a list of characters.
>
> Sorry for the long delay in getting back to you, I got called to the field. Thank you, I agree I do feel like I am doing too much conversion. I do understand the concept behind ASCII, at least enough to know about ord(), although I did forget about chr(), which is ord()'s reverse function. I had tried to break them down to the ordinal value, but I really do want to get the integer and the data down to binary, as it provides an advantage for the overall program that I am writing. Thank you for your time.

Why not explain your usecase? Technically, everything is binary on a computer, so the question is why do *you* need to see the binary form? Anyway, you can get the binary string by doing `bin(ord(character))` and reverse it by doing `chr(int(binary_string,2))`. [1]

If you are doing some kind of XOR (I think your first email mentioned it) then you can XOR integers. Unless you are doing some kind of display of binary output, you usually do not need to actually see the binary string, as most binary manipulation can be done via the integer value.

[1] - Thanks to Steven.

Ramit
Re: [Tutor] string to binary and back... Python 3
My response is down lower, thank you Wayne. On 07/19/2012 12:52 PM, Wayne Werner wrote: I'll preface my response by saying that I know/understand fairly little about it, but since I've recently been smacked by this same issue when converting stuff to Python3, I'll see if I can explain it in a way that makes sense. On Wed, 18 Jul 2012, Jordan wrote: OK so I have been trying for a couple days now and I am throwing in the towel, Python 3 wins this one. I want to convert a string to binary and back again like in this question: Stack Overflow: Convert Binary to ASCII and vice versa (Python) http://stackoverflow.com/questions/7396849/convert-binary-to-ascii-and-vice-versa-python But in Python 3 I consistently get some sort of error relating to the fact that nothing but bytes and bytearrays support the buffer interface or I get an overflow error because something is too large to be converted to bytes. Please help me and then explian what I am not getting that is new in Python 3. I would like to point out I realize that binary, hex, and encodings are all a very complex subject and so I do not expect to master it but I do hope that I can gain a deeper insight. Thank you all. The way I've read it - stop thinking about strings as if they are text. The biggest reason that all this has changed is because Python has grown up and entered the world where Unicode actually matters. To us poor shmucks in the English speaking countries of the world it's all very confusing becaust it's nothing we have to deal with. 26 letters is perfectly fine for us - and if we want uppercase we'll just throw another 26. Add a few dozen puncuation marks and 256 is a perfectly fine amount of characters. To make a slightly relevant side trip, when you were a kid did you ever send secret messages to a friend with a code like this? A = 1 B = 2 . . . Z = 26 Well, that's basically what is going on when it comes to bytes/text/whatever. 
When you input some text, Python3 believes that whatever you wrote was encoded with Unicode. The nice thing for us 26-letter folks is that the ASCII alphabet we're so used to just so happens to map quite well to Unicode encodings - so 'A' in ASCII is the same number as 'A' in utf-8. Now, here's the part that I had to (and still need to) wrap my mind around - if the string is just bytes then it doesn't really matter what the string is supposed to represent. It could represent the LATIN-1 character set. Or UTF-8, -16, or some other weird encoding. And all the operations that are supposed to modify these strings of bytes (e.g. removing spaces, splitting on a certain character, etc.) still work. Because if I have this string: 9 45 12 9 13 19 18 9 12 99 102 and I tell you to split on the 9's, it doesn't matter if that's some weird ASCII character, or some equally weird UTF character, or something else entirely. And I don't have to worry about things getting munged up when I try to stick Unicode and ASCII values together - because they're converted to bytes first. So the question is, of course, if it's all bytes, then why does it look like text when I print it out? Well, that's because Python converts that byte stream to Unicode text when it's printed. Or ASCII, if you tell it to. But Python3 has converted all(?) of those functions that used to operate on text and made them operate on byte streams instead. Except for the ones that operate on text ;) Well, I hope that's of some use and isn't too much of a lie - like I said, I'm still trying to wrap my head around things and I've found that explaining (or trying to explain) to someone else is often the best way to work out the idea in your own head. If I've gone too far astray I'm sure the other helpful folks here will correct me :) Thank you for the vary informative post, every bit helps. 
It has certainly been a challenge for me with the new everything is bytes scheme, especially how everything has to be converted to bytes prior to going on a buffer. HTH, Wayne ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
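As a concrete companion to the explanation above, here is a small sketch of how Python 3 separates the two types: str holds Unicode text, bytes holds raw 8-bit values, and encode()/decode() convert between them. The sample text is arbitrary and chosen only because it contains two non-ASCII letters.

```python
text = 'na\u00efve caf\u00e9'            # 10 characters, two of them non-ASCII

utf8_bytes = text.encode('utf-8')        # text -> bytes, using UTF-8
latin1_bytes = text.encode('latin-1')    # same text, different encoding

# Same text, but a different number of bytes depending on the encoding.
print(len(text), len(utf8_bytes), len(latin1_bytes))

# A bytes object is a sequence of small integers.
print(list(b'ABC'))

# Many string-like operations also exist on bytes.
print(utf8_bytes.split(b' '))

# Decoding with the matching encoding gives the original text back.
round_trip = utf8_bytes.decode('utf-8')

# Text and bytes do not mix implicitly.
try:
    'abc' + b'def'
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(round_trip == text, mixed_ok)
```

The practical upshot for the buffer-interface errors earlier in the thread: anything that wants bytes (files opened in binary mode, binascii, sockets) needs an explicit encode() first, and anything read back needs an explicit decode() to become text again.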
Re: [Tutor] string to binary and back... Python 3
A question for the group before I respond: one option I saw earlier was to ord() each element of a string and then bin() that number. But since bin() produces a string, I could not figure out the correct way to attach two bin() outputs back together again due to the leading 'b', and even if I use lstrip('b') I was not sure that would be correct. My next hesitation is whether the same or at least similar techniques can be applied to a file. I want to be able to work on both files and strings.

On 07/19/2012 01:22 PM, eryksun wrote:
> On Thu, Jul 19, 2012 at 1:41 AM, wolfrage8...@gmail.com wolfrage8...@gmail.com wrote:
>> I was comparing them but I think I understand how to compare them well, now I want to convert them both to binary so that I can XOR them together. Thank you for your time and help Dave, now I need to reply to Ramit.
>
> A bytes object is a container of 8-bit numbers (i.e. range 0 to 255). If you index it, you'll get an int that supports the XOR operation:
>
>     >>> b1 = b'a'
>     >>> b2 = b'b'
>     >>> b1[0]
>     97
>     >>> b2[0]
>     98
>     >>> bin(b1[0])
>     '0b1100001'
>     >>> bin(b2[0])
>     '0b1100010'
>     >>> bin(b1[0] ^ b2[0])
>     '0b11'
>
> You can use the int method from_bytes to XOR two bitstrings stored as Python bytes:
>
>     >>> b3 = b''
>     >>> b4 = b''
>     >>> bin(int.from_bytes(b3, 'big') ^ int.from_bytes(b4, 'big'))
>     '0b11001100110011'
>
> The computation is done between int objects, not strings. Creating a string using bin is just for presentation.
>
> P.S.: Instead of bin you can use the format command to have more control, such as for zero padding. The integer format code 'b' is for a binary representation. Preceding it by a number starting with zero will pad with zeros to the given number of characters (e.g. 032 will prepend zeros to make the result at least 32 characters long):

The control sounds good and I may need that later (to adjust things to a fixed length), but for the purpose of XORing a message, padding a key with zeros would not be desirable if Eve was able to get her hands on the source code.

>     >>> r = int.from_bytes(b3, 'big') ^ int.from_bytes(b4, 'big')
>     >>> format(r, '032b')
>     '0011001100110011'
>
> Instead of hard coding the length (e.g. 032), you can use the length of the input bitstrings to calculate the size of the result:

That sounds good.

>     >>> size = 8 * max(len(b3), len(b4))
>     >>> format(r, '0%db' % size)
>     '0011001100110011'

Is this output the output for size rather than the two variables joined together?
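To make the padding question concrete, here is a runnable sketch of the same technique. The values of b3 and b4 below are illustrative only; they are not the ones from the original post. The formatted string is the XOR result itself, left-padded with zeros to size digits. It is neither the value of size nor the two inputs joined together.

```python
b3 = b'\x0f\x0f\x0f\x0f'   # illustrative 4-byte values, chosen for this example
b4 = b'\x00\xff\x00\xff'

# XOR is done on integers; the bytes objects are only containers.
r = int.from_bytes(b3, 'big') ^ int.from_bytes(b4, 'big')

# bin() drops leading zeros, format() can keep them.
size = 8 * max(len(b3), len(b4))      # number of bits in the longer input
unpadded = bin(r)
padded = format(r, '0%db' % size)

# Going back from the integer to bytes of the same length as the inputs.
as_bytes = r.to_bytes(size // 8, 'big')

print(size)
print(unpadded)
print(padded)
print(as_bytes)
```

The zero padding only affects how the number is displayed. The integer r, and therefore the result of the XOR, is the same either way.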
Re: [Tutor] string to binary and back... Python 3
On 07/19/2012 08:53 PM, Prasad, Ramit wrote:
>>> I think your basic problem is too much conversion because you do not understand the types. A string is represented by a series of bytes, which are binary numbers. Do you understand the concept behind ASCII? Each letter has a numeric representation, and the letters are numbered sequentially. So the string 'ABCD' is equivalent to the series of bytes 65, 66, 67, 68 (lowercase 'abcd' would be 97, 98, 99, 100). It is not equivalent to 65666768 or 65+66+67+68. So your first task is to convert each character to the numeric equivalent and store them in a list. Once you have them converted to a list of integers, you can create another list that is a list of characters.
>>
>> Sorry for the long delay in getting back to you, I got called to the field. Thank you, I agree I do feel like I am doing too much conversion. I do understand the concept behind ASCII, at least enough to know about ord(), although I did forget about chr(), which is ord()'s reverse function. I had tried to break them down to the ordinal value, but I really do want to get the integer and the data down to binary, as it provides an advantage for the overall program that I am writing. Thank you for your time.
>
> Why not explain your usecase? Technically, everything is binary on a computer, so the question is why do *you* need to see the binary form? Anyway, you can get the binary string by doing `bin(ord(character))` and reverse it by doing `chr(int(binary_string,2))`. [1]

OK. I am using one time pads to XOR data, but the one time pads (keys) are very large numbers, converting them to binary increases their size exponentially, which allows me to get more XORing done out of a single key. I am XORing both files and strings so I need to have code that can do both even if that means two branches of code via an if/else perhaps with an isinstance(data, str). I do not need to actually see the binary form.

> If you are doing some kind of XOR (I think your first email mentioned it) then you can XOR integers. Unless you are doing some kind of display of binary output, you usually do not need to actually see the binary string, as most binary manipulation can be done via the integer value.

Agreed. Although the visual does help for validation (seeing is believing).

> [1] - Thanks to Steven.

Yes thank you Steven. I am working on the code now to see if I can make the above work for me, if I need further help I will be back. Thank you all again for your time.

> Ramit
Re: [Tutor] string to binary and back... Python 3
> A question for the group before I respond: one option I saw earlier was to ord() each element of a string and then bin() that number. But since bin() produces a string, I could not figure out the correct way to attach two bin() outputs back together again due to the leading 'b', and even if I use lstrip('b') I was not sure that would be correct.

    bin(integer).split('b')[1].zfill( multiple_of_eight )

> My next hesitation is whether the same or at least similar techniques can be applied to a file. I want to be able to work on both files and strings.

Probably, but it depends on what you are trying to do and what data you are dealing with.

Ramit
[Tutor] check against multiple variables
I am using a hash table in a small randomization program. I know that some hash functions can be prone to collisions, so I need a way to detect collisions. The 'hash value' will be stored as a variable. I do not want to check it against each singular hash value, as there will be many; I need a way to check it against all hash values at once (if possible.) Sorry for those who like to reference, but there is no source code as of yet. I will need this to be solved before I can start writing, sorry! If you need any extra info let me know. -Selby ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] check against multiple variables
On 7/19/2012 12:29 PM Selby Rowley-Cannon said...
> I am using a hash table in a small randomization program. I know that some hash functions can be prone to collisions, so I need a way to detect collisions. The 'hash value' will be stored as a variable. I do not want to check it against each singular hash value, as there will be many; I need a way to check it against all hash values at once (if possible.)

so keeping the hash values in a dict would allow you to test as follows:

    if new_hash_value in dict_of_hash_values:
        # and bob's your uncle.

Emile

> Sorry for those who like to reference, but there is no source code as of yet. I will need this to be solved before I can start writing, sorry! If you need any extra info let me know.
>
> -Selby
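A slightly fuller sketch of the dict-membership idea, for illustration only. The function find_collisions and the deliberately weak toy_hash are made up for this example; the original post does not say which hash function is in use.

```python
def toy_hash(word):
    """Hypothetical, deliberately weak hash: sum of character codes mod 10."""
    return sum(ord(c) for c in word) % 10


def find_collisions(items, hash_func=hash):
    """Return (first_item, colliding_item, hash_value) for each collision."""
    seen = {}          # hash value -> first item that produced it
    collisions = []
    for item in items:
        value = hash_func(item)
        if value in seen and seen[value] != item:
            # Same hash value, different item: a genuine collision.
            collisions.append((seen[value], item, value))
        else:
            seen.setdefault(value, item)
    return collisions


print(find_collisions(['ab', 'ba', 'c'], toy_hash))
```

The `in` test on a dict (or a set) is a single lookup regardless of how many hash values are stored, which is the "check against all of them at once" being asked for. Storing the original item as the dict value makes it possible to tell a real collision apart from the same item being seen twice.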
Re: [Tutor] string to binary and back... Python 3
On 07/19/2012 03:19 PM, Jordan wrote:
> SNIP
> OK. I am using one time pads to XOR data, but the one time pads (keys) are very large numbers, converting them to binary increases their size exponentially, which allows me to get more XORing done out of a single

You want to explain this impossibility of increasing size exponentially? If you're wanting to waste memory, there are better ways. But it's only 8 times as big to save a string of 1's and zeros as to save the large-int they represent. And multiplying by 8 isn't an exponential function.

> key. I am XORing both files and strings so I need to have code that can do both even if that means two branches of code via an if/else perhaps with an isinstance(data, str). I do not need to actually see the binary form.

Then don't use the binary form. It doesn't make the computation any more powerful and it'll certainly slow it down.

Are you trying to match some other program's algorithm, and thus have strange constraints on your data? Or are you simply trying to make a secure way to encrypt binary files, using one-time pads?

A one-time pad is the same size as the message, so you simply need to convert the message into a large-int, and xor them.

-- DaveA
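The last paragraph above can be sketched as follows, assuming Python 3.2 or later for int.from_bytes and int.to_bytes. The function name xor_with_pad is invented for this example, and os.urandom stands in for however the real pads are generated. This is a sketch of the arithmetic, not a vetted encryption routine.

```python
import os


def xor_with_pad(message, pad):
    """XOR two equal-length bytes objects by way of large integers."""
    if len(pad) != len(message):
        raise ValueError('a one-time pad must be exactly as long as the message')
    as_int = int.from_bytes(message, 'big') ^ int.from_bytes(pad, 'big')
    # to_bytes with an explicit length keeps any leading zero bytes.
    return as_int.to_bytes(len(message), 'big')


message = 'Testing XOR Again!'.encode('utf-8')
pad = os.urandom(len(message))        # stand-in for a real one-time pad

cipher = xor_with_pad(message, pad)
plain = xor_with_pad(cipher, pad)     # XOR with the same pad undoes it
print(cipher)
print(plain.decode('utf-8'))
```

No string of ones and zeros is ever built here. The integers already are the binary data, so the XOR covers every bit of the message in one operation.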
Re: [Tutor] string to binary and back... Python 3
On 07/19/2012 09:23 PM, Prasad, Ramit wrote:
>> A question for the group before I respond: one option I saw earlier was to ord() each element of a string and then bin() that number. But since bin() produces a string, I could not figure out the correct way to attach two bin() outputs back together again due to the leading 'b', and even if I use lstrip('b') I was not sure that would be correct.
>
>     bin(integer).split('b')[1].zfill( multiple_of_eight )

OK so using this: Hopefully my copy paste works this time.

bin_data = ''
for char in data:
bin_data += bin(ord(char)).split('b')[1].zfill(8)
print(bin_data)
bin_list = [bin_data[x:x + 2] for x in range(0, len(bin_data), 2)]
print(bin_list)

The paste looks good to me at this time. How do I get back to the string? If I use this:

data2 = []
for item in bin_list:
data2.append(int(item, 2))
print(data2)

The output is all too low of numbers for ord() to convert back to the correct string.

>> My next hesitation is whether the same or at least similar techniques can be applied to a file. I want to be able to work on both files and strings.
>
> Probably, but it depends on what you are trying to do and what data you are dealing with.

I just want to perform the same conversion on the file data, that is down to binary and back to its original state. I was thinking I would just use the file in binary mode when I open it, but I am not sure if that is true binary or if it is hex or something else altogether. I think my confusion came from trying to do both files and strings at the same time and failing back and forth.

> Ramit
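On the binary-mode question: in Python 3, a file opened with 'rb' hands back a bytes object, i.e. the raw 8-bit values with no hex or text layer in between. Hex only appears when Python prints non-printable bytes. The sketch below assumes a throwaway temporary file, and as_bytes is a hypothetical helper showing one way to send str and file data down the same path.

```python
import os
import tempfile


def as_bytes(data, encoding='utf-8'):
    """Return raw bytes for either text or already-binary data."""
    if isinstance(data, str):
        return data.encode(encoding)
    return bytes(data)


# Write a small sample file, then read it back in binary mode.
path = os.path.join(tempfile.mkdtemp(), 'sample.bin')
with open(path, 'wb') as handle:
    handle.write(as_bytes('Hi!'))

with open(path, 'rb') as handle:
    file_bytes = handle.read()

print(type(file_bytes))                 # bytes, not str
print(list(file_bytes))                 # one integer (0-255) per byte
print(format(file_bytes[0], '08b'))     # bits of the first byte, for display
print(as_bytes('Hi!') == as_bytes(file_bytes))
```

Once both the string and the file contents are bytes, the same XOR or bit-display code can be applied to either without a second branch.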
Re: [Tutor] string to binary and back... Python 3
On Thu, Jul 19, 2012 at 3:08 PM, Jordan wolfrage8...@gmail.com wrote:
>>     >>> size = 8 * max(len(b3), len(b4))
>>     >>> format(r, '0%db' % size)
>>     '0011001100110011'
>
> Is this output the output for size rather than the two variables joined together?

Using format is useful if you need the string to be padded with zeros for the most significant byte. I wouldn't think it's important if you're just using the bitstring representation as a sanity check on your algorithm. In that case you can more easily use bin. That said, len(b3) is the number of characters (bytes) in the bytes object. Since b3 and b4 could be different lengths in general, I took the max length to use for the zero padding. In this case both b3 and b4 contain 4 bytes, so size is 32.

> OK. I am using one time pads to XOR data, but the one time pads (keys) are very large numbers, converting them to binary increases their size exponentially, which allows me to get more XORing done out of a single key. I am XORing both files and strings so I need to have code that can do both even if that means two branches of code via an if/else perhaps with an isinstance(data, str).

I'm not an expert with cryptography, but here's a simple XOR example:

    >>> from itertools import cycle
    >>> text = b'Mary had a little lamb.'
    >>> key = b'1234'
    >>> cypher = bytes(x^y for x,y in zip(text, cycle(key)))
    >>> cypher
    b'|SAM\x11ZRP\x11S\x13XXFGXT\x12_U\\P\x1d'
    >>> text2 = bytes(x^y for x,y in zip(cypher, cycle(key)))
    >>> text2
    b'Mary had a little lamb.'
Re: [Tutor] string to binary and back... Python 3
On 07/19/2012 09:53 PM, Dave Angel wrote:
> On 07/19/2012 03:19 PM, Jordan wrote:
>> SNIP
>> OK. I am using one time pads to XOR data, but the one time pads (keys) are very large numbers, converting them to binary increases their size exponentially, which allows me to get more XORing done out of a single
>
> You want to explain this impossibility of increasing size exponentially? If you're wanting to waste memory, there are better ways. But it's only 8 times as big to save a string of 1's and zeros as to save the large-int they represent. And multiplying by 8 isn't an exponential function.

Yes if you wish to dissect my words the wrong word was chosen...

>> key. I am XORing both files and strings so I need to have code that can do both even if that means two branches of code via an if/else perhaps with an isinstance(data, str). I do not need to actually see the binary form.
>
> Then don't use the binary form. It doesn't make the computation any more powerful and it'll certainly slow it down. The title of the question is string to binary and back. Are you trying to match some other program's algorithm, and thus have strange constraints on your data? Or are you simply trying to make a secure way to encrypt binary files, using one-time pads?

I already answered this question...

> A one-time pad is the same size as the message, so you simply need to convert the message into a large-int, and xor them.
Re: [Tutor] string to binary and back... Python 3
>> bin(integer).split('b')[1].zfill( multiple_of_eight )
>
> OK so using this: Hopefully my copy paste works this time.
>
> bin_data = ''
> for char in data:
> bin_data += bin(ord(char)).split('b')[1].zfill(8)
> print(bin_data)
> bin_list = [bin_data[x:x + 2] for x in range(0, len(bin_data), 2)]
> print(bin_list)
>
> The paste looks good to me at this time.

Not to me, but I can probably figure out enough based on this.

> How do I get back to the string? If I use this:
>
> data2 = []
> for item in bin_list:
> data2.append(int(item, 2))
> print(data2)
>
> The output is all too low of numbers for ord() to convert back to the correct string.

Sure, this makes perfect sense to me :) (adding indent)

for char in data:
    bin_data += bin(ord(char)).split('b')[1].zfill(8)
bin_list = [bin_data[x:x + 2] for x in range(0, len(bin_data), 2)]

Why are you grabbing 2 binary digits? The only possibilities are 0,1,2,3 and none are ASCII letters. You should be grabbing 8 at a time.

bin_data = [ bin(ord(char)).split('b')[1].zfill(8) for char in data ]
bin_string = ''.join(bin_data)
bin_list = [ chr( int(char, 2) ) for char in bin_data ]

I am not really sure what you are getting at with XOR and one time padding, but it has been a while since I have done any encryption. I would think you could do all this by just converting everything to int and then adding/replacing the pad in the list of ints.

Ramit
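For anyone following along, the round trip from a string to a string of bits and back could be sketched as below. The function names are made up for illustration, and the sketch assumes every character has an ordinal below 256 so that 8 bits per character is enough; format(n, '08b') is used here in place of the bin()/split()/zfill() chain.

```python
def text_to_bits(text):
    """Return a string of '0'/'1' characters, 8 bits per character.

    Assumes ord(char) < 256 for every character (an assumption of this sketch).
    """
    return ''.join(format(ord(char), '08b') for char in text)

def bits_to_text(bits):
    """Inverse of text_to_bits: read the bit string back 8 digits at a time."""
    return ''.join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))

bit_string = text_to_bits('Hi')
round_trip = bits_to_text(bit_string)
```

Slicing 8 digits at a time is what makes the reverse step work; slicing 2 at a time can only ever give the values 0 to 3.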
Re: [Tutor] string to binary and back... Python 3
On 07/19/2012 10:04 PM, eryksun wrote:
> On Thu, Jul 19, 2012 at 3:08 PM, Jordan wolfrage8...@gmail.com wrote:
>> SNIP
>
> I'm not an expert with cryptography, but here's a simple XOR example:
>
> >>> from itertools import cycle
> >>> text = b'Mary had a little lamb.'
> >>> key = b'1234'
> >>> cypher = bytes(x^y for x,y in zip(text, cycle(key)))
> >>> cypher
> b'|SAM\x11ZRP\x11S\x13XXFGXT\x12_U\\P\x1d'
> >>> text2 = bytes(x^y for x,y in zip(cypher, cycle(key)))
> >>> text2
> b'Mary had a little lamb.'

Hmm interesting, I am reading up on itertools.cycle() and zip now. Thanks, there is always more than one way to solve a problem.
Re: [Tutor] string to binary and back... Python 3
Sorry, I am not sure why Thunderbird is stripping the spaces, may have something to do with a plug-in that I have installed, I will have to look into it.

On 07/19/2012 10:41 PM, Prasad, Ramit wrote:
> Sure, this makes perfect sense to me :) (adding indent)
>
> for char in data:
>     bin_data += bin(ord(char)).split('b')[1].zfill(8)
> bin_list = [bin_data[x:x + 2] for x in range(0, len(bin_data), 2)]
>
> Why are you grabbing 2 binary digits? The only possibilities are 0,1,2,3 and none are ASCII letters. You should be grabbing 8 at a time.

Right, sorry, first time working with binary and I was confused by a previous attempt.

> bin_data = [ bin(ord(char)).split('b')[1].zfill(8) for char in data ]
> bin_string = ''.join(bin_data)
> bin_list = [ chr( int(char, 2) ) for char in bin_data ]

Thank you, exactly what I was looking for!

> I am not really sure what you are getting at with XOR and one time padding, but it has been a while since I have done any encryption.

And I have just started reading Applied Cryptography, so I am putting some of what I learn into practice.

> I would think you could do all this by just converting everything to int and then adding/replacing the pad in the list of ints.

At first I was essentially doing just that, but when I first converted the large integers that are being used for the one time pad as the key to binary I saw how much larger it was, and then realized that was the bit length of the integer (technically Long). By doing that, I can get more out of the one time pad, but if you XOR binary against Ord, very few values will be changed because binary is only 1s and 0s as you know. To optimize the key's use, whether it wastes memory or not, I wanted to use binary on binary, this really comes into play with files, not so much the shorter strings. But since you bring it up, how would you convert a file to a list of ints?
Re: [Tutor] string to binary and back... Python 3
> SNIP
>>> OK. I am using one time pads to XOR data, but the one time pads (keys) are very large numbers, converting them to binary increases their size exponentially, which allows me to get more XORing done out of a single
>>
>> You want to explain this impossibility of increasing size exponentially? If you're wanting to waste memory, there are better ways. But it's only 8 times as big to save a string of 1's and zeros as to save the large-int they represent. And multiplying by 8 isn't an exponential function.
>
> Yes if you wish to dissect my words the wrong word was chosen...
>
>>> key. I am XORing both files and strings so I need to have code that can do both even if that means two branches of code via an if/else perhaps with an isinstance(data, str). I do not need to actually see the binary form.
>>
>> Then don't use the binary form. It doesn't make the computation any more powerful and it'll certainly slow it down. The title of the question is string to binary and back. Are you trying to match some other program's algorithm, and thus have strange constraints on your data? Or are you simply trying to make a secure way to encrypt binary files, using one-time pads?
>
> I already answered this question...

Yes, you stated that it had to work on string and files, but are the files binary? DaveA and I are asking the questions because given what you are asking it just seems like you are not using the Right approach. I can touch my nose by touching my nose with my hand, or asking the person next to me to pick up my hand and use it to touch my nose. Both work, one is just faster and easier to understand.

>> A one-time pad is the same size as the message, so you simply need to convert the message into a large-int, and xor them.
Ramit
Re: [Tutor] string to binary and back... Python 3
On 07/18/2012 05:07 PM, Jordan wrote:
> OK so I have been trying for a couple days now and I am throwing in the towel, Python 3 wins this one.

I should have paid more attention to this the first time. Clearly you don't want help.

--
DaveA
Re: [Tutor] string to binary and back... Python 3
>> bin_data = [ bin(ord(char)).split('b')[1].zfill(8) for char in data ]
>> bin_string = ''.join(bin_data)
>> bin_list = [ chr( int(char, 2) ) for char in bin_data ]
>
> Thank you, exactly what I was looking for!
>
>> I am not really sure what you are getting at with XOR and one time padding, but it has been a while since I have done any encryption.
>
> And I have just started reading Applied Cryptography, so I am putting some of what I learn into practice.
>
>> I would think you could do all this by just converting everything to int and then adding/replacing the pad in the list of ints.
>
> At first I was essentially doing just that, but when I first converted the large integers that are being used for the one time pad as the key to binary I saw how much larger it was, and then realized that was the bit length of the integer (technically Long). By doing that, I can get more out of the one time pad, but if you XOR binary against Ord, very few values will be changed because binary is only 1s and 0s as you know. To optimize the key's use, whether it wastes memory or not, I wanted to use binary on binary, this really comes into play with files, not so much the shorter strings.

How are you XOR-ing binary against something else? At a low level the data is pretty similar so that they should be mostly interchangeable. It is when you start abstracting the data that you have to convert between abstractions.

Hold on, let me try a different angle. The int, binary, and hex versions of a number (let's say 65) are all just different representations of the same number. The only thing that changes is the base.

65 in decimal (base 10) is 65
65 in hex (base 16) is 41
65 in binary (base 2) is 1000001

But they are ALL the same number.

>>> int( '65', 10 )
65
>>> int( '41', 16 )
65
>>> int( '1000001', 2 )
65

> But since you bring it up, how would you convert a file to a list of ints?

with open(filename, 'r' ) as f:
    ints = [ ord( char ) for line in f for char in line ]

Now all you need to do is modify the list to include your padding.

> but when I first converted the large integers that are being used for the one time pad as the key to binary I saw how much larger it was, and then realized that was the bit length of the integer (technically Long). By doing that, I can get more out of the one time pad,

Large integers? Are you adding the integers for some reason? Extended ASCII only has ordinal values less than 256.

Ramit
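To make the "same number, different base" point concrete, here is a small sketch for Python 3. The variable names are made up for illustration. It also shows that a bytes object (what a file opened in 'rb' mode gives you) already behaves like a sequence of ints, so no ord() call is needed in that case.

```python
n = 65

# Three textual representations of the same integer.
as_decimal = format(n, 'd')
as_hex = format(n, 'x')
as_binary = format(n, 'b')

# Parsing each representation with the matching base gives the same int back.
parsed = [int(as_decimal, 10), int(as_hex, 16), int(as_binary, 2)]

# Bytes are already small integers in Python 3; b'AB' stands in for file data.
ints = list(b'AB')
```

XOR works on the integer itself, so it makes no difference which of the three spellings was used to write it down.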
Re: [Tutor] string to binary and back... Python 3
On 07/19/2012 10:48 PM, Prasad, Ramit wrote:
> SNIP
>>>> OK. I am using one time pads to XOR data, but the one time pads (keys) are very large numbers, converting them to binary increases their size exponentially, which allows me to get more XORing done out of a single
>>>
>>> You want to explain this impossibility of increasing size exponentially? If you're wanting to waste memory, there are better ways. But it's only 8 times as big to save a string of 1's and zeros as to save the large-int they represent. And multiplying by 8 isn't an exponential function.
>>
>> Yes if you wish to dissect my words the wrong word was chosen...
>>
>>>> key. I am XORing both files and strings so I need to have code that can do both even if that means two branches of code via an if/else perhaps with an isinstance(data, str). I do not need to actually see the binary form.
>>>
>>> Then don't use the binary form. It doesn't make the computation any more powerful and it'll certainly slow it down. The title of the question is string to binary and back. Are you trying to match some other program's algorithm, and thus have strange constraints on your data? Or are you simply trying to make a secure way to encrypt binary files, using one-time pads?
>>
>> I already answered this question...
>
> Yes, you stated that it had to work on string and files, but are the files binary? DaveA and I are asking the questions because given what you are asking it just seems like you are not using the Right approach. I can touch my nose by touching my nose with my hand, or asking the person next to me to pick up my hand and use it to touch my nose. Both work, one is just faster and easier to understand.

I am not sure how to answer that question because all files are binary, but the files that I will parse have an encoding that allows them to be read in a non-binary output. But my program will not use them in a non-binary way, that is why I plan to open them with the 'b' mode to open them as binary with no encoding assumed by python. I just have not tested this new technique that you gave me on a binary file yet as I was still implementing it for strings.

I may not be using the right approach, that is why I am asking. I also understand why the questions are needed, so you can understand my intent, so that you can better help me. But since DaveA and I had a misunderstanding over the missing indentation, for which I apologized and explained that my email editor is stripping the spaces, he seems to be badgering me.

>>> You want to explain this impossibility of increasing size exponentially? If you're wanting to waste memory, there are better ways.

Now I would like to make it clear I very much appreciate the help! So again, Thank you.

>>> A one-time pad is the same size as the message, so you simply need to convert the message into a large-int, and xor them.
Re: [Tutor] string to binary and back... Python 3
> I am not sure how to answer that question because all files are binary, but the files that I will parse have an encoding that allows them to be read in a non-binary output. But my program will not use the in a non-binary way, that is why I plan to open them with the 'b' mode to open them as binary with no encoding assumed by python. I just not have tested this new technique that you gave me on a binary file yet as I was still implementing it for strings.

As far as I know, even in binary mode, python will convert the binary data to read and write strings. So there is no reason this technique would not work for binary. Note, I was able to use the string representation of a PDF file to write another PDF file. So you do not need to worry about the conversion of binary to strings. All you need to do is convert the string to int, encrypt, decrypt, convert back to string, and write out again.

Note Python3 being Unicode might change things a bit. Not sure if you will need to convert to bytes or some_string.decode('ascii').

Now if you end up needing to handle non-ASCII data, then this exercise gets more complicated. Not sure if there is a simple way to convert all characters to a numerical point, but it should still be possible. If your data is binary, then I do not think you will run into any issues.

Ramit
Re: [Tutor] check against multiple variables
Selby Rowley-Cannon wrote:
> I am using a hash table in a small randomization program. I know that some hash functions can be prone to collisions, so I need a way to detect collisions.

I doubt that very much. This entire question seems like a remarkable case of premature optimization. Start with demonstrating that collisions are an actual problem that needs fixing.

Unless you have profiled your application and proven that hash collisions are a real problem -- and unless you are hashing thousands of float NANs, that is almost certainly not the case -- you are just wasting your time and making your code slower rather than faster -- a pessimation, not optimization.

And if it *is* a problem, then the solution is to fix your data so that its __hash__ method is less likely to collide. If you are rolling your own hash method, instead of using one of Python's, that's your first problem.

Python's hash implementation is one of the most finely tuned in the world. Many, many years of effort have gone into making it stand up to real-world data. You aren't going to beat it with some half-planned pure-Python work-around.

--
Steven
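As a small illustration of why collisions rarely need detecting by hand: a dict falls back on equality when two keys hash the same, so collisions cost speed, not correctness. The class below is a deliberately bad, made-up example (every instance gets the same hash); it is only a sketch to show the behaviour, not something to copy.

```python
class BadKey:
    """Hypothetical key type whose instances all collide on purpose."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Worst case: every key lands in the same bucket.
        return 42

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.name == other.name

table = {BadKey('a'): 1, BadKey('b'): 2}
value_a = table[BadKey('a')]
value_b = table[BadKey('b')]
```

Both keys are stored and found correctly even though their hashes are identical; the dict just has to do extra equality checks to tell them apart.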
Re: [Tutor] string to binary and back... Python 3
On 07/19/2012 05:55 PM, Prasad, Ramit wrote:
>> I am not sure how to answer that question because all files are binary, but the files that I will parse have an encoding that allows them to be read in a non-binary output. But my program will not use the in a non-binary way, that is why I plan to open them with the 'b' mode to open them as binary with no encoding assumed by python. I just not have tested this new technique that you gave me on a binary file yet as I was still implementing it for strings.
>
> As far as I know, even in binary mode, python will convert the binary data to read and write strings. So there is no reason this technique would not work for binary. Note, I was able to use the string representation of a PDF file to write another PDF file. So you do not need to worry about the conversion of binary to strings. All you need to do is convert the string to int, encrypt, decrypt, convert back to string, and write out again. Note Python3 being Unicode might change things a bit. Not sure if you will need to convert to bytes or some_string.decode('ascii').

In Python 3, if you open the file with "b" (as Jordan has said), it creates a bytes object. No use of strings needed or wanted. And no assumptions of ascii, except for the output of the % operator on a hex conversion.

myfile = open(filename, "rb")
data = myfile.read(size)

At that point, convert it to hex with:

hexdata = binascii.hexlify(data)

then convert that to an integer:

numdata = int(hexdata, 16)

At that point, it's ready to xor with the one-time key, which had better be the appropriate size to match the data length.

newhexdata = bytes("%x" % numdata, "ascii")
newdata = binascii.unhexlify(newhexdata)

If the file is bigger than the key, you have to get a new key. If the keys are chosen with a range of 2**200, then you'd read and convert the file 25 bytes at a time.

--
DaveA
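Here is one way the "convert to a large int and xor" step could be sketched in Python 3. It swaps the hexlify route for int.from_bytes() and int.to_bytes(), which keep the byte length fixed; formatting with "%x" drops leading zeros, so a chunk that starts with zero bytes may not survive the hex round trip unchanged. The function name and the requirement that the pad matches the data length are assumptions of this sketch.

```python
def xor_with_pad(data, pad):
    """XOR data against a pad of the same length via big integers.

    xor_with_pad is a hypothetical helper name; both arguments are bytes.
    """
    if len(data) != len(pad):
        raise ValueError("pad must be exactly as long as the data")
    number = int.from_bytes(data, 'big') ^ int.from_bytes(pad, 'big')
    # to_bytes with an explicit length keeps any leading zero bytes.
    return number.to_bytes(len(data), 'big')

message = b'\x00\x01secret'
pad = bytes(range(10, 10 + len(message)))   # stand-in for a real random pad
encrypted = xor_with_pad(message, pad)
decrypted = xor_with_pad(encrypted, pad)
```

For a file, the same function could be applied chunk by chunk, with a fresh slice of the pad for each chunk (25 bytes at a time if each key covers 200 bits).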
[Tutor] Calling a function does not return what I want it to return
I have this little program that is supposed to calculate how many diagonals a polygon of x sides has, but it does not return what I have in the return part of the function when I call it. Here is the code:

def num_diag(var):
    ans = 0
    if var <= 3:
        print("No diagonals.")
    else:
        for i in range(num_sides - 3):
            ans = ans + i
        return (((var - 3)*2) + ans)

num_sides = (int(raw_input("Enter sides: ")))
num_diag(num_sides)

Any suggestions as to what is going on? When I run it, it prompts me for the number of sides, and that's it. Thanks.
[Tutor] re 33.116
I found ~200k files in /var/log all but 227 look like:

list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.2.gz.1.gz
list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.2.gz.1.gz.1.gz
list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.2.gz.1.gz.1.gz.1.gz
list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.2.gz.1.gz.2.gz
list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.2.gz.2.gz
list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.2.gz.2.gz.1.gz
list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.2.gz.3.gz
list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.3.gz
list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.3.gz.1.gz

in both the day and night variant. I erased them all as / was at 100% used. It's now at 89% with ~400Mb free.

Emile
Re: [Tutor] re 33.116
On 7/19/2012 4:10 PM Emile van Sebille said...
> I found ~200k files in /var/log all but 227 look like:

Sorry -- my bad.

Emile
Re: [Tutor] Calling a function does not return what I want it to return
> I have this little program that is supposed to calculate how many diagonals a polygon of x sides has, but it does not return what I have in the return part of the function when I call it. Here is the code:
>
> def num_diag(var):
>     ans = 0
>     if var <= 3:
>         print("No diagonals.")
>     else:
>         for i in range(num_sides - 3):
>             ans = ans + i
>         return (((var - 3)*2) + ans)
>
> num_sides = (int(raw_input("Enter sides: ")))
> num_diag(num_sides)

>>> num_diag(5)
NameError: global name 'num_sides' is not defined

`for i in range(num_sides - 3):`

Change num_sides to var.

Ramit
Re: [Tutor] Calling a function does not return what I want it to return
On 7/19/2012 3:58 PM Alexander Q. said...
> I have this little program that is supposed to calculate how many diagonals a polygon of x sides has, but it does not return what I have in the return part of the function when I call it. Here is the code:
>
> def num_diag(var):
>     ans = 0
>     if var <= 3:
>         print("No diagonals.")
>     else:
>         for i in range(num_sides - 3):
>             ans = ans + i
>         return (((var - 3)*2) + ans)
>
> num_sides = (int(raw_input("Enter sides: ")))

You're almost there. Change the following

> num_diag(num_sides)

to

print num_diag(num_sides)

(for Python versions before 3) or

print (num_diag(num_sides))

(for Python 3). Then see where that takes you.

Emile

> Any suggestions as to what is going on? When I run it, it prompts me for the number of sides, and that's it. Thanks.
Re: [Tutor] Calling a function does not return what I want it to return
On 07/19/2012 06:58 PM, Alexander Q. wrote:
> I have this little program that is supposed to calculate how many diagonals a polygon of x sides has, but it does not return what I have in the return part of the function when I call it. Here is the code:
>
> def num_diag(var):
>     ans = 0
>     if var <= 3:
>         print("No diagonals.")
>     else:
>         for i in range(num_sides - 3):
>             ans = ans + i
>         return (((var - 3)*2) + ans)
>
> num_sides = (int(raw_input("Enter sides: ")))
> num_diag(num_sides)
>
> Any suggestions as to what is going on? When I run it, it prompts me for the number of sides, and that's it. Thanks.

You never use the return value. Try assigning it, and printing it.

result = num_diag(num_sides)
print("final answer=", result)

--
DaveA
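Putting the suggestions from this thread together, a Python 3 version might look like the sketch below. It uses the function's own argument instead of the global, returns 0 rather than printing when there are no diagonals (a choice made for this sketch), and captures the return value. The input prompt is replaced by a fixed value so the example runs on its own.

```python
def num_diag(sides):
    """Return the number of diagonals of a polygon with the given sides."""
    if sides <= 3:
        return 0          # a triangle (or less) has no diagonals
    ans = 0
    for i in range(sides - 3):   # use the argument, not a global name
        ans = ans + i
    return (sides - 3) * 2 + ans

num_sides = 5                    # stands in for int(input("Enter sides: "))
result = num_diag(num_sides)     # the return value must be kept...
print("final answer =", result)  # ...and printed, or nothing is shown
```

As a cross-check, the loop version agrees with the usual closed form n*(n-3)/2 for the number of diagonals of an n-sided polygon.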
[Tutor] Creating a dictionary on user filter
Hi All,

I have a few lists that I'm trying to put into a dictionary based on which list the user wants to use as a filter. If the user selects 1 then the dictionary would be created using the first list as the keys and the secondary items as the values. If the user selects 2, the dictionary would be created with the second list as the keys, and the remaining as the values.

I think using dict(zip(firstList, (secondList, thirdList))) is the way to go but I'm having trouble with the placement of the items. What I have is this:

firstList = ['a', 'b', 'c']
secondList = [1,2,3]
thirdList = [1.20, 1.23, 2.54]

What I am looking for is something like this for output:

{'a': [1, 1.20], 'b': [2, 1.23], 'c': [3, 2.54]}

What I'm now thinking is that I need to loop over each item in the list and update the dictionary such as:

for x in range(a):
    compilation = dict(zip(a[x], (b[x], c[x])))

Any help is appreciated.

--
~MEN
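One possible way to get that output is to zip the value lists together first and then zip the result against the key list. The sketch below is only a suggestion: build_lookup and key_index are made-up names, and the user's choice of 1 or 2 is assumed to map to index 0 or 1.

```python
def build_lookup(columns, key_index):
    """Build a dict keyed by one list, with the other lists' items as values.

    columns is a list of equal-length lists; key_index picks the key list.
    Both names are hypothetical, chosen for this sketch.
    """
    keys = columns[key_index]
    others = [col for i, col in enumerate(columns) if i != key_index]
    # zip(*others) pairs up the items that share a position.
    return {key: list(values) for key, values in zip(keys, zip(*others))}

firstList = ['a', 'b', 'c']
secondList = [1, 2, 3]
thirdList = [1.20, 1.23, 2.54]

by_first = build_lookup([firstList, secondList, thirdList], 0)
by_second = build_lookup([firstList, secondList, thirdList], 1)
```

With the user's selection stored as a number, the call would be something like build_lookup(lists, choice - 1).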
Re: [Tutor] Calling a function does not return what I want it to return
On 20/07/12 00:17, Prasad, Ramit wrote:
>> def num_diag(var):
>>     ans = 0
>>     if var <= 3:
>>         print("No diagonals.")
>>     else:
>>         for i in range(num_sides - 3):
>>             ans = ans + i
>>         return (((var - 3)*2) + ans)
>>
>> num_sides = (int(raw_input("Enter sides: ")))
>> num_diag(num_sides)
>
> NameError: global name 'num_sides' is not defined
>
> `for i in range(num_sides - 3):`
> Change num_sides to var.

It should work without, because it will pick up the global variable definition. It's probably not working the way it was intended to, but it should work...

But changing it to use the argument would definitely be better.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
Re: [Tutor] Calling a function does not return what I want it to return
On Thu, Jul 19, 2012 at 4:21 PM, Dave Angel d...@davea.name wrote:
> On 07/19/2012 06:58 PM, Alexander Q. wrote:
>> I have this little program that is supposed to calculate how many diagonals a polygon of x sides has, but it does not return what I have in the return part of the function when I call it. Here is the code:
>>
>> def num_diag(var):
>>     ans = 0
>>     if var <= 3:
>>         print("No diagonals.")
>>     else:
>>         for i in range(num_sides - 3):
>>             ans = ans + i
>>         return (((var - 3)*2) + ans)
>>
>> num_sides = (int(raw_input("Enter sides: ")))
>> num_diag(num_sides)
>>
>> Any suggestions as to what is going on? When I run it, it prompts me for the number of sides, and that's it. Thanks.
>
> You never use the return value. Try assigning it, and printing it.
>
> result = num_diag(num_sides)
> print("final answer=", result)
>
> --
> DaveA

That did it- thanks Dave!

-Alex
[Tutor] Invalid Token Problem
Hi folks,

I've been trying to convert numbers from digits to words, I wrote the following code;

units = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']
teens = ['eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen']
tens = ['ten', 'twenty', 'thirty', 'fourty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']

def num2word(num):
    wordlist = []
    if len(str(num)) == 4:
        wordlist = [units[1] + 'thousand']
    if len(str(num)) == 3:
        if num%100 == 0:
            wordlist = [units[eval(str(num)[-3])-1] + 'hundred']
        else:
            wordlist = [units[eval(str(num)[-3])-1],'hundred', 'and', num2word(eval(str(num)[-2:]))]
    if len(str(num)) == 2:
        if num%10 == 0:
            wordlist = [tens[eval(str(num)[-2])-1]]
        elif 10<eval(str(num))<20:
            wordlist = [teens[eval(str(num)[-1])-1]]
        else:
            wordlist = [tens[eval(str(num)[-2])-1], units[eval(str(num)[-1])-1]]
    if len(str(num)) == 1:
        wordlist = [units[num-1]]
    return ' '.join(wordlist)

for i in range(1, 200):
    print i, num2word(i)

but when I let it run till i = 108, it gives me an invalid token error as follows;

...
99 ninety nine
100 onehundred
101 one hundred and one
102 one hundred and two
103 one hundred and three
104 one hundred and four
105 one hundred and five
106 one hundred and six
107 one hundred and seven
108
Traceback (most recent call last):
  File "C:/Windows.old/Users/Abasiemeka/Abasiemeka/GOOGLE University/Python/Python Code/MyCode/Project Euler code/Project Euler answer 17.py", line 33, in <module>
    print i, num2word(i)
  File "C:/Windows.old/Users/Abasiemeka/Abasiemeka/GOOGLE University/Python/Python Code/MyCode/Project Euler code/Project Euler answer 17.py", line 18, in num2word
    wordlist = [units[eval(str(num)[-3])-1],'hundred', 'and', num2word(eval(str(num)[-2:]))]
  File "<string>", line 1
    08
     ^
SyntaxError: invalid token

I am at a loss, please help.

gratefully,
Abasiemeka
Re: [Tutor] string to binary and back... Python 3
On Thu, Jul 19, 2012 at 5:32 PM, Jordan wolfrage8...@gmail.com wrote:
> I am not sure how to answer that question because all files are binary, but the files that I will parse have an encoding that allows them to be read in a non-binary output. But my program will not use the in a non-binary way, that is why I plan to open them with the 'b' mode to open them as binary with no encoding assumed by python. I just not have tested this new technique that you gave me on a binary file yet as I was still implementing it for strings.

Reading from a file in binary mode returns a bytes object in Python 3. Since iterating over bytes returns ints, you can cycle the key over the plain text using zip and compute the XOR without having to convert the entire message into a single big number in memory. Here's my example from before, adapted for files:

>>> from itertools import cycle
>>> key = b'1234'
>>> kit = cycle(key)
>>> with open('temp.txt', 'rb') as f, open('cipher.txt', 'wb') as fo:
...     fit = iter(lambda: f.read(512), b'')
...     for text in fit:
...         fo.write(bytes(x^y for x,y in zip(text, kit)))

Since the input file could be arbitrarily large and lack newlines, I'm using iter to create a special iterator that reads 512-byte chunks. The iterator stops when read returns an empty bytes object (i.e. b''). You could use a while loop instead.

I assume here that the key is possibly shorter than the message (e.g. encrypting 1 megabyte of text with a 128 byte key). If you're making a one-time pad I think the key is the same length as the message. In that case you wouldn't have to worry about cycling it. Anyway, I'm not particularly interested in cryptography. I'm just trying to help with the operations.
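The chunked loop above could be packaged as a function and checked with a round trip, roughly as sketched here. The function name xor_file, the temporary file names, and the sample data are all made up for the demonstration. The two calls deliberately use different chunk sizes to show that the key position carries over from one chunk to the next.

```python
import os
import tempfile
from itertools import cycle

def xor_file(src_path, dst_path, key, chunk_size=512):
    """XOR src_path against a repeating key, writing the result to dst_path."""
    if not key:
        raise ValueError("key must not be empty")
    kit = cycle(key)   # one shared iterator, so the key resumes mid-cycle
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        # iter(callable, sentinel) stops when read() returns b'' at EOF.
        for chunk in iter(lambda: src.read(chunk_size), b''):
            dst.write(bytes(x ^ y for x, y in zip(chunk, kit)))

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, 'temp.txt')
    cipher = os.path.join(tmp, 'cipher.txt')
    restored = os.path.join(tmp, 'restored.txt')
    original = b'Mary had a little lamb.' * 100
    with open(plain, 'wb') as f:
        f.write(original)
    xor_file(plain, cipher, b'1234', chunk_size=5)
    xor_file(cipher, restored, b'1234')
    with open(cipher, 'rb') as f:
        cipher_bytes = f.read()
    with open(restored, 'rb') as f:
        restored_bytes = f.read()
```

Only one chunk is held in memory at a time, so the size of the input file does not matter much.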
Re: [Tutor] Invalid Token Problem
On 07/19/2012 08:36 PM, Osemeka Osuagwu wrote:

> snip...
> 99 ninety nine
> 100 onehundred
> 101 one hundred and one
> 102 one hundred and two
> 103 one hundred and three
> 104 one hundred and four
> 105 one hundred and five
> 106 one hundred and six
> 107 one hundred and seven
> 108
> Traceback (most recent call last):
>   File "C:/Windows.old/Users/Abasiemeka/Abasiemeka/GOOGLE University/Python/Python Code/MyCode/Project Euler code/Project Euler answer 17.py", line 33, in <module>
>     print i, num2word(i)
>   File "C:/Windows.old/Users/Abasiemeka/Abasiemeka/GOOGLE University/Python/Python Code/MyCode/Project Euler code/Project Euler answer 17.py", line 18, in num2word
>     wordlist = [units[eval(str(num)[-3])-1],'hundred', 'and', num2word(eval(str(num)[-2:]))]
>   File "<string>", line 1
>     08
>      ^
> SyntaxError: invalid token

08 isn't a valid literal. Remove the leading zero: in Python 2 a leading zero says that the following digits are to be interpreted as octal, and 8 isn't a valid octal digit.

Much better would be to eliminate the unnecessary use of eval(). It's dangerous, and sometimes doesn't do what you expect.

--
DaveA
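To make the point concrete, here is a small sketch of two ways to get at the digits without eval(). It is only an illustration of the general idea, not a rewrite of the original function, and the variable names are made up.

```python
# int() parses a decimal digit string and does not mind leading zeros,
# so the slice that tripped up eval() is harmless here.
last_two = str(108)[-2:]    # the string '08'
value = int(last_two)       # the integer 8

# Plain arithmetic avoids the round trip through strings altogether.
hundreds, remainder = divmod(108, 100)   # hundreds digit and the rest
```

With either approach the recursive call would receive 8 rather than the text 08, so there is nothing for the parser to reject.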
Re: [Tutor] Invalid Token Problem
On 20/07/12 10:45, Dave Angel wrote:
> On 07/19/2012 08:36 PM, Osemeka Osuagwu wrote:
>> snip...
>> 99 ninety nine
>> 100 onehundred
>> 101 one hundred and one
>> 102 one hundred and two
>> 103 one hundred and three
>> 104 one hundred and four
>> 105 one hundred and five
>> 106 one hundred and six
>> 107 one hundred and seven
>> 108
>> Traceback (most recent call last):
>>   File "C:/Windows.old/Users/Abasiemeka/Abasiemeka/GOOGLE University/Python/Python Code/MyCode/Project Euler code/Project Euler answer 17.py", line 33, in <module>
>>     print i, num2word(i)
>>   File "C:/Windows.old/Users/Abasiemeka/Abasiemeka/GOOGLE University/Python/Python Code/MyCode/Project Euler code/Project Euler answer 17.py", line 18, in num2word
>>     wordlist = [units[eval(str(num)[-3])-1],'hundred', 'and', num2word(eval(str(num)[-2:]))]
>>   File "<string>", line 1
>>     08
>>      ^
>> SyntaxError: invalid token
>
> 08 isn't a valid literal. Remove the leading zero: in Python 2 a leading
> zero says that the following digits are to be interpreted as octal, and
> 8 isn't a valid octal digit.

Try to think of another way to convert an integer string into an integer value. hINT()

> Much better would be to eliminate the unnecessary use of eval(). It's
> dangerous, and sometimes doesn't do what you expect.

More specifically, eval() is dangerous if you try to evaluate a string supplied by someone else. You really can't predict what will happen. However, if you use eval() on strings that you create yourself, it can be a handy technique. When you are starting out, it's best to ignore eval() until later.

Ross
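A harmless way to see the difference, as a sketch only: side_effect and parse_int below are hypothetical names invented for this demonstration, and the "hostile" string just calls a function that records that it ran.

```python
calls = []

def side_effect():
    """Stand-in for whatever a hostile string might trigger."""
    calls.append("ran")
    return 42

def parse_int(text):
    """Return text as an int, or None if it is not a plain integer."""
    try:
        return int(text)
    except ValueError:
        return None

untrusted = "side_effect()"          # pretend this came from someone else

eval_result = eval(untrusted)        # eval() happily executes the call
int_result = parse_int(untrusted)    # int() just refuses the input
```

After this runs, calls shows that eval() executed code chosen by whoever wrote the string, while int() only ever converts digits or raises ValueError.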
[Tutor] Join email list
[Tutor] string to binary and back... Python 3
On Fri, Jul 20, 2012 at 12:33 AM, Dave Angel d...@davea.name wrote:
> On 07/19/2012 05:55 PM, Prasad, Ramit wrote:
>>> I am not sure how to answer that question because all files are binary,
>>> but the files that I will parse have an encoding that allows them to be
>>> read in a non-binary output. But my program will not use them in a
>>> non-binary way, that is why I plan to open them with the 'b' mode to
>>> open them as binary with no encoding assumed by python. I just have not
>>> tested this new technique that you gave me on a binary file yet as I was
>>> still implementing it for strings.
>>
>> As far as I know, even in binary mode, python will convert the binary
>> data to read and write strings. So there is no reason this technique
>> would not work for binary. Note, I was able to use the string
>> representation of a PDF file to write another PDF file. So you do not
>> need to worry about the conversion of binary to strings. All you need to
>> do is convert the string to int, encrypt, decrypt, convert back to
>> string, and write out again.
>>
>> Note Python3 being Unicode might change things a bit. Not sure if you
>> will need to convert to bytes or some_string.decode('ascii').
>
> In Python 3, if you open the file with "b" (as Jordan has said), it
> creates a bytes object. No use of strings needed or wanted. And no
> assumptions of ascii, except for the output of the % operator on a hex
> conversion.
>
>     myfile = open(filename, "rb")
>     data = myfile.read(size)
>
> At that point, convert it to hex with:
>
>     hexdata = binascii.hexlify(data)
>
> then convert that to an integer:
>
>     numdata = int(hexdata, 16)
>
> At that point, it's ready to xor with the one-time key, which had better
> be the appropriate size to match the data length.
>
>     newhexdata = bytes("%x" % numdata, "ascii")
>     newdata = binascii.unhexlify(newhexdata)
>
> If the file is bigger than the key, you have to get a new key. If the
> keys are chosen with a range of 2**200, then you'd read and convert the
> file 25 bytes at a time.

Thanks, I will give this a try.

Can you explain a little further for me what exactly this line is doing?

    newhexdata = bytes("%x" % numdata, "ascii")

I don't quite understand the use of the "%x" % on numdata.

> --
> DaveA
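As a rough sketch of what that line does, assuming Python 3: "%x" % n formats the integer n as a string of lowercase hex digits, and bytes(..., "ascii") turns that text into the bytes that unhexlify expects. The helper name xor_via_int, the sample data and the key below are invented for illustration, and the key is assumed to be exactly as long as the data. The zero padding is an addition of mine: a plain "%x" drops leading zeros, which can leave an odd number of hex digits or lose leading zero bytes.

```python
import binascii

def xor_via_int(data, key):
    """XOR data with a same-length key by way of one big integer."""
    numdata = int(binascii.hexlify(data), 16)   # bytes -> hex text -> int
    numkey = int(binascii.hexlify(key), 16)
    result = numdata ^ numkey
    # "%0*x" pads with zeros to two hex digits per byte of input.
    hex_text = "%0*x" % (2 * len(data), result)
    return binascii.unhexlify(bytes(hex_text, "ascii"))

data = b'\x00\x01hello'        # hypothetical sample; note the leading zero byte
key = b'\x5a' * len(data)      # hypothetical key, same length as the data

cipher = xor_via_int(data, key)
restored = xor_via_int(cipher, key)
```

The leading zero byte in the sample is there on purpose: without the padding, the decrypted number would format to fewer hex digits than the original data had.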