Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-19 Thread Steven D'Aprano
On Wed, Jul 18, 2012 at 04:33:20PM -0700, Ryan Waples wrote:

 I've included 20 consecutive lines of input and output.  Each of these
 5 'records' should have been selected and printed to the output file.

I count only 19 lines. The first group has only three lines. See below.

There is a blank line, which I take as NOT part of the input but just a 
spacer. Then:

1) Line starting with @
2) Line of bases CGCGT ...
3) Plus sign
4) Line starting with @@@
5) Line starting with @
6) Line of bases TTCTA ...
7) Plus sign

and so on. There are TWO lines before the first +, and three before each 
of the others.



 __EXAMPLE RAW DATA FILE REGION__
 
 @HWI-ST0747:167:B02DEACXX:8:1101:3182:167088 1:N:0:
 CGCGTGTGCAGGTTTATAGAACCAGCTGCAGATTAGTAGCAGCGCACGGAGAGGTGTGTCTGTTTATTGTCCTCAGCAGGCAGACATGTTTGTGGTC
 +
 @@@DDADDHB9+2A??:?G9+C)???G@DB@@DGFB0*?FF?0F:@/54'-;;?B;;6(5@CDAC(5(5:5,(8?88?BC@#
 @HWI-ST0747:167:B02DEACXX:8:1101:3134:167090 1:N:0:
 TTCTAGTGCAGGGCGACAGCGTTGCGGAGCCGGTCCGAGTCTGCTGGGTCAGTCATGGCTAGTTGGTACTATAACGACACAGGGCGAGACCCAGATGCAAA
 +
 @CCFFFDFHJJIJHHIIIJHGHIJI@GFFDDDFDDCEEEDCCBDCCCCCB@C(4@ADCA?BBBDDABB055-?AB1:@ACC:
 @HWI-ST0747:167:B02DEACXX:8:1101:3002:167092 1:N:0:
 CTTTGCTGCAGGCTCATCCTGACATGACCCTCCAGCATGACAATGCCACCAGCCATACTGCTCGTTCTGTGTGTGATTTCCAGCAAGTAAATATGTA
 +
 CCCFHIJIEHIH@AHFAGHIGIIGGEIJGIJIIIGIIIGEHGEHIIJIEHH@FHGH@=ACEHHFBFFCE@AACCACDB;;B?C3AADBA
 @HWI-ST0747:167:B02DEACXX:8:1101:3022:167094 1:N:0:
 ATTCCGTGCAGGCCAACTCCCGACGGACATCCTTGCTCAGACTGCAGCGATAGTGGTCGATCAGGGCCCTGTTGTTCCATCCCACTCCGGCGACCAGGTTC
 +
 CCCFHIDHJIIHIIIJIJIIGGIIFHJIIIIEIFHFFCBAECBDDDC:??B=AAACD?8@:C@?8CBDDD@D99B@3884A
 @HWI-ST0747:167:B02DEACXX:8:1101:3095:167100 1:N:0:
 CGTGATTGCAGGGACGTTACAGAGACGTTACAGGGATGTTACAGGGACGTTACAGAGACGTTAAAGAGATGTTACAGGGATGTTACAGACAGAGACGTTAC
 +

Your code says that the first line in each group should start with an @ 
sign. That is clearly not the case for the last two groups.

I suggest that your data files have been corrupted.

 __PYTHON CODE __

I have re-written your code slightly, to be a little closer to best 
practice, or at least modern practice. If there is anything you don't 
understand, please feel free to ask.

I haven't tested this code, but it should run fine on Python 2.7.

It will be interesting to see if you get different results with this.



import glob

def four_lines(file_object):
    """Yield lines from file_object grouped into batches of four.

    If the file has fewer than four lines remaining, pad the batch
    with 1-3 empty strings.

    Lines are stripped of leading and trailing whitespace.
    """
    while True:
        # Get the first line. If there is no first line, we are at EOF
        # and we raise StopIteration to indicate we are done.
        line1 = next(file_object).strip()
        # Get the next three lines, padding if needed.
        line2 = next(file_object, '').strip()
        line3 = next(file_object, '').strip()
        line4 = next(file_object, '').strip()
        yield (line1, line2, line3, line4)


my_in_files = glob.glob ('E:/PINK/Paired_End/raw/gzip/*.fastq')
for each in my_in_files:
    out = each.replace('/gzip', '/rem_clusters2' )
    print ("Reading File: " + each)
    print ("Writing File: " + out)
    INFILE = open (each, 'r')
    OUTFILE = open (out , 'w')
    writes = 0

    for reads, lines in four_lines( INFILE ):
        ID_Line_1, Seq_Line, ID_Line_2, Quality_Line = lines
        # Check that ID_Line_1 starts with @
        if not ID_Line_1.startswith('@'):
            print ("**ERROR**")
            print ("expected ID_Line to start with @")
            print (lines)
            print ("Read Number " + str(Reads))
            break
        elif Quality_Line != '+':
            print ("**ERROR**")
            print ("expected Quality_Line = +")
            print (lines)
            print ("Read Number " + str(Reads))
            break
        # Select Reads that I want to keep
        ID = ID_Line_1.partition(' ')
        if (ID[2] == "1:N:0:" or ID[2] == "2:N:0:"):
            # Write to file, maintaining group of 4
            OUTFILE.write(ID_Line_1 + "\n")
            OUTFILE.write(Seq_Line + "\n")
            OUTFILE.write(ID_Line_2 + "\n")
            OUTFILE.write(Quality_Line + "\n")
            writes += 1
    # End of file reached, print update
    print ("Saw", reads, "groups of four lines")
    print ("Wrote", writes, "groups of four lines")
    INFILE.close()
    OUTFILE.close()
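
The grouping idea can be tried without the real FASTQ files. What follows is only a minimal sketch for Python 3, not the code as posted: it assumes an in-memory io.StringIO in place of the input file, uses made-up record names, and numbers the groups with enumerate. Because Python 3.7 and later turn a StopIteration that escapes a generator body into a RuntimeError, this version ends the generator with an explicit return.

```python
import io

def four_lines(file_object):
    """Yield stripped lines from file_object in tuples of four.

    A short final group is padded with empty strings.
    """
    while True:
        first = next(file_object, None)
        if first is None:
            return  # end of input: finish the generator cleanly
        rest = [next(file_object, '').strip() for _ in range(3)]
        yield (first.strip(), rest[0], rest[1], rest[2])

# Made-up sample data: one complete record and one truncated record.
sample = io.StringIO(
    "@read1 1:N:0:\nACGT\n+\nIIII\n"
    "@read2 1:N:0:\nTTGA\n"
)

# enumerate(..., 1) numbers the groups starting from 1.
groups = list(enumerate(four_lines(sample), 1))
for number, lines in groups:
    print(number, lines)
```

The second group comes back padded with two empty strings, which is how a truncated record would show up in the checks.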





-- 
Steven
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:

Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Mark Lawrence

On 19/07/2012 06:41, wolfrage8...@gmail.com wrote:

On Thu, Jul 19, 2012 at 12:16 AM, Dave Angel d...@davea.name wrote:


  On 07/18/2012 05:07 PM, Jordan wrote:

OK so I have been trying for a couple days now and I am throwing in the
towel, Python 3 wins this one.
I want to convert a string to binary and back again like in this
question: Stack Overflow: Convert Binary to ASCII and vice versa
(Python)


http://stackoverflow.com/questions/7396849/convert-binary-to-ascii-and-vice-versa-python


But in Python 3 I consistently get  some sort of error relating to the
fact that nothing but bytes and bytearrays support the buffer interface
or I get an overflow error because something is too large to be
converted to bytes.
Please help me and then explain what I am not getting that is new in
Python 3. I would like to point out I realize that binary, hex, and
encodings are all a very complex subject and so I do not expect to
master it but I do hope that I can gain a deeper insight. Thank you all.

test_script.py:
import binascii

test_int = 109

test_int = int(str(test_int) + '45670')
data = 'Testing XOR Again!'

while sys.getsizeof(data) > test_int.bit_length():

test_int = int(str(test_int) + str(int.from_bytes(os.urandom(1), 'big')))

print('Bit Length: ' + str(test_int.bit_length()))

key = test_int # Yes I know this is an unnecessary step...

data = bin(int(binascii.hexlify(bytes(data, 'UTF-8')), 16))

print(data)

data = int(data, 2)

print(data)

data = binascii.unhexlify('%x' % data)



I don't get the same error you did.  I get:

  File "jordan.py", line 13
 test_int = int(str(test_int) + str(int.from_bytes(os.urandom(1),
'big')))
^


test_int = int(str(test_int) + str(int.from_bytes(os.urandom(1), \
 'big')))
# That was probably just due to the copy and paste.


IndentationError: expected an indented block


Please post it again, with correct indentation.  If you used tabs, then
expand them to spaces before pasting it into your text-mode mail editor.

I only use spaces and this program did not require any indentation until
it was pasted and the one line above became split across two lines. Really
though that was a trivial error to correct.


Really?  Are you using a forked version of Python that doesn't need 
indentation after a while loop, or are you speaking with a forked 
tongue? :)  Strangely I believe the latter, so please take note of what 
Dave Angel has told you and post with the correct indentation.






I'd also recommend you remove a lot of the irrelevant details there.  If
you have a problem with hexlify and/or unhexlify, then give a simple byte
string that doesn't work for you, and somebody can probably identify why
not.  And if you want people to run your code, include the imports as well.

My problem is not specific to hexlify and unhexlify, my problem is trying
to convert from string to binary and back. That is why all of the details,
to show I have tried on my own.
Sorry that I forgot to include sys and os for imports.



As it is, you're apparently looping, comparing the byte memory size of a
string (which is typically 4 bytes per character) with the number of
significant bits in an unrelated number.

I suspect what you want is something resembling (untested):

 mybytes = bytes("%x" % data, "ascii")
 newdata = binascii.unhexlify(mybytes)

I was comparing them but I think I understand how to compare them well,
now I want to convert them both to binary so that I can XOR them together.
Thank you for your time and help Dave, now I need to reply to Ramit.



--
DaveA





___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor




--
Cheers.

Mark Lawrence.





Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-19 Thread Ryan Waples

 If you copy those files to a different device (one that has just been 
 scrubbed and reformatted), then copy them back and get different results 
 with your application, you've found your problem.

 -Bill

 Thanks for the insistence,  I'll check this out.  If you have any
 guidance on how to do so let me know.  I knew my system wasn't
 particularly well suited to the task at hand, but I haven't seen how
 it would actually cause problems.

 -Ryan
 ___
 The last two lines in my MSG pretty much would be the test. Get another 
 flash drive, format it as FAT-32 (I assume that's what you are using), then 
 copy a couple of files to it.  Then copy them back to your current device 
 and run your program again. If you get DIFFERENT, but still wrong results, 
 you've found the problem. The largest positive integer a 32-bit binary 
 number can represent is 2^32 - 1, just under 4 GiB.  I'm no expert on Windows 
 files, but I'd be very surprised if when the FAT-32 file system was being 
 designed, anyone considered the case where a single file could be that large.

 -Bill


The hard drive is formatted as NTFS, because as you say I'm up against
the file size limit of FAT32. Do you think this could still be the issue?


Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-19 Thread Ryan Waples
 I count only 19 lines.

yep, you are right.  My bad, I think I missed copying/pasting line 20.

The first group has only three lines. See below.

Not so, the first group is actually the first four lines listed below.
Lines 1-4 serve as one group.  For what it is worth, line four should
have one character for each character in line 2, while the first line is
much shorter, contains a space, and for this file always ends in either
1:N:0: (keep) or 1:Y:0: (remove).  The EXAMPLE data is correctly
formatted as it should be, but I'm missing line 20.

 There is a blank line, which I take as NOT part of the input but just a
 spacer. Then:

 1) Line starting with @
 2) Line of bases CGCGT ...
 3) Plus sign
 4) Line starting with @@@
 5) Line starting with @
 6) Line of bases TTCTA ...
 7) Plus sign

 and so on. There are TWO lines before the first +, and three before each
 of the others.

I think you are just reading one frame shifted. It's not a well
designed format, because the required start character, @, can appear
in other places as well.




 __EXAMPLE RAW DATA FILE REGION__

 @HWI-ST0747:167:B02DEACXX:8:1101:3182:167088 1:N:0:
 CGCGTGTGCAGGTTTATAGAACCAGCTGCAGATTAGTAGCAGCGCACGGAGAGGTGTGTCTGTTTATTGTCCTCAGCAGGCAGACATGTTTGTGGTC
 +
 @@@DDADDHB9+2A??:?G9+C)???G@DB@@DGFB0*?FF?0F:@/54'-;;?B;;6(5@CDAC(5(5:5,(8?88?BC@#
 @HWI-ST0747:167:B02DEACXX:8:1101:3134:167090 1:N:0:
 TTCTAGTGCAGGGCGACAGCGTTGCGGAGCCGGTCCGAGTCTGCTGGGTCAGTCATGGCTAGTTGGTACTATAACGACACAGGGCGAGACCCAGATGCAAA
 +
 @CCFFFDFHJJIJHHIIIJHGHIJI@GFFDDDFDDCEEEDCCBDCCCCCB@C(4@ADCA?BBBDDABB055-?AB1:@ACC:
 @HWI-ST0747:167:B02DEACXX:8:1101:3002:167092 1:N:0:
 CTTTGCTGCAGGCTCATCCTGACATGACCCTCCAGCATGACAATGCCACCAGCCATACTGCTCGTTCTGTGTGTGATTTCCAGCAAGTAAATATGTA
 +
 CCCFHIJIEHIH@AHFAGHIGIIGGEIJGIJIIIGIIIGEHGEHIIJIEHH@FHGH@=ACEHHFBFFCE@AACCACDB;;B?C3AADBA
 @HWI-ST0747:167:B02DEACXX:8:1101:3022:167094 1:N:0:
 ATTCCGTGCAGGCCAACTCCCGACGGACATCCTTGCTCAGACTGCAGCGATAGTGGTCGATCAGGGCCCTGTTGTTCCATCCCACTCCGGCGACCAGGTTC
 +
 CCCFHIDHJIIHIIIJIJIIGGIIFHJIIIIEIFHFFCBAECBDDDC:??B=AAACD?8@:C@?8CBDDD@D99B@3884A
 @HWI-ST0747:167:B02DEACXX:8:1101:3095:167100 1:N:0:
 CGTGATTGCAGGGACGTTACAGAGACGTTACAGGGATGTTACAGGGACGTTACAGAGACGTTAAAGAGATGTTACAGGGATGTTACAGACAGAGACGTTAC
 +



 Your code says that the first line in each group should start with an @
 sign. That is clearly not the case for the last two groups.

 I suggest that your data files have been corrupted.

I'm pretty sure that my raw IN files are all good. It's hard to be sure
with such a large file, but the very picky downstream analysis program
takes every single raw file just fine (30 of them), and gags on my
filtered files, at regions that don't conform to the correct
formatting.


 __PYTHON CODE __

 I have re-written your code slightly, to be a little closer to best
 practice, or at least modern practice. If there is anything you don't
 understand, please feel free to ask.

 I haven't tested this code, but it should run fine on Python 2.7.

 It will be interesting to see if you get different results with this.

--CODE REMOVED--

Thanks for the suggestions.  I've never really felt super comfortable
using objects at all, but it's what I want to learn next.  This will be
helpful, and useful.

  for reads, lines in four_lines( INFILE ):
      ID_Line_1, Seq_Line, ID_Line_2, Quality_Line = lines

Can you explain what is going on here, or point me in the right
direction?  I see that the parts of 'lines' get assigned, but I'm
missing how the file gets iterated over and how reads gets
incremented.

Do you have a reason why this approach might give a 'better' output?

Thanks again.


Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-19 Thread Alan Gauld

On 19/07/12 07:00, Steven D'Aprano wrote:


def four_lines(file_object):

snipping

        line1 = next(file_object).strip()
        # Get the next three lines, padding if needed.
        line2 = next(file_object, '').strip()
        line3 = next(file_object, '').strip()
        line4 = next(file_object, '').strip()
        yield (line1, line2, line3, line4)

snipping...


    for reads, lines in four_lines( INFILE ):
        ID_Line_1, Seq_Line, ID_Line_2, Quality_Line = lines


Shouldn't that be

    for reads, lines in enumerate( four_lines(INFILE) ):
        ID_Line_1, Seq_Line, ID_Line_2, Quality_Line = lines

?
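
For anyone wondering where the counter comes from: enumerate wraps any iterable and yields (index, item) pairs, and the for statement unpacks each pair into two names. A small sketch for Python 3, where the generator pairs() is a hypothetical stand-in for four_lines and is not part of the posted code:

```python
def pairs(items):
    """Hypothetical stand-in for four_lines: yield items two at a time."""
    iterator = iter(items)
    for first in iterator:
        # Take one more item, padding with '' if the input has run out.
        yield (first, next(iterator, ''))

seen = []
for count, group in enumerate(pairs(['a', 'b', 'c']), 1):
    left, right = group          # unpack the tuple produced by the generator
    seen.append((count, left, right))
print(seen)
```

Without enumerate, the for statement would try to unpack each group itself into the two loop names, which fails for a tuple of four lines.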

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/





[Tutor] suggestion for an editor

2012-07-19 Thread Bala subramanian
Friends,
At present i write programs using vi editor. I am interested to change to
something else. My specific need is that i want to select a portion/small
segment of my program (for eg. a nested loop) and then monitor processing
time it takes for that portion while i run the program. By this i hope to
find the segment that takes time and modify to achieve better speed. Can
someone please share their experience.

Thanks,
Bala

-- 
C. Balasubramanian


Re: [Tutor] suggestion for an editor

2012-07-19 Thread Ranjith Kumar
Try Sublime,

On Thu, Jul 19, 2012 at 1:39 PM, Bala subramanian bala.biophys...@gmail.com
 wrote:

 Friends,
 At present i write programs using vi editor. I am interested to change to
 something else. My specific need is that i want to select a portion/small
 segment of my program (for eg. a nested loop) and then monitor processing
 time it takes for that portion while i run the program. By this i hope to
 find the segment that takes time and modify to achieve better speed. Can
 someone please share their experience.

 Thanks,
 Bala

 --
 C. Balasubramanian






-- 
Cheers,
Ranjith Kumar K,
Chennai.

http://ranjithtenz.wordpress.com


Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Dave Angel
On 07/19/2012 01:41 AM, wolfrage8...@gmail.com wrote:
 On Thu, Jul 19, 2012 at 12:16 AM, Dave Angel d...@davea.name wrote:

 SNIP
 I don't get the same error you did.  I get:

  File "jordan.py", line 13
 test_int = int(str(test_int) + str(int.from_bytes(os.urandom(1),
 'big')))
^

 test_int = int(str(test_int) + str(int.from_bytes(os.urandom(1), \
 'big')))
 # That was probably just due to the copy and paste.

That was just the first line that was not indented.  If I thought you
had a one-line while loop, I certainly would have just indented it.  But
I'm sure you have some unknown number of additional lines that were
indented in your original.  Please post in text form.

 SNIP
 I'd also recommend you remove a lot of the irrelevant details there.  If
 you have a problem with hexlify and/or unhexlify, then give a simple byte
 string that doesn't work for you, and somebody can probably identify why
 not.  And if you want people to run your code, include the imports as well.

 My problem is not specific to hexlify and unhexlify, my problem is trying
 to convert from string to binary and back. That is why all of the details,
 to show I have tried on my own.
 Sorry that I forgot to include sys and os for imports.

Lots of details that have nothing to do with it.  For example, that
whole thing about adding random digits together.  You could replace the
whole thing with a simple assignment of a value that doesn't work for you.


 As it is, you're apparently looping, comparing the byte memory size of a
 string (which is typically 4 bytes per character) with the number of
 significant bits in an unrelated number.

 I suspect what you want is something resembling (untested):

 mybytes = bytes("%x" % data, "ascii")
 newdata = binascii.unhexlify(mybytes)

 I was comparing them but I think I understand how to compare them well,
 now I want to convert them both to binary so that I can XOR them together.
 Thank you for your time and help Dave, now I need to reply to Ramit.

Ah, so you don't actually want binary at all!!!   Why not state the real
problem up front?  You can XOR two integers, without bothering to
convert to a string of ones and zeroes.  Use the caret operator.

print( 40 ^ 12)

I suspect there's an equivalent for strings or byte-strings.  But if
not, it's a simple loop.
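
The "simple loop" for byte strings might look like the sketch below (Python 3). The helper name xor_bytes and the counting key are assumptions made for illustration only; the key is in no way secure. Iterating over a bytes object yields ints, so the caret operator applies to each pair directly.

```python
def xor_bytes(left, right):
    """XOR two equal-length bytes objects, one byte at a time."""
    if len(left) != len(right):
        raise ValueError("inputs must have the same length")
    return bytes(a ^ b for a, b in zip(left, right))

message = 'Testing XOR Again!'.encode('utf-8')
key = bytes(range(len(message)))   # illustrative key only, not secure

scrambled = xor_bytes(message, key)
# XOR with the same key a second time restores the original bytes.
restored = xor_bytes(scrambled, key).decode('utf-8')
print(restored)
```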



-- 

DaveA



Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Wayne Werner

I'll preface my response by saying that I know/understand fairly little about
it, but since I've recently been smacked by this same issue when converting
stuff to Python3, I'll see if I can explain it in a way that makes sense.

On Wed, 18 Jul 2012, Jordan wrote:


OK so I have been trying for a couple days now and I am throwing in the
towel, Python 3 wins this one.
I want to convert a string to binary and back again like in this
question: Stack Overflow: Convert Binary to ASCII and vice versa
(Python)
http://stackoverflow.com/questions/7396849/convert-binary-to-ascii-and-vice-versa-python
But in Python 3 I consistently get  some sort of error relating to the
fact that nothing but bytes and bytearrays support the buffer interface
or I get an overflow error because something is too large to be
converted to bytes.
Please help me and then explain what I am not getting that is new in
Python 3. I would like to point out I realize that binary, hex, and
encodings are all a very complex subject and so I do not expect to
master it but I do hope that I can gain a deeper insight. Thank you all.


The way I've read it - stop thinking about strings as if they are text. The
biggest reason that all this has changed is because Python has grown up and
entered the world where Unicode actually matters. To us poor shmucks in the
English speaking countries of the world it's all very confusing because it's
nothing we have to deal with. 26 letters is perfectly fine for us - and if we
want uppercase we'll just throw in another 26. Add a few dozen punctuation marks
and 256 is a perfectly fine amount of characters.

To make a slightly relevant side trip, when you were a kid did you ever send
secret messages to a friend with a code like this?

A = 1
B = 2
.
.
.
Z = 26

Well, that's basically what is going on when it comes to bytes/text/whatever.
When you input some text, Python3 believes that whatever you wrote was encoded
with Unicode. The nice thing for us 26-letter folks is that the ASCII alphabet
we're so used to just so happens to map quite well to Unicode encodings - so
'A' in ASCII is the same number as 'A' in utf-8.

Now, here's the part that I had to (and still need to) wrap my mind around - if
the string is just bytes then it doesn't really matter what the string is
supposed to represent. It could represent the LATIN-1 character set. Or
UTF-8, -16, or some other weird encoding. And all the operations that are
supposed to modify these strings of bytes (e.g. removing spaces, splitting on a
certain character, etc.) still work. Because if I have this string:

9 45 12 9 13 19 18 9 12 99 102

and I tell you to split on the 9's, it doesn't matter if that's some weird
ASCII character, or some equally weird UTF character, or something else
entirely. And I don't have to worry about things getting munged up when I try
to stick Unicode and ASCII values together - because they're converted to bytes
first.

So the question is, of course, if it's all bytes, then why does it look like
text when I print it out? Well, that's because Python converts that byte stream
to Unicode text when it's printed. Or ASCII, if you tell it to.

But Python3 has converted all(?) of those functions that used to operate on
text and made them operate on byte streams instead. Except for the ones that
operate on text ;)
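
A concrete way to see the two kinds of object, as a minimal Python 3 sketch: a str holds Unicode code points, a bytes object holds 8-bit values, and encode/decode convert between them using a named encoding. The sample word is arbitrary and chosen only because it contains one non-ASCII character.

```python
text = 'caf\u00e9'                 # a str: four Unicode code points
utf8 = text.encode('utf-8')        # bytes: the accented e takes two bytes
latin1 = text.encode('latin-1')    # bytes: the accented e takes one byte

print(len(text), len(utf8), len(latin1))

# Decoding with the encoding that produced the bytes gives the text back.
round_trip = utf8.decode('utf-8')
print(round_trip == text)

# Splitting works on bytes without knowing what encoding they represent.
fields = b'9 45 12'.split(b' ')
print(fields)
```

The same text produces different byte sequences under different encodings, which is why Python 3 asks for the encoding whenever text is turned into bytes.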



Well, I hope that's of some use and isn't too much of a lie - like I said, I'm
still trying to wrap my head around things and I've found that explaining (or
trying to explain) to someone else is often the best way to work out the idea
in your own head. If I've gone too far astray I'm sure the other helpful folks
here will correct me :)

HTH,
Wayne


Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-19 Thread Wayne Werner

Just a few notes...

On Wed, 18 Jul 2012, Ryan Waples wrote:
snip


import glob

my_in_files = glob.glob ('E:/PINK/Paired_End/raw/gzip/*.fastq')

for each in my_in_files:
    #print(each)
    out = each.replace('/gzip', '/rem_clusters2' )
    #print (out)
    INFILE = open (each, 'r')
    OUTFILE = open (out , 'w')



It's slightly confusing to see your comments left-aligned instead of with the
code they refer to. At first glance it looked as though your block ended here,
when it does, in fact, continue.


# Tracking Variables
    Reads = 0
    Writes = 0
    Check_For_End_Of_File = 0

#Updates
    print ("Reading File: " + each)
    print ("Writing File: " + out)

# Read FASTQ File by group of four lines
    while Check_For_End_Of_File == 0:


This is Python, not C - checking for EOF is probably silly (unless you're
really checking for end of data) - you can just do:

for line in INFILE:
    ID_Line_1 = line
    Seq_Line = next(INFILE) # Replace with INFILE.next() for Python2
    ID_Line_2 = next(INFILE)
    Quality_Line = next(INFILE)



        # Read the next four lines from the FASTQ file
        ID_Line_1    = INFILE.readline()
        Seq_Line     = INFILE.readline()
        ID_Line_2    = INFILE.readline()
        Quality_Line = INFILE.readline()

        # Strip off leading and trailing whitespace characters
        ID_Line_1    = ID_Line_1.strip()
        Seq_Line     = Seq_Line.strip()
        ID_Line_2    = ID_Line_2.strip()
        Quality_Line = Quality_Line.strip()



Also, it's just extra clutter to call strip like this when you can just tack it
on to your original statement:

for line in INFILE:
    ID_Line_1 = line.strip()
    Seq_Line = next(INFILE).strip() # Replace with INFILE.next() for Python2
    ID_Line_2 = next(INFILE).strip()
    Quality_Line = next(INFILE).strip()


        Reads = Reads + 1

        #Check that I have not reached the end of file
        if Quality_Line == "":
            #End of file reached, print update
            print ("Saw " + str(Reads) + " reads")
            print ("Wrote " + str(Writes) + " reads")
            Check_For_End_Of_File = 1
            break


This break is superfluous - it will actually remove you from the while loop -
no further lines of code will be evaluated, including the original `while`
comparison. You can also just test the Quality_Line for truthiness directly,
since empty strings evaluate to false. I would actually just say:

if Quality_Line:
    #Do the rest of your stuff here



        #Check that ID_Line_1 starts with @
        if not ID_Line_1.startswith('@'):
            print ("**ERROR**")
            print (each)
            print ("Read Number " + str(Reads))
            print (ID_Line_1 + ' does not start with @')
            break #ends the while loop

        # Select Reads that I want to keep
        ID = ID_Line_1.partition(' ')
        if (ID[2] == "1:N:0:" or ID[2] == "2:N:0:"):
            # Write to file, maintaining group of 4
            OUTFILE.write(ID_Line_1 + "\n")
            OUTFILE.write(Seq_Line + "\n")
            OUTFILE.write(ID_Line_2 + "\n")
            OUTFILE.write(Quality_Line + "\n")
            Writes = Writes +1


    INFILE.close()
    OUTFILE.close()


You could (as long as you're on 2.7 or greater, since opening two files in one
`with` statement needs at least 2.7) just use the `with` block for reading the
files; then you don't need to worry about closing - the block takes care of
that, even on errors:

for each in my_in_files:
    out = each.replace('/gzip', '/rem_clusters2' )
    with open (each, 'r') as INFILE, open (out, 'w') as OUTFILE:
        for line in INFILE:
            # Do your work here...


A few stylistic points:
ALL_CAPS are usually reserved for constants - infile and outfile are perfectly
legitimate names.

Caps_In_Variable_Names are usually discouraged. Class names should be CamelCase
(e.g. SimpleHTTPServer), while variable names should be lowercase with
underscores if needed, so id_line_1 instead of ID_Line_1.

If you're using Python3 or from __future__ import print_function, rather than
doing OUTFILE.write(value + '\n') you can do:

print(value, file=OUTFILE)

Then you get the \n for free. You could also just do:

print(val1, val2, val3, sep='\n', end='\n', file=OUTFILE)

The end parameter is there for example only, since the default value for end is
'\n'


HTH,
Wayne


Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread eryksun
On Thu, Jul 19, 2012 at 1:41 AM, wolfrage8...@gmail.com
wolfrage8...@gmail.com wrote:

 I was comparing them but I think I understand how to compare them well, now
 I want to convert them both to binary so that I can XOR them together. Thank
 you for your time and help Dave, now I need to reply to Ramit.

A bytes object is a container of 8-bit numbers (i.e. range 0 to 255).
If you index it, you'll get an int that supports the XOR operation:

>>> b1 = b'a'
>>> b2 = b'b'
>>> b1[0]
97
>>> b2[0]
98
>>> bin(b1[0])
'0b1100001'
>>> bin(b2[0])
'0b1100010'
>>> bin(b1[0] ^ b2[0])
'0b11'

You can use the int method from_bytes to XOR two bitstrings stored
as Python bytes:

>>> b3 = b''
>>> b4 = b''
>>> bin(int.from_bytes(b3, 'big') ^ int.from_bytes(b4, 'big'))
'0b11000000110000001100000011'

The computation is done between int objects, not strings. Creating a
string using bin is just for presentation.

P.S.:

Instead of bin you can use the format function to have more
control, such as for zero padding. The integer format code 'b' is for
a binary representation. Preceding it by a number starting with zero
will pad with zeros to the given number of characters (e.g. '032b' will
prepend zeros to make the result at least 32 characters long):

>>> r = int.from_bytes(b3, 'big') ^ int.from_bytes(b4, 'big')
>>> format(r, '032b')
'00000011000000110000001100000011'

Instead of hard coding the length (e.g. 032), you can use the length
of the input bitstrings to calculate the size of the result:

>>> size = 8 * max(len(b3), len(b4))
>>> format(r, '0%db' % size)
'00000011000000110000001100000011'
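
The same steps as a script rather than an interactive session, in a minimal Python 3 sketch. The input values b'aaaa' and b'bbbb' are assumed sample data chosen only for illustration, and the final to_bytes step, which is not part of the session above, turns the integer result back into a bytes object.

```python
b3 = b'aaaa'   # assumed sample values, chosen only for illustration
b4 = b'bbbb'

# XOR is computed on ints, not on the bytes objects themselves.
r = int.from_bytes(b3, 'big') ^ int.from_bytes(b4, 'big')

# Pad the binary text to 8 bits per input byte.
size = 8 * max(len(b3), len(b4))
bits = format(r, '0%db' % size)
print(bits)

# Convert the integer result back to bytes of the same length.
xored = r.to_bytes(size // 8, 'big')
print(xored)
```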


Re: [Tutor] suggestion for an editor

2012-07-19 Thread Wayne Werner

On Thu, 19 Jul 2012, Bala subramanian wrote:


Friends,
At present i write programs using vi editor. I am interested to change to
something else. My specific need is that i want to select a portion/small
segment of my program (for eg. a nested loop) and then monitor processing
time it takes for that portion while i run the program. By this i hope to
find the segment that takes time and modify to achieve better speed. Can
someone please share their experience.


I'm not sure how vi has anything to do with the speed of your program(!)

For performance measurements you should look into the timeit module. How 
long does it take your program to run currently? After all, premature 
optimisation is the root of all evil...
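
Timing one portion of a program with timeit could look like the sketch below. The function nested_loop is a hypothetical stand-in for the segment being measured, and the repeat count of 100 is arbitrary.

```python
import timeit

def nested_loop(n):
    """Hypothetical hot spot: a nested loop summing products."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total

# Call the function 100 times and report the total elapsed seconds.
elapsed = timeit.timeit(lambda: nested_loop(50), number=100)
print('100 runs took %.4f seconds' % elapsed)
```

Measuring before and after a change, with the same arguments, shows whether the change helped.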



-Wayne


[Tutor] Pragmatic Unicode, or, How do I stop the pain?

2012-07-19 Thread Sander Sweers
https://www.youtube.com/watch?v=sgHbC6udIqc

This is a very good talk on Unicode which was done at PyCon US 2012.
It helped me a lot to understand the pain.

Greets
Sander


[Tutor] Flatten a list in tuples and remove doubles

2012-07-19 Thread PyProg PyProg
Hi all,

I would like to get a new list like this:

[(0, '3eA', 'Dupont', 'Juliette', '11.0/10.0', '4.0/5.0', '17.5/30.0',
'3.0/5.0', '4.5/10.0', '35.5/60.0'), (1, '3eA', 'Pop', 'Iggy',
'12.0/10.0', '3.5/5.0', '11.5/30.0', '4.0/5.0', '5.5/10.0',
'7.5/10.0', '40.5/60.0')]

... from this one:

[(0, '3eA', 'Dupont', 'Juliette', 0, 11.0, 10.0), (0, '3eA', 'Dupont',
'Juliette', 1, 4.0, 5.0), (0, '3eA', 'Dupont', 'Juliette', 2, 17.5,
30.0), (0, '3eA', 'Dupont', 'Juliette', 3, 3.0, 5.0), (0, '3eA',
'Dupont', 'Juliette', 4, 4.5, 10.0), (0, '3eA', 'Dupont', 'Juliette',
5, 35.5, 60.0), (1, '3eA', 'Pop', 'Iggy', 0, 12.0, 10.0), (1, '3eA',
'Pop', 'Iggy', 1, 3.5, 5.0), (1, '3eA', 'Pop', 'Iggy', 2, 11.5, 30.0),
(1, '3eA', 'Pop', 'Iggy', 3, 4.0, 5.0), (1, '3eA', 'Pop', 'Iggy', 4,
5.5, 10.0), (1, '3eA', 'Pop', 'Iggy', 5, 40.5, 60.0)]

How can I do that? I have been trying, but so far I can't work it out.

Thanks in advance.

a+

-- 
http://ekd.tuxfamily.org
http://ekdm.wordpress.com
http://glouk.legtux.org/guiescputil
http://lcs.dunois.clg14.ac-caen.fr/~alama/blog
http://lprod.org/wiki/doku.php/video:encodage:avchd_converter


Re: [Tutor] Flatten a list in tuples and remove doubles

2012-07-19 Thread PyProg PyProg
Oh I forgot to mention, with Python 2 (2.7).

-- 
http://ekd.tuxfamily.org
http://ekdm.wordpress.com
http://glouk.legtux.org/guiescputil
http://lcs.dunois.clg14.ac-caen.fr/~alama/blog
http://lprod.org/wiki/doku.php/video:encodage:avchd_converter


Re: [Tutor] suggestion for an editor

2012-07-19 Thread Alan Gauld

On 19/07/12 09:09, Bala subramanian wrote:

Friends,
At present i write programs using vi editor. I am interested to change
to something else. My specific need is that i want to select a
portion/small segment of my program (for eg. a nested loop) and then
monitor processing time it takes for that portion while i run the
program.


I suspect it's not a new editor you need but the profile module...
Take a look at its documentation.
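
A profiler reports time per function, so the first step is usually to put the suspect segment in a function of its own. The sketch below uses cProfile, the C implementation that ships with the standard library and offers the same interface as profile. The functions slow_part, fast_part and main are hypothetical placeholders.

```python
import cProfile
import io
import pstats

def slow_part():
    """Hypothetical expensive segment."""
    return sum(i * i for i in range(200000))

def fast_part():
    """Hypothetical cheap segment."""
    return sum(range(1000))

def main():
    slow_part()
    fast_part()

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats('cumulative').print_stats(10)
print(report.getvalue())
```

For a whole script the same report is available from the command line with python -m cProfile followed by the script name.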

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/





Re: [Tutor] Flatten a list in tuples and remove doubles

2012-07-19 Thread शंतनू
You may use 'set'.

e.g.

===
>>> x
[(1, 2, 3), (1, 1), (2, 2), (1, 1), (2, 2)]
>>> set(x)
set([(2, 2), (1, 1), (1, 2, 3)])
===
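
Note that a set does not keep the original order. If the order matters, one possible approach is sketched below. It assumes the items are hashable (tuples are), and the helper name unique_in_order is invented for this example.

```python
def unique_in_order(items):
    """Return the items with duplicates dropped, keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


x = [(1, 2, 3), (1, 1), (2, 2), (1, 1), (2, 2)]
print(unique_in_order(x))
```

This runs unchanged on Python 2.7 and Python 3.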

On 19-Jul-2012, at 11:03 PM, PyProg PyProg wrote:

 Hi all,
 
 I would get a new list as:
 
 [(0, '3eA', 'Dupont', 'Juliette', '11.0/10.0', '4.0/5.0', '17.5/30.0',
 '3.0/5.0', '4.5/10.0', '35.5/60.0'), (1, '3eA', 'Pop', 'Iggy',
 '12.0/10.0', '3.5/5.0', '11.5/30.0', '4.0/5.0', '5.5/10.0',
 '7.5/10.0', '40.5/60.0')]
 
 ... from this one:
 
 [(0, '3eA', 'Dupont', 'Juliette', 0, 11.0, 10.0), (0, '3eA', 'Dupont',
 'Juliette', 1, 4.0, 5.0), (0, '3eA', 'Dupont', 'Juliette', 2, 17.5,
 30.0), (0, '3eA', 'Dupont', 'Juliette', 3, 3.0, 5.0), (0, '3eA',
 'Dupont', 'Juliette', 4, 4.5, 10.0), (0, '3eA', 'Dupont', 'Juliette',
 5, 35.5, 60.0), (1, '3eA', 'Pop', 'Iggy', 0, 12.0, 10.0), (1, '3eA',
 'Pop', 'Iggy', 1, 3.5, 5.0), (1, '3eA', 'Pop', 'Iggy', 2, 11.5, 30.0),
 (1, '3eA', 'Pop', 'Iggy', 3, 4.0, 5.0), (1, '3eA', 'Pop', 'Iggy', 4,
 5.5, 10.0), (1, '3eA', 'Pop', 'Iggy', 5, 40.5, 60.0)]
 
 How to make that ? I'm looking for but for now I can't do it.
 
 Thanks in advance.
 
 a+
 


Re: [Tutor] Flatten a list in tuples and remove doubles

2012-07-19 Thread Prasad, Ramit
 I would get a new list as:
 
 [(0, '3eA', 'Dupont', 'Juliette', '11.0/10.0', '4.0/5.0', '17.5/30.0',
 '3.0/5.0', '4.5/10.0', '35.5/60.0'), (1, '3eA', 'Pop', 'Iggy',
 '12.0/10.0', '3.5/5.0', '11.5/30.0', '4.0/5.0', '5.5/10.0',
 '7.5/10.0', '40.5/60.0')]
 
 ... from this one:
 
 [(0, '3eA', 'Dupont', 'Juliette', 0, 11.0, 10.0), (0, '3eA', 'Dupont',
 'Juliette', 1, 4.0, 5.0), (0, '3eA', 'Dupont', 'Juliette', 2, 17.5,
 30.0), (0, '3eA', 'Dupont', 'Juliette', 3, 3.0, 5.0), (0, '3eA',
 'Dupont', 'Juliette', 4, 4.5, 10.0), (0, '3eA', 'Dupont', 'Juliette',
 5, 35.5, 60.0), (1, '3eA', 'Pop', 'Iggy', 0, 12.0, 10.0), (1, '3eA',
 'Pop', 'Iggy', 1, 3.5, 5.0), (1, '3eA', 'Pop', 'Iggy', 2, 11.5, 30.0),
 (1, '3eA', 'Pop', 'Iggy', 3, 4.0, 5.0), (1, '3eA', 'Pop', 'Iggy', 4,
 5.5, 10.0), (1, '3eA', 'Pop', 'Iggy', 5, 40.5, 60.0)]
 
 How to make that ? I'm looking for but for now I can't do it.


Well, the first thing to do would be to describe the logic behind what
you are doing. Without knowing that, it is difficult to come up
with the correct solution. I am guessing you want a list
grouped by fields 1, 2 and 3 (counting the first element as field 0),
with each group followed by strings of the form
`field 5 + '/' + field 6`. But that does
not tell me what field 0 should be in the new format.

This is a pretty crude sample that should work for you.

lookup = {}

for row in old_list:
    key = (row[1], row[2], row[3])
    # setdefault stores (field 0, []) the first time a key is seen
    field_0, ratios = lookup.setdefault(key, (row[0], []))
    ratios.append('{0}/{1}'.format(row[5], row[6]))

new_list = []
for key, value in lookup.items():
    row = [value[0], key[0], key[1], key[2]]
    row.extend(value[1])
    new_list.append(tuple(row))

# Might need to sort to get the exact same results
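
A variation on the same idea, offered only as a sketch: if field 0 is assumed to identify the student (a guess, since the original post does not say), it can be made part of the key, and an OrderedDict keeps the rows in first-seen order so no sort is needed. The function name merge_marks and the shortened sample data are invented for this example.

```python
from collections import OrderedDict


def merge_marks(rows):
    """Group rows by their first four fields and collect 'mark/maximum' strings."""
    grouped = OrderedDict()
    for row in rows:
        key = tuple(row[:4])  # assumption: fields 0-3 identify one student
        grouped.setdefault(key, []).append('{0}/{1}'.format(row[5], row[6]))
    return [key + tuple(marks) for key, marks in grouped.items()]


old_list = [
    (0, '3eA', 'Dupont', 'Juliette', 0, 11.0, 10.0),
    (0, '3eA', 'Dupont', 'Juliette', 1, 4.0, 5.0),
    (1, '3eA', 'Pop', 'Iggy', 0, 12.0, 10.0),
]
print(merge_marks(old_list))
```

OrderedDict is available from Python 2.7 onwards, so this should also suit the version mentioned earlier in the thread.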


Ramit

This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.  


Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Jordan


On 07/19/2012 12:15 AM, Prasad, Ramit wrote:
SNIP
 I think your basic problem is too much conversion because you do not
 understand the types. A string is represented by a series of bytes
 which are binary numbers. Do you understand the concept behind ASCII?
 Each letter has a numeric representation, and these are sequential. So the
 string 'ABCD' is equivalent to a series of bytes 65,66,67,68. It is
 not equivalent to 65666768 or 65+66+67+68. So your first task is to
 convert each character to the numeric equivalent and store them in a
 list. Once you have them converted to a list of integers, you can
 create another list that is a list of characters.
Sorry for the long delay in getting back to you, I got called to the field.
Thank you, I agree I do feel like I am doing too much conversion. I do
understand the concept behind ASCII at least enough to know about ord()
although I did forget about chr() which is ord()'s reverse function. I
had tried to break them down to the ordinal value, but I really do want
to get the integer and the data down to binary, as it provides an
advantage for the overall program that I am writing. Thank you for your
time.

 Look at the functions chr and ord here
 ( http://docs.python.org/py3k/library/functions.html )

 Ramit




Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Jordan


On 07/19/2012 08:14 AM, Mark Lawrence wrote:
 On 19/07/2012 06:41, wolfrage8...@gmail.com wrote:
 On Thu, Jul 19, 2012 at 12:16 AM, Dave Angel d...@davea.name wrote:


SNIP
 Really?  Are you using a forked version of Python that doesn't need
 indentation after a while loop, or are you speaking with a forked
 tongue? :)  Strangely I believe the latter, so please take note of
 what Dave Angel has told you and post with the correct indentation.

http://www.101emailetiquettetips.com/
Number 101 is for you. Good day.


 I'd also recommend you remove a lot of the irrelevant details there. If
 you have a problem with hexlify and/or unhexlify, then give a simple byte
 string that doesn't work for you, and somebody can probably identify why
 not.  And if you want people to run your code, include the imports as well.

 My problem is not specific to hexlify and unhexlify, my problem is trying
 to convert from string to binary and back. That is why all of the details,
 to show I have tried on my own.
 Sorry that I forgot to include sys and os for imports.


 As it is, you're apparently looping, comparing the byte memory size of a
 string (which is typically 4 bytes per character) with the number of
 significant bits in an unrelated number.

 I suspect what you want is something resembling (untested):

  mybytes = bytes("%x" % data, "ascii")
  newdata = binascii.unhexlify(mybytes)

 I was comparing them but I think I understand how to compare them well,
 now I want to convert them both to binary so that I can XOR them
 together.
 Thank you for your time and help Dave, now I need to reply to Ramit.


 -- 
 DaveA





Re: [Tutor] Fwd: string to binary and back... Python 3

2012-07-19 Thread Jordan


On 07/19/2012 10:29 AM, Walter Prins wrote:
 Hi,

 Just to show you your original message contained no indentation
 whatsoever.  You might want to check your mail client settings and do
 some experiments to make sure that indentation spaces are let through
 unmolested and not stripped anywhere, otherwise the current little
 brouhaha about formatting will result.  You have to admit, it's not
 easy to read the code below with zero indentation present... :)
Thank you for pointing that out, I did not realize it as I had copied
and pasted it from the python file I was working on. I guess Thunderbird
edited the email on me, even though I had put it into plain text mode.
Next time perhaps I will just attach the file if that is acceptable
rather than getting attacked for what my mail editor did.

 Regards

 Walter


 -- Forwarded message --
 From: Jordan wolfrage8...@gmail.com
 Date: 18 July 2012 22:07
 Subject: [Tutor] string to binary and back... Python 3
 To: tutor@python.org


 OK so I have been trying for a couple days now and I am throwing in the
 towel, Python 3 wins this one.
 I want to convert a string to binary and back again like in this
 question: Stack Overflow: Convert Binary to ASCII and vice versa
 (Python)
 http://stackoverflow.com/questions/7396849/convert-binary-to-ascii-and-vice-versa-python
 But in Python 3 I consistently get  some sort of error relating to the
 fact that nothing but bytes and bytearrays support the buffer interface
 or I get an overflow error because something is too large to be
 converted to bytes.
 Please help me and then explain what I am not getting that is new in
 Python 3. I would like to point out I realize that binary, hex, and
 encodings are all a very complex subject and so I do not expect to
 master it but I do hope that I can gain a deeper insight. Thank you all.

 test_script.py:
 import binascii

 test_int = 109

 test_int = int(str(test_int) + '45670')
 data = 'Testing XOR Again!'

 while sys.getsizeof(data) > test_int.bit_length():
     test_int = int(str(test_int) + str(int.from_bytes(os.urandom(1), 'big')))

 print('Bit Length: ' + str(test_int.bit_length()))

 key = test_int # Yes I know this is an unnecessary step...

 data = bin(int(binascii.hexlify(bytes(data, 'UTF-8')), 16))

 print(data)

 data = int(data, 2)

 print(data)

 data = binascii.unhexlify('%x' % data)


 wolfrage@lm12-laptop02 ~/Projects $ python3 test_script.py
 Bit Length: 134
 0b1010100011001010111001101110100011010010110111001100111001001011100010100100010010101100111011101101001011011100011
 7351954002991226380810260999848996570230305
 Traceback (most recent call last):
   File "test_script.py", line 24, in <module>
     data = binascii.unhexlify('%x' % data)
 TypeError: 'str' does not support the buffer interface



 test_script2.py:
 import binascii
 test_int = 109
 test_int = int(str(test_int) + '45670')
 data = 'Testing XOR Again!'
 while sys.getsizeof(data) > test_int.bit_length():
     test_int = int(str(test_int) + str(int.from_bytes(os.urandom(1), 'big')))
 print('Bit Length: ' + str(test_int.bit_length()))
 key = test_int # Yes I know this is an unnecessary step...
 data = bin(int(binascii.hexlify(bytes(data, 'UTF-8')), 16))
 print(data)
 data = int(data, 2)
 print(data)
 data = binascii.unhexlify(bytes(data, 'utf8'))



 wolfrage@lm12-laptop02 ~/Projects $ python3 test_script2.py
 Bit Length: 140
 0b1010100011001010111001101110100011010010110111001100111001001011100010100100010010101100111011101101001011011100011
 7351954002991226380810260999848996570230305
 Traceback (most recent call last):
   File "test_script.py", line 24, in <module>
     data = binascii.unhexlify(bytes(data, 'utf8'))
 OverflowError: cannot fit 'int' into an index-sized integer



Re: [Tutor] Fwd: string to binary and back... Python 3

2012-07-19 Thread Prasad, Ramit
  Just to show you your original message contained no indentation
  whatsoever.  You might want to check your mail client settings and do
  some experiments to make sure that indentation spaces are let through
  unmolested and not stripped anywhere, otherwise the current little
  brouhaha about formatting will result.  You have to admit, it's not
  easy to read the code below with zero indentation present... :)

 Thank you for pointing that out, I did not realize it as I had copied
 and pasted it from the python file I was working on. I guess Thunderbird
 edited the email on me, even though I had put it into plain text mode.
 Next time perhaps I will just attach the file if that is acceptable
 rather than getting attacked for what my mail editor did.

A fair amount of the list does not get attachments as this is a 
gateway to newsgroups. Copy/paste works for short code fragments 
if you are sure you are posting in plain text. Otherwise, you
can post to services like pastebin and link to that.

Ramit


Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Jordan


On 07/19/2012 12:46 PM, Dave Angel wrote:
 On 07/19/2012 01:41 AM, wolfrage8...@gmail.com wrote:
 On Thu, Jul 19, 2012 at 12:16 AM, Dave Angel d...@davea.name wrote:
SNIP
 That was just the first line that was not indented.  If I thought you
 had a one-line while loop, I certainly would have just indented it.  But
 I'm sure you have some unknown number of additional lines that were
 indented in your original.  Please post in text form.
I see now, I am sorry I did not know that Thunderbird had eliminated all
of my indentation. I had set it to plain text, but I guess it is not to
be trusted. I simply looked at the reply email and saw the line that you
pointed out and at that time figured Thunderbird had just word wrapped
that line on me. Would it be acceptable to add an attached Python file?
SNIP
 Lots of details that have nothing to do with it.  For example, that
 whole thing about adding random digits together.  You could replace the
 whole thing with a simple assignment of a value that doesn't work for you.
OK I will, that was a test script and I was testing multiple things, I
did try to get rid of most of the cruft but I will attempt to do better
in the future. SNIP
 now I want to convert them both to binary so that I can XOR them together.
 Thank you for your time and help Dave, now I need to reply to Ramit.
 Ah, so you don't actually want binary at all!!!   Why not state the real
 problem up front?  You can XOR two integers, without bothering to
 convert to a string of ones and zeroes.  Use the caret operator.

 print( 40 ^ 12)

 I suspect there's an equivalent for strings or byte-strings.  But if
 not, it's a simple loop.
Actually I do want binary, as it serves as an advantage for the overall
program that I am building.
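
For reference, the "simple loop" mentioned above for byte-strings might look something like the following in Python 3. This is only a sketch: the function name xor_bytes is invented here, and the key is a fixed placeholder rather than real random key material.

```python
def xor_bytes(left, right):
    """XOR two equal-length bytes objects, one byte at a time."""
    if len(left) != len(right):
        raise ValueError("inputs must have the same length")
    # Iterating over bytes in Python 3 yields ints in range(256).
    return bytes(a ^ b for a, b in zip(left, right))


message = b"Testing XOR Again!"
key = bytes(range(1, len(message) + 1))  # placeholder key, NOT random
cipher = xor_bytes(message, key)
plain = xor_bytes(cipher, key)
print(cipher)
print(plain)
```

Applying the same key twice returns the original message, because XOR is its own inverse.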




Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Prasad, Ramit
  I think your basic problem is too much conversion because you do not
  understand the types. A string is represented by a series of bytes
  which are binary numbers. Do you understand the concept behind ASCII?
  Each letter has a numeric representation, and these are sequential. So the
  string 'ABCD' is equivalent to a series of bytes 65,66,67,68. It is
  not equivalent to 65666768 or 65+66+67+68. So your first task is to
  convert each character to the numeric equivalent and store them in a
  list. Once you have them converted to a list of integers, you can
  create another list that is a list of characters.

 Sorry for the long delay in getting back to you, I got called to the field.
 Thank you, I agree I do feel like I am doing too much conversion. I do
 understand the concept behind ASCII at least enough to know about ord()
 although I did forget about chr() which is ord()'s reverse function. I
 had tried to break them down to the ordinal value, but I really do want
 to get the integer and the data down to binary, as it provides an
 advantage for the overall program that I am writing. Thank you for your
 time.

Why not explain your usecase? Technically, everything is binary
on a computer so the question is why do *you* need to see the
binary form? Anyway, you can get the binary string by doing
`bin(ord(character))` and reverse it by doing 
`chr(int(binary_string,2))`. [1]

If you are doing some kind of XOR (I think your first email mentioned 
it) then you can XOR integers. Unless you are doing some kind of 
display of binary output, you usually do not need to actually see 
the binary string as most binary manipulation can be done via the 
integer value.
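
A short Python 3 sketch of both points, for illustration only. The helper names char_to_bits and bits_to_char are invented for this example; format(..., '08b') is used instead of bin() so that every character comes out as exactly eight digits, which assumes code points below 256.

```python
def char_to_bits(character):
    """Return the code point of one character as an 8-digit binary string."""
    return format(ord(character), '08b')


def bits_to_char(bits):
    """Reverse of char_to_bits; also accepts a '0b' prefix as produced by bin()."""
    return chr(int(bits, 2))


print(bin(ord('A')))          # the form mentioned above, with its '0b' prefix
print(char_to_bits('A'))      # zero-padded to eight digits
print(bits_to_char(bin(ord('A'))))

# XOR works directly on the integers; no binary string is needed.
xored = ord('a') ^ ord('b')
print(xored, bin(xored))
```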

 [1] - Thanks to Steven.




Ramit



Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Jordan
My response is down lower, thank you Wayne.

On 07/19/2012 12:52 PM, Wayne Werner wrote:
 I'll preface my response by saying that I know/understand fairly little
 about it, but since I've recently been smacked by this same issue when
 converting stuff to Python3, I'll see if I can explain it in a way that
 makes sense.

 On Wed, 18 Jul 2012, Jordan wrote:

 OK so I have been trying for a couple days now and I am throwing in the
 towel, Python 3 wins this one.
 I want to convert a string to binary and back again like in this
 question: Stack Overflow: Convert Binary to ASCII and vice versa
 (Python)
 http://stackoverflow.com/questions/7396849/convert-binary-to-ascii-and-vice-versa-python

 But in Python 3 I consistently get  some sort of error relating to the
 fact that nothing but bytes and bytearrays support the buffer interface
 or I get an overflow error because something is too large to be
 converted to bytes.
 Please help me and then explain what I am not getting that is new in
 Python 3. I would like to point out I realize that binary, hex, and
 encodings are all a very complex subject and so I do not expect to
 master it but I do hope that I can gain a deeper insight. Thank you all.

 The way I've read it - stop thinking about strings as if they are text.
 The biggest reason that all this has changed is because Python has grown
 up and entered the world where Unicode actually matters. To us poor
 shmucks in the English speaking countries of the world it's all very
 confusing because it's nothing we have to deal with. 26 letters is
 perfectly fine for us - and if we want uppercase we'll just throw in
 another 26. Add a few dozen punctuation marks and 256 is a perfectly fine
 amount of characters.

 To make a slightly relevant side trip, when you were a kid did you ever
 send secret messages to a friend with a code like this?

 A = 1
 B = 2
 .
 .
 .
 Z = 26

 Well, that's basically what is going on when it comes to
 bytes/text/whatever. When you input some text, Python3 believes that
 whatever you wrote was encoded with Unicode. The nice thing for us
 26-letter folks is that the ASCII alphabet we're so used to just so
 happens to map quite well to Unicode encodings - so 'A' in ASCII is the
 same number as 'A' in utf-8.

 Now, here's the part that I had to (and still need to) wrap my mind
 around - if the string is just bytes then it doesn't really matter what
 the string is supposed to represent. It could represent the LATIN-1
 character set. Or UTF-8, -16, or some other weird encoding. And all the
 operations that are supposed to modify these strings of bytes (e.g.
 removing spaces, splitting on a certain character, etc.) still work.
 Because if I have this string:

 9 45 12 9 13 19 18 9 12 99 102

 and I tell you to split on the 9's, it doesn't matter if that's some
 weird ASCII character, or some equally weird UTF character, or something
 else entirely. And I don't have to worry about things getting munged up
 when I try to stick Unicode and ASCII values together - because they're
 converted to bytes first.

 So the question is, of course, if it's all bytes, then why does it look
 like text when I print it out? Well, that's because Python converts that
 byte stream to Unicode text when it's printed. Or ASCII, if you tell it
 to.

 But Python3 has converted all(?) of those functions that used to operate
 on text and made them operate on byte streams instead. Except for the
 ones that operate on text ;)



 Well, I hope that's of some use and isn't too much of a lie - like I
 said, I'm still trying to wrap my head around things and I've found that
 explaining (or trying to explain) to someone else is often the best way
 to work out the idea in your own head. If I've gone too far astray I'm
 sure the other helpful folks here will correct me :)

Thank you for the very informative post, every bit helps. It has
certainly been a challenge for me with the new everything is bytes
scheme, especially how everything has to be converted to bytes prior to
going on a buffer.
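
To make the conversion step concrete, here is a small Python 3 sketch. In Python 3 a str holds Unicode text and a bytes object holds raw 8-bit values; encode() and decode() move between the two. The variable names are chosen for this example only.

```python
import binascii

text = 'Testing XOR Again!'

# str -> bytes: pick an encoding explicitly.
raw = text.encode('utf-8')
print(type(raw), list(raw[:4]))

# bytes -> str: decode with the same encoding.
print(raw.decode('utf-8') == text)

# binascii works on bytes, so encode first and decode afterwards.
hex_form = binascii.hexlify(raw)
print(hex_form)
print(binascii.unhexlify(hex_form).decode('utf-8'))

# A non-ASCII character can take more than one byte in UTF-8.
accented = '\u00e9'.encode('utf-8')
print(len(accented), accented)
```

Keeping everything as bytes between the encode and the final decode avoids most of the "does not support the buffer interface" errors discussed in this thread.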
 HTH,
 Wayne


Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Jordan
A question for the group before I respond: one option I saw earlier was
to ord() each element of a string and then bin() that number. But since
bin() produces a string, I could not figure out the correct way to
attach two bin() outputs back together again due to the leading 'b',
and even if I use lstrip('b') I was not sure if that would be correct?
My next hesitation is can the same or at least similar techniques be
applied to a file? I want to be able to work on both files and strings.

On 07/19/2012 01:22 PM, eryksun wrote:
 On Thu, Jul 19, 2012 at 1:41 AM, wolfrage8...@gmail.com
 wolfrage8...@gmail.com wrote:
 I was comparing them but I think I understand how to compare them well, now
 I want to convert them both to binary so that I can XOR them together. Thank
 you for your time and help Dave, now I need to reply to Ramit.
 A bytes object is a container of 8-bit numbers (i.e. range 0 to 255).
 If you index it, you'll get an int that supports the XOR operation:

 >>> b1 = b'a'
 >>> b2 = b'b'
 >>> b1[0]
 97
 >>> b2[0]
 98
 >>> bin(b1[0])
 '0b1100001'
 >>> bin(b2[0])
 '0b1100010'
 >>> bin(b1[0] ^ b2[0])
 '0b11'

 You can use the int method  from_bytes to XOR two bitstrings stored
 as Python bytes:

 >>> b3 = b''
 >>> b4 = b''
 >>> bin(int.from_bytes(b3, 'big') ^ int.from_bytes(b4, 'big'))
 '0b11001100110011'

 The computation is done between int objects, not strings. Creating a
 string using bin is just for presentation.

 P.S.:

 Instead of bin you can use the format function to have more
 control, such as for zero padding. The integer format code 'b' is for
 a binary representation. Preceding it by a number starting with zero
 will pad with zeros to the given number of characters (e.g. '032b' will
 prepend zeros to make the result at least 32 characters long):
The control sounds good and I may need that later (to adjust things to
a fixed length), but for the purpose of XORing a message, padding a key
with zeros would not be desirable if Eve was able to get her hands on
the source code.
 >>> r = int.from_bytes(b3, 'big') ^ int.from_bytes(b4, 'big')
 >>> format(r, '032b')
 '00000000000000000011001100110011'

 Instead of hard coding the length (e.g. 032), you can use the length
 of the input bitstrings to calculate the size of the result:
That sounds good.
 >>> size = 8 * max(len(b3), len(b4))
 >>> format(r, '0%db' % size)
 '0011001100110011'
Is this output the output for size rather than the two variables joined
together?
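
To check this, here is a Python 3 sketch of the same steps. The byte values assigned to b3 and b4 did not survive in this archive, so the two values below are assumptions, chosen only because they reproduce the outputs shown above.

```python
# Assumed inputs: two 2-byte strings whose XOR is 0x3333.
b3 = b'\xaa\xaa'
b4 = b'\x99\x99'

# The XOR itself is done on integers.
r = int.from_bytes(b3, 'big') ^ int.from_bytes(b4, 'big')
print(bin(r))

# size is only the field width in bits: 8 bits per input byte.
size = 8 * max(len(b3), len(b4))
print(size)

# The formatted string is r (the XOR result), zero-padded to that width.
padded = format(r, '0%db' % size)
print(padded)
```

So the string is the XOR result r written in binary and padded on the left to size digits. It is neither size itself nor the two inputs joined together.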


Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Jordan


On 07/19/2012 08:53 PM, Prasad, Ramit wrote:
 I think your basic problem is too much conversion because you do not
 understand the types. A string is represented by a series of bytes
 which are binary numbers. Do you understand the concept behind ASCII?
 Each letter has a numeric representation, and these are sequential. So the
 string 'ABCD' is equivalent to a series of bytes 65,66,67,68. It is
 not equivalent to 65666768 or 65+66+67+68. So your first task is to
 convert each character to the numeric equivalent and store them in a
 list. Once you have them converted to a list of integers, you can
 create another list that is a list of characters.
 Sorry for the long delay in getting back to you, I got called to the field.
 Thank you, I agree I do feel like I am doing too much conversion. I do
 understand the concept behind ASCII at least enough to know about ord()
 although I did forget about chr() which is ord()'s reverse function. I
 had tried to break them down to the ordinal value, but I really do want
 to get the integer and the data down to binary, as it provides an
 advantage for the overall program that I am writing. Thank you for your
 time.
 Why not explain your usecase? Technically, everything is binary
 on a computer so the question is why do *you* need to see the
 binary form? Anyway, you can get the binary string by doing
 `bin(ord(character))` and reverse it by doing 
 `chr(int(binary_string,2))`. [1]
OK. I am using one time pads to XOR data, but the one time pads (keys)
are very large numbers, converting them to binary increases their size
exponentially, which allows me to get more XORing done out of a single
key. I am XORing both files and strings so I need to have code that can
do both even if that means two branches of code via an if/else perhaps
with an isinstance(data, str).
I do not need to actually see the binary form.
 If you are doing some kind of XOR (I think your first email mentioned 
 it) then you can XOR integers. Unless you are doing some kind of 
 display of binary output, you usually do not need to actually see 
 the binary string as most binary manipulation can be done via the 
 integer value.
Agreed. Although the visual does help for validation (seeing is believing).
  [1] - Thanks to Steven.
Yes thank you Steven. I am working on the code now to see if I can make
the above work for me, if I need further help I will be back.
Thank you all again for your time.




 Ramit



Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Prasad, Ramit
 A question I have for the group before I respond is a option that I saw
 that I had earlier was to ord() each element of a string and then bin()
 that number. But since bin() produces a string I could not figure out
 the correct way to attach two bin() outputs back together again due to
 the leading 'b' and even if I use lstrip('b') I was not sure if that
 would be correct?

bin(integer).split('b')[1].zfill( multiple_of_eight )

 My next hesitation is can the same or at least similar techniques be
 applied to a file? I want to be able to work on both files and strings.

Probably, but it depends on what you are trying to do and what
data you are dealing with.

Ramit


[Tutor] check against multiple variables

2012-07-19 Thread Selby Rowley-Cannon
I am using a hash table in a small randomization program. I know that 
some hash functions can be prone to collisions, so I need a way to 
detect collisions.
The 'hash value' will be stored as a variable. I do not want to check it 
against each singular hash value, as there will be many; I need a way to 
check it against all hash values at once (if possible.) Sorry for those 
who like to reference, but there is no source code as of yet. I will 
need this to be solved before I can start writing, sorry!


If you need any extra info let me know.

-Selby


Re: [Tutor] check against multiple variables

2012-07-19 Thread Emile van Sebille

On 7/19/2012 12:29 PM Selby Rowley-Cannon said...

I am using a hash table in a small randomization program. I know that
some hash functions can be prone to collisions, so I need a way to
detect collisions.
The 'hash value' will be stored as a variable. I do not want to check it
against each singular hash value, as there will be many; I need a way to
check it against all hash values at once (if possible.)


so keeping the hash values in a dict would allow you to test as follows:

if new_hash_value in dict_of_hash_values:
    # and bob's your uncle.

Emile
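
A slightly fuller sketch of that membership test, in Python 3. Everything named here is hypothetical: toy_hash is a deliberately weak hash invented so that collisions are easy to trigger, and record is a helper made up for this example.

```python
def toy_hash(text, buckets=10):
    """Deliberately weak hash: sum of the code points modulo buckets."""
    return sum(ord(ch) for ch in text) % buckets


def record(table, item):
    """Store item under its hash value.

    Returns False when a different item already holds that hash value
    (a collision), True otherwise.
    """
    hash_value = toy_hash(item)
    if hash_value in table and table[hash_value] != item:
        return False
    table[hash_value] = item
    return True


table = {}
results = [record(table, word) for word in ['ab', 'ba', 'abc', 'ab']]
print(results)
print(table)
```

The `in` test on a dict (or a set) checks the new value against all stored keys in one step, so there is no need to loop over every earlier hash value by hand.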



Sorry for those
who like to reference, but there is no source code as of yet. I will
need this to be solved before I can start writing, sorry!

If you need any extra info let me know.

 -Selby


Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Dave Angel
On 07/19/2012 03:19 PM, Jordan wrote:
 SNIP

 OK. I am using one time pads to XOR data, but the one time pads (keys)
 are very large numbers, converting them to binary increases their size
 exponentially, which allows me to get more XORing done out of a single

You want to explain this impossibility of increasing size
exponentially?  If you're wanting to waste memory, there are better
ways.  But it's only 8 times as big to save a string of 1's and zeros as
to save the large-int they represent.  And multiplying by 8 isn't an
exponential function.
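
The factor of 8 can be checked with a few lines of Python 3. This is only an illustration: the helper name size_ratio is invented here, and the 16-byte all-ones value is an arbitrary test number.

```python
def size_ratio(number):
    """Return (digits in the binary string, bytes needed for the raw value)."""
    bit_string = format(number, 'b')
    byte_count = (number.bit_length() + 7) // 8
    return len(bit_string), byte_count


key = int.from_bytes(b'\xff' * 16, 'big')  # arbitrary 128-bit example value
chars, raw_bytes = size_ratio(key)
print(chars, raw_bytes, chars / raw_bytes)
```

One character per bit against one byte per eight bits gives a constant factor, however large the number gets.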

 key. I am XORing both files and strings so I need to have code that can
 do both even if that means two branches of code via an if/else perhaps
 with an isinstance(data, str).
 I do not need to actually see the binary form.


Then don't use the binary form.  It doesn't make the computation any
more powerful and it'll certainly slow it down.

Are you trying to match some other program's algorithm, and thus have
strange constraints on your data?  Or are you simply trying to make a
secure way to encrypt binary files, using one-time pads?

A one-time pad is the same size as the message, so you simply need to
convert the message into a large-int, and xor them.
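
One way this could look in Python 3.2 or later, as a sketch only. The use of os.urandom for the pad and int.from_bytes / int.to_bytes for the conversion are assumptions of this example, not something stated in the thread.

```python
import os

message = b"Mary had a little lamb."
pad = os.urandom(len(message))          # one-time pad: same length as the message

# Convert both byte strings to large ints and XOR them in one operation.
m_int = int.from_bytes(message, "big")
p_int = int.from_bytes(pad, "big")
cipher_int = m_int ^ p_int

# XOR with the same pad again restores the original integer.
plain_int = cipher_int ^ p_int
recovered = plain_int.to_bytes(len(message), "big")
```

Passing len(message) to to_bytes keeps any leading zero bytes that the integer form would otherwise drop.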


-- 

DaveA



Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Jordan


On 07/19/2012 09:23 PM, Prasad, Ramit wrote:
 A question I have for the group before I respond is a option that I saw
 that I had earlier was to ord() each element of a string and then bin()
 that number. But since bin() produces a string I could not figure out
 the correct way to attach two bin() outputs back together again due to
 the leading 'b' and even if I use lstrip('b') I was not sure if that
 would be correct?
 bin(integer).split('b')[1].zfill( multiple_of_eight )
OK so using this: Hopefully my copy paste works this time.

bin_data = ''

for char in data:

bin_data += bin(ord(char)).split('b')[1].zfill(8)

print(bin_data)

bin_list = [bin_data[x:x + 2] for x in range(0, len(bin_data), 2)]

print(bin_list)



The paste looks good to me at this time.
How do I get back to the string? If I use this:

data2 = []

for item in bin_list:

data2.append(int(item, 2))

print(data2)



The output is all too low of numbers for ord() to convert back to the
correct string.

 My next hesitation is can the same or at least similar techniques be
 applied to a file? I want to be able to work on both files and strings.
 Probably, but it depends on what you are trying to do and what
 data you are dealing with.
I just want to perform the same conversion on the file data, that is
down to binary and back to it's original state.
I was thinking I would just use the file in binary mode when I open it,
but I am not sure if that is true binary or if it is hex or something
else altogether. I think my confusion came from trying to do both files
and strings at the same time and failing back and forth.

 Ramit
 This email is confidential and subject to important disclaimers and
 conditions including on offers for the purchase or sale of
 securities, accuracy and completeness of information, viruses,
 confidentiality, legal privilege, and legal entity disclaimers,
 available at http://www.jpmorgan.com/pages/disclosures/email.  


Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread eryksun
On Thu, Jul 19, 2012 at 3:08 PM, Jordan wolfrage8...@gmail.com wrote:

 size = 8 * max(len(b3), len(b4))
 format(r, "0%db" % size)
 '0011001100110011'
 Is this output the output for size rather than the two variables joined
 together?

Using format is useful if you need the string to be padded with
zeros for the most significant byte. I wouldn't think it's important
if you're just using the bitstring representation as a sanity check on
your algorithm. In that case you can more easily use bin.

That said, len(b3) is the number of characters (bytes) in the bytes
object. Since b3 and b4 could be different lengths in general, I took
the max length to use for the zero padding. In this case both b3 and
b4 contain 4 bytes, so size is 32.

 OK. I am using one time pads to XOR data, but the one time pads (keys)
 are very large numbers, converting them to binary increases their size
 exponentially, which allows me to get more XORing done out of a single
 key. I am XORing both files and strings so I need to have code that can
 do both even if that means two branches of code via an if/else perhaps
 with an isinstance(data, str).

I'm not an expert with cryptography, but here's a simple XOR example:

>>> from itertools import cycle
>>> text = b'Mary had a little lamb.'
>>> key = b'1234'
>>> cypher = bytes(x^y for x,y in zip(text, cycle(key)))
>>> cypher
b'|SAM\x11ZRP\x11S\x13XXFGXT\x12_U\\P\x1d'
>>> text2 = bytes(x^y for x,y in zip(cypher, cycle(key)))
>>> text2
b'Mary had a little lamb.'
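
Since the question asks for code that handles both str and bytes, the same XOR could be wrapped in a small helper. This is only a sketch: the function name xor_data and the UTF-8 default are assumptions, and a short repeating key like this is not a true one-time pad.

```python
from itertools import cycle

def xor_data(data, key, encoding="utf-8"):
    """XOR data with key. A str is encoded to bytes first; the result is always bytes."""
    if isinstance(data, str):
        data = data.encode(encoding)
    return bytes(x ^ y for x, y in zip(data, cycle(key)))

cypher = xor_data("Mary had a little lamb.", b"1234")
text2 = xor_data(cypher, b"1234").decode("utf-8")
```

Data read from a file opened in binary mode is already bytes, so it can be passed straight in without the isinstance branch doing anything.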


Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Jordan


On 07/19/2012 09:53 PM, Dave Angel wrote:
 On 07/19/2012 03:19 PM, Jordan wrote:
 SNIP

 OK. I am using one time pads to XOR data, but the one time pads (keys)
 are very large numbers, converting them to binary increases their size
 exponentially, which allows me to get more XORing done out of a single
 You want to explain this impossibility of increasing size
 exponentially?  If you're wanting to waste memory, there are better
 ways.  But it's only 8 times as big to save a string of 1's and zeros as
 to save the large-int they represent.  And multiplying by 8 isn't an
 exponential function.

Yes if you wish to dissect my words the wrong word was chosen...
 key. I am XORing both files and strings so I need to have code that can
 do both even if that means two branches of code via an if/else perhaps
 with an isinstance(data, str).
 I do not need to actually see the binary form.

 Then don't use the binary form.  It doesn't make the computation any
 more powerful and it'll certainly slow it down.
The title of the question is string to binary and back.

 Are you trying to match some other program's algorithm, and thus have
 strange constraints on your data?  Or are you simply trying to make a
 secure way to encrypt binary files, using one-time pads?
I already answered this question...

 A one-time pad is the same size as the message, so you simply need to
 convert the message into a large-int, and xor them.




Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Prasad, Ramit
  bin(integer).split('b')[1].zfill( multiple_of_eight )
 OK so using this: Hopefully my copy paste works this time.
 
 bin_data = ''
 
 for char in data:
 
 bin_data += bin(ord(char)).split('b')[1].zfill(8)
 
 print(bin_data)
 
 bin_list = [bin_data[x:x + 2] for x in range(0, len(bin_data), 2)]
 
 print(bin_list)
 
 
 
 The paste looks good to me at this time.

Not to me, but I can probably figure out enough based on this.

 How do I get back to the string? If I use this:
 
 data2 = []
 
 for item in bin_list:
 
 data2.append(int(item, 2))
 
 print(data2)
 
 
 
 The output is all too low of numbers for ord() to convert back to the
 correct string.
 

Sure, this makes perfect sense to me :) (adding indent)

for char in data:
    bin_data += bin(ord(char)).split('b')[1].zfill(8)
bin_list = [bin_data[x:x + 2] for x in range(0, len(bin_data), 2)]

Why are you grabbing 2 binary digits? The only possibilities are 0,1,2,3
and none are ASCII letters. You should be grabbing 8 at a time.

bin_data = [ bin(ord(char)).split('b')[1].zfill(8) for char in data ]
bin_string = ''.join(bin_data)
bin_list = [ chr( int(char, 2) ) for char in bin_data ]
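
A short round trip along the same lines, starting from the joined string and splitting it eight digits at a time. This is a sketch that assumes every character has a code point below 256, so that eight binary digits are enough; format(..., "08b") is used in place of the bin().split().zfill() chain.

```python
data = "Hi!"

# Each character becomes exactly eight binary digits.
bin_string = "".join(format(ord(char), "08b") for char in data)

# Split the long string back into 8-digit chunks and decode each one.
chunks = [bin_string[i:i + 8] for i in range(0, len(bin_string), 8)]
restored = "".join(chr(int(chunk, 2)) for chunk in chunks)
```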

I am not really sure what you are getting at with XOR and one time
padding, but it has been a while since I have done any encryption.

I would think you could do all this by just converting everything
to int and then adding/replacing the pad in the list of ints.


Ramit



Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Jordan


On 07/19/2012 10:04 PM, eryksun wrote:
 On Thu, Jul 19, 2012 at 3:08 PM, Jordan wolfrage8...@gmail.com wrote:
SNIP
 I'm not an expert with cryptography, but here's a simple XOR example:
 from itertools import cycle
 text = b'Mary had a little lamb.'
 key = b'1234'
 cypher = bytes(x^y for x,y in zip(text, cycle(key)))
 cypher
 b'|SAM\x11ZRP\x11S\x13XXFGXT\x12_U\\P\x1d'
 text2 = bytes(x^y for x,y in zip(cypher, cycle(key)))
 text2
 b'Mary had a little lamb.'
Hmm, interesting. I am reading up on itertools.cycle() and zip now.
Thanks, there is always more than one way to solve a problem.


Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Jordan
Sorry, I am not sure why Thunderbird is stripping the spaces, may have
something to do with a plug-in that I have installed, I will have to
look into it.

On 07/19/2012 10:41 PM, Prasad, Ramit wrote:
 Sure, this makes perfect sense to me :) (adding indent)

 for char in data:
     bin_data += bin(ord(char)).split('b')[1].zfill(8)
 bin_list = [bin_data[x:x + 2] for x in range(0, len(bin_data), 2)]

 Why are you grabbing 2 binary digits? The only possibilities are 0,1,2,3
 and none are ASCII letters. You should be grabbing 8 at a time.
Right, sorry, first time working with binary and I was confused by a
previous attempt.

 bin_data = [ bin(ord(char)).split('b')[1].zfill(8) for char in data ]
 bin_string = ''.join(bin_data)
 bin_list = [ chr( int(char, 2) ) for char in bin_data ]
Thank you exactly what I was looking for!

 I am not really sure what you are getting at with XOR and one time
 padding, but it has been a while since I have done any encryption.
And I have just started reading Applied Cryptography, so I am putting
some of what I learn into practice.

 I would think you could do all this by just converting everything
 to int and then adding/replacing the pad in the list of ints.
At first I was essentially doing just that, but when I first converted
the large integers that are being used for the one time pad as the key
to binary I saw how much larger it was, and then realized that was the
bit length of the integer (technically Long). By doing that, I can get
more out of the one time pad, but if you XOR binary against Ord, very
few values will be changed because binary is only 1s and 0s as you know.
To optimize the key's use, whether it wastes memory or not, I wanted to
use binary on binary; this really comes into play with files, not so
much the shorter strings.
But since you bring it up, how would you convert a file to a list of ints?


 Ramit



Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Prasad, Ramit
  SNIP
 
  OK. I am using one time pads to XOR data, but the one time pads (keys)
  are very large numbers, converting them to binary increases their size
  exponentially, which allows me to get more XORing done out of a single
  You want to explain this impossibility of increasing size
  exponentially?  If you're wanting to waste memory, there are better
  ways.  But it's only 8 times as big to save a string of 1's and zeros as
  to save the large-int they represent.  And multiplying by 8 isn't an
  exponential function.
 
 Yes if you wish to dissect my words the wrong word was chosen...
  key. I am XORing both files and strings so I need to have code that can
  do both even if that means two branches of code via an if/else perhaps
  with an isinstance(data, str).
  I do not need to actually see the binary form.
 
  Then don't use the binary form.  It doesn't make the computation any
  more powerful and it'll certainly slow it down.
 The title of the question is string to binary and back.
 
  Are you trying to match some other program's algorithm, and thus have
  strange constraints on your data?  Or are you simply trying to make a
  secure way to encrypt binary files, using one-time pads?
 I already answered this question...

Yes, you stated that it had to work on string and files, but are the 
files binary? DaveA and I are asking the questions because given
what you are asking it just seems like you are not using the Right
approach. I can touch my nose by touching my nose with my hand, 
or asking the person next to me to pick up my hand and use it to
touch my nose. Both work, one is just faster and easier to
understand.

 
  A one-time pad is the same size as the message, so you simply need to
  convert the message into a large-int, and xor them.

Ramit


Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Dave Angel
On 07/18/2012 05:07 PM, Jordan wrote:
 OK so I have been trying for a couple days now and I am throwing in the
 towel, Python 3 wins this one.

I should have paid more attention to this the first time.  Clearly you
don't want help.


-- 

DaveA



Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Prasad, Ramit
  bin_data = [ bin(ord(char)).split('b')[1].zfill(8) for char in data ]
  bin_string = ''.join(bin_data)
  bin_list = [ chr( int(char, 2) ) for char in bin_data ]
 Thank you exactly what I was looking for!
 
  I am not really sure what you are getting at with XOR and one time
  padding, but it has been a while since I have done any encryption.
 And I have just started reading Applied Cryptography, so I am putting
 some of what I learn into practice.
 
  I would think you could do all this by just converting everything
  to int and then adding/replacing the pad in the list of ints.
 At first I was essentially doing just that, but when I first converted
 the large integers that are being used for the one time pad as the key
 to binary I saw how much larger it was, and then realized that was the
 bit length of the integer (technically Long). By doing that, I can get
 more out of the one time pad, but if you XOR binary against Ord, very
 few values will be changed because binary is only 1s and 0s as you know.
 To optimize the keys use, whether it wastes memory or not, I wanted to
 use binary on binary, this really comes into play with files, not so
 much the shorter strings.

How are you XOR-ing binary against something else? At a low level the data
is pretty similar so that they should be mostly interchangeable. It is
when you start abstracting the data that you have to convert between
abstractions.

Hold on, let me try a different angle. The int, binary, and hex versions of
a number (let's say 65) are all just different representations of the same
number. The only thing that changes is the base.

65 in decimal (base 10) is 65
65 in hex (base 16) is 41
65 in binary (base 2) is 1000001

But they are ALL the same number.
>>> int( '65', 10 )
65
>>> int( '41', 16 )
65
>>> int( '1000001', 2 )
65
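
Going the other way is just as direct. A small sketch using the built-ins bin, oct and hex, which return the same number written in a different base (variable names are illustrative):

```python
n = 65

as_binary = bin(n)    # '0b1000001'
as_octal = oct(n)     # '0o101' in Python 3
as_hex = hex(n)       # '0x41'

# int() accepts the prefixed strings when the base matches the prefix.
same = int(as_binary, 2) == int(as_octal, 8) == int(as_hex, 16) == n
```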


 But since you bring if up, how would you convert a file to a list of ints?

with open(filename, 'r' ) as f:
    ints = [ ord( char ) for line in f for char in line ]

Now all you need to do is modify the list to include your padding.
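
For files that are not plain text, a binary-mode variant may be safer. The sketch below assumes Python 3, where iterating over a bytes object already yields ints from 0 to 255, so no ord() call is needed. The temporary file exists only to make the example self-contained.

```python
import os
import tempfile

# Write a small throwaway file so the sketch can run on its own.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"AB\x00\xff")
    filename = tmp.name

with open(filename, "rb") as f:
    ints = list(f.read())     # each element is an int in range(256)

os.remove(filename)
```

Writing the data back out is the reverse: bytes(ints) turns the list into a bytes object again.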

 but when I first converted
 the large integers that are being used for the one time pad as the key
 to binary I saw how much larger it was, and then realized that was the
 bit length of the integer (technically Long). By doing that, I can get
 more out of the one time pad,

Large integers? Are you adding the integers for some reason? Extended 
ASCII only has ordinal values less than 256.

Ramit



Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Jordan


On 07/19/2012 10:48 PM, Prasad, Ramit wrote:
 SNIP

 OK. I am using one time pads to XOR data, but the one time pads (keys)
 are very large numbers, converting them to binary increases their size
 exponentially, which allows me to get more XORing done out of a single
 You want to explain this impossibility of increasing size
 exponentially?  If you're wanting to waste memory, there are better
 ways.  But it's only 8 times as big to save a string of 1's and zeros as
 to save the large-int they represent.  And multiplying by 8 isn't an
 exponential function.

 Yes if you wish to dissect my words the wrong word was chosen...
 key. I am XORing both files and strings so I need to have code that can
 do both even if that means two branches of code via an if/else perhaps
 with an isinstance(data, str).
 I do not need to actually see the binary form.

 Then don't use the binary form.  It doesn't make the computation any
 more powerful and it'll certainly slow it down.
 The title of the question is string to binary and back.
 Are you trying to match some other program's algorithm, and thus have
 strange constraints on your data?  Or are you simply trying to make a
 secure way to encrypt binary files, using one-time pads?
 I already answered this question...
 Yes, you stated that it had to work on string and files, but are the 
 files binary? DaveA and I are asking the questions because given
 what you are asking it just seems like you are not using the Right
 approach. I can touch my nose by touching my nose with my hand, 
 or asking the person next to me to pick up my hand and use it to
 touch my nose. Both work, one is just faster and easier to
 understand.
I am not sure how to answer that question because all files are binary,
but the files that I will parse have an encoding that allows them to be
read in a non-binary output. But my program will not use them in a
non-binary way, that is why I plan to open them with the 'b' mode to
open them as binary with no encoding assumed by python. I just have not
tested this new technique that you gave me on a binary file yet as I was
still implementing it for strings.
I may not be using the right approach, that is why I am asking. I also
understand why the questions are needed, so you can understand my
intent, so that you can better help me. But since DaveA and I had a
misunderstanding over the missing indentation, for which I apologized
and explained that my email editor is stripping the spaces, he seems to
be badgering me.

You want to explain this impossibility of increasing size
exponentially?  If you're wanting to waste memory, there are better
ways.

Now I would like to make it clear I very much so appreciate the help! So
again, Thank you.

 A one-time pad is the same size as the message, so you simply need to
 convert the message into a large-int, and xor them.
 Ramit


Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Prasad, Ramit
 I am not sure how to answer that question because all files are binary,
 but the files that I will parse have an encoding that allows them to be
 read in a non-binary output. But my program will not use the in a
 non-binary way, that is why I plan to open them with the 'b' mode to
 open them as binary with no encoding assumed by python. I just not have
 tested this new technique that you gave me on a binary file yet as I was
 still implementing it for strings.

As far as I know, even in binary mode, python will convert the 
binary data to read and write strings. So there is no reason 
this technique would not work for binary. Note, I was able to use
the string representation of a PDF file to write another PDF file.
So you do not need to worry about the conversion of binary to strings.
All you need to do is convert the string to int, encrypt, decrypt, 
convert back to string, and write out again.

Note Python3 being Unicode might change things a bit. Not sure if
you will need to convert to bytes or some_string.decode('ascii').

Now if you end up needing to handle non-ASCII data, then this exercise
gets more complicated. I am not sure if there is a simple way to convert
all characters to a numerical code point, but it should still be possible.
If your data is binary, then I do not think you will run into any issues.


Ramit




Re: [Tutor] check against multiple variables

2012-07-19 Thread Steven D'Aprano

Selby Rowley-Cannon wrote:
I am using a hash table in a small randomization program. I know that 
some hash functions can be prone to collisions, so I need a way to 
detect collisions.


I doubt that very much.

This entire question seems like a remarkable case of premature optimization.
Start with demonstrating that collisions are an actual problem that needs fixing.


Unless you have profiled your application and proven that hash collisions are a
real problem -- and unless you are hashing thousands of float NANs, that is
almost certainly not the case -- you are just wasting your time and making
your code slower rather than faster -- a pessimation, not optimization.


And if it *is* a problem, then the solution is to fix your data so that its 
__hash__ method is less likely to collide. If you are rolling your own hash 
method, instead of using one of Python's, that's your first problem.


Python's hash implementation is one of the most finely tuned in the world. 
Many, many years of effort have gone into making it stand up to real-world 
data. You aren't going to beat it with some half-planned pure-Python work-around.
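
To see that a dict copes with collisions by itself, here is a deliberately bad key type. This is only an illustrative sketch; the class AlwaysCollides is hypothetical and gives every instance the same hash value.

```python
class AlwaysCollides:
    """Hypothetical key type whose instances all share one hash value."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 1                      # every instance collides on purpose

    def __eq__(self, other):
        return type(other) is type(self) and self.name == other.name

# Both keys hash to 1, yet the dict keeps them apart by falling back on __eq__.
table = {AlwaysCollides("a"): 1, AlwaysCollides("b"): 2}
value_for_b = table[AlwaysCollides("b")]
```

Lookups get slower when many keys collide, but they stay correct, so no separate collision check is needed for correctness.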




--
Steven


Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread Dave Angel
On 07/19/2012 05:55 PM, Prasad, Ramit wrote:
 I am not sure how to answer that question because all files are binary,
 but the files that I will parse have an encoding that allows them to be
 read in a non-binary output. But my program will not use the in a
 non-binary way, that is why I plan to open them with the 'b' mode to
 open them as binary with no encoding assumed by python. I just not have
 tested this new technique that you gave me on a binary file yet as I was
 still implementing it for strings.
 As far as I know, even in binary mode, python will convert the 
 binary data to read and write strings. So there is no reason 
 this technique would not work for binary. Note, I was able to use
 the string representation of a PDF file to write another PDF file.
 So you do not need to worry about the conversion of binary to strings.
 All you need to do is convert the string to int, encrypt, decrypt, 
 convert back to string, and write out again.

 Note Python3 being Unicode might change things a bit. Not sure if
 you will need to convert to bytes or some_string.decode('ascii').

In Python 3, if you open the file with "b" (as Jordan has said), reading it
creates a bytes object.  No use of strings needed or wanted.  And no
assumptions of ascii, except for the output of the % operator on a hex
conversion.


import binascii
myfile = open(filename, "rb")
data = myfile.read(size)

At that point, convert it to hex with:
   hexdata = binascii.hexlify(data)
then convert that to an integer:
   numdata = int(hexdata, 16)

At that point, it's ready to xor with the one-time key, which had better
be the appropriate size to match the data length.

newhexdata = bytes("%x" % numdata, "ascii")
newdata = binascii.unhexlify(newhexdata)

If the file is bigger than the key, you have to get a new key. If the
keys are chosen with a range of 2**200, then you'd read and convert the
file 25 bytes at a time.
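
Put together for a single chunk, the steps might look like the sketch below. The function name xor_chunk and the sample key are assumptions for illustration. One deliberate difference from the snippet above: the hex string is zero-padded to two digits per byte, because a plain "%x" drops leading zeros and unhexlify rejects odd-length input.

```python
import binascii

def xor_chunk(data, key_int):
    """XOR one non-empty chunk of bytes with an integer key, going through hex."""
    numdata = int(binascii.hexlify(data), 16)
    result = numdata ^ key_int
    # Pad to two hex digits per byte so leading zero bytes are not lost.
    newhexdata = bytes("%0*x" % (2 * len(data), result), "ascii")
    return binascii.unhexlify(newhexdata)

chunk = b"\x00\x01hello"
key_int = 0x1234567890abcd          # hypothetical key, no wider than the chunk
encrypted = xor_chunk(chunk, key_int)
decrypted = xor_chunk(encrypted, key_int)
```

For a whole file, the same function would be called once per block read, with a fresh key for each block.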



-- 

DaveA



[Tutor] Calling a function does not return what I want it to return

2012-07-19 Thread Alexander Q.
I have this little program that is supposed to calculate how many diagonals
a polygon of x sides has, but it does not return what I have in the
return part of the function when I call it. Here is the code:

def num_diag(var):
  ans = 0
  if var <= 3:
    print("No diagonals.")
  else:
    for i in range(num_sides - 3):
      ans = ans + i

  return (((var - 3)*2) + ans)

num_sides = (int(raw_input("Enter sides: ")))
num_diag(num_sides)


Any suggestions as to what is going on? When I run it, it prompts me for
the number of sides, and that's it.
Thanks.


[Tutor] re 33.116

2012-07-19 Thread Emile van Sebille

I found ~200k files in /var/log all but 227 look like:

list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.2.gz.1.gz
list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.2.gz.1.gz.1.gz
list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.2.gz.1.gz.1.gz.1.gz
list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.2.gz.1.gz.2.gz
list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.2.gz.2.gz
list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.2.gz.2.gz.1.gz
list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.2.gz.3.gz
list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.3.gz
list_boxes.day.1.gz.1.gz.1.gz.3.gz.1.gz.1.gz.2.gz.1.gz.1.gz.3.gz.1.gz

in both the day and night variant.  I erased them all as / was at 100% 
used.  It's now at 89% with ~400Mb free.


Emile



Re: [Tutor] re 33.116

2012-07-19 Thread Emile van Sebille

On 7/19/2012 4:10 PM Emile van Sebille said...

I found ~200k files in /var/log all but 227 look like:


Sorry -- my bad.

Emile





Re: [Tutor] Calling a function does not return what I want it to return

2012-07-19 Thread Prasad, Ramit
 I have this little program that is supposed to calculate how many diagonals a
 polygon of x sides has, but it does not return what I have in the return
 part of the function when I call it. Here is the code:
 
 def num_diag(var):
   ans = 0
   if var <= 3:
     print("No diagonals.")
   else:
     for i in range(num_sides - 3):
       ans = ans + i
 
   return (((var - 3)*2) + ans)
 
 num_sides = (int(raw_input("Enter sides: ")))
 num_diag(num_sides)



>>> num_diag(5)
NameError: global name 'num_sides' is not defined


`for i in range(num_sides - 3):`
Change num_sides to var.

Ramit



Re: [Tutor] Calling a function does not return what I want it to return

2012-07-19 Thread Emile van Sebille

On 7/19/2012 3:58 PM Alexander Q. said...

I have this little program that is supposed to calculate how many
diagonals a polygon of x sides has, but it does not return what I have
in the return part of the function when I call it. Here is the code:

def num_diag(var):
  ans = 0
  if var <= 3:
    print("No diagonals.")
  else:
    for i in range(num_sides - 3):
      ans = ans + i

  return (((var - 3)*2) + ans)

num_sides = (int(raw_input("Enter sides: ")))



You're almost there.  Change the following


num_diag(num_sides)


to

print num_diag(num_sides)
(for Pythons before v3)   or

print (num_diag(num_sides))
(for Python v3 and later)

Then see where that takes you.


Emile




Any suggestions as to what is going on? When I run it, it prompts me for
the number of sides, and that's it.
Thanks.




Re: [Tutor] Calling a function does not return what I want it to return

2012-07-19 Thread Dave Angel
On 07/19/2012 06:58 PM, Alexander Q. wrote:
 I have this little program that is supposed to calculate how many diagonals
 a polygon of x sides has, but it does not return what I have in the
 return part of the function when I call it. Here is the code:

 def num_diag(var):
   ans = 0
   if var <= 3:
     print("No diagonals.")
   else:
     for i in range(num_sides - 3):
       ans = ans + i

   return (((var - 3)*2) + ans)

 num_sides = (int(raw_input("Enter sides: ")))
 num_diag(num_sides)


 Any suggestions as to what is going on? When I run it, it prompts me for
 the number of sides, and that's it.
 Thanks.



You never use the return value.  Try assigning it, and printing it.

result = num_diag(num_sides)
print("final answer=", result)
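
For reference, a sketch of the whole function with the suggestions from this thread applied: the loop uses the parameter rather than the global, and the caller uses the return value. It is written for Python 3, and the early return of 0 for three or fewer sides is an assumption about the intended behaviour.

```python
def num_diag(var):
    """Return the number of diagonals of a polygon with var sides."""
    if var <= 3:
        return 0                       # a triangle (or anything smaller) has none
    ans = 0
    for i in range(var - 3):           # use the parameter, not the global num_sides
        ans = ans + i
    return (var - 3) * 2 + ans

result = num_diag(6)
print("final answer=", result)
```

The result agrees with the closed form n * (n - 3) / 2 for polygons with more than three sides.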

-- 

DaveA



[Tutor] Creating a dictionary on user filter

2012-07-19 Thread Mike Nickey
Hi All,

I have a few lists that I'm trying to put into a dictionary based on
which list the user wants to use as a filter. If the user selects 1,
the dictionary would be created using the first list as the keys
and the secondary items as the values. If the user selects 2, the
dictionary would be created with the second list as the keys, and the
remaining as the values. I think using dict(zip(firstList,
(secondList, thirdList))) is the way to go but I'm having trouble with
the placement of the items.

What I have is this:
firstList = ['a', 'b', 'c']
secondList = [1,2,3]
thirdList = [1.20, 1.23, 2.54]

What I am looking for is something like this for output:
{'a': [1, 1.20], 'b': [2, 1.23], 'c': [3, 2.54]}

What I'm now thinking is that I need to loop over each item in the
list and update the dictionary such as:
for x in range(a):
    compilation = dict(zip(a[x], (b[x], c[x])))

Any help is appreciated.
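
One possible way to get the output described above, sketched in Python 3. The helper name `build_dict` and its 1-based `choice` argument are assumptions made for illustration, not something from an existing library. The idea is to pick the key list by position, then zip the remaining lists together so each key is paired with one item from each of them.

```python
first_list = ['a', 'b', 'c']
second_list = [1, 2, 3]
third_list = [1.20, 1.23, 2.54]

def build_dict(choice, lists):
    """Use lists[choice - 1] as the keys; the other lists supply the values."""
    keys = lists[choice - 1]
    others = [lst for i, lst in enumerate(lists) if i != choice - 1]
    # zip(*others) yields one tuple per position: (others[0][n], others[1][n], ...)
    return {key: list(values) for key, values in zip(keys, zip(*others))}

all_lists = [first_list, second_list, third_list]
compilation = build_dict(1, all_lists)
print(compilation)
```

With `choice` set to 2 the same call would key the dictionary on the numbers in the second list instead.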

-- 
~MEN


Re: [Tutor] Calling a function does not return what I want it to return

2012-07-19 Thread Alan Gauld

On 20/07/12 00:17, Prasad, Ramit wrote:



def num_diag(var):
    ans = 0
    if var <= 3:
        print("No diagonals.")
    else:
        for i in range(num_sides - 3):
            ans = ans + i

    return (((var - 3)*2) + ans)

num_sides = (int(raw_input("Enter sides: ")))
num_diag(num_sides)




NameError: global name 'num_sides' is not defined

`for i in range(num_sides - 3):`
Change num_sides to var.


It should work without, because it will pick up the
global variable definition.

It's probably not working the way it was intended to,
but it should work... But changing it to use the
argument would definitely be better.


--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/





Re: [Tutor] Calling a function does not return what I want it to return

2012-07-19 Thread Alexander Q.
On Thu, Jul 19, 2012 at 4:21 PM, Dave Angel d...@davea.name wrote:

 On 07/19/2012 06:58 PM, Alexander Q. wrote:
  I have this little program that is supposed to calculate how many diagonals
  a polygon of x sides has, but it does not return what I have in the
  return part of the function when I call it. Here is the code:
 
  def num_diag(var):
      ans = 0
      if var <= 3:
          print("No diagonals.")
      else:
          for i in range(num_sides - 3):
              ans = ans + i
 
      return (((var - 3)*2) + ans)
 
  num_sides = (int(raw_input("Enter sides: ")))
  num_diag(num_sides)
 
 
  Any suggestions as to what is going on? When I run it, it prompts me for
  the number of sides, and that's it.
  Thanks.
 
 

 You never use the return value.  Try assigning it, and printing it.

 result = num_diag(num_sides)
 print("final answer=", result)

 --

 DaveA

That did it, thanks Dave!

-Alex


[Tutor] Invalid Token Problem

2012-07-19 Thread Osemeka Osuagwu
Hi folks,
I've been trying to convert numbers from digits to words, I wrote the
following code;


units = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight',
         'nine']
teens = ['eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen',
         'seventeen', 'eighteen', 'nineteen']
tens = ['ten', 'twenty', 'thirty', 'fourty', 'fifty', 'sixty', 'seventy',
        'eighty', 'ninety']

def num2word(num):
    wordlist = []
    if len(str(num)) == 4:
        wordlist = [units[1] + 'thousand']

    if len(str(num)) == 3:
        if num%100 == 0:
            wordlist = [units[eval(str(num)[-3])-1] + 'hundred']
        else:
            wordlist = [units[eval(str(num)[-3])-1],'hundred', 'and',
                        num2word(eval(str(num)[-2:]))]

    if len(str(num)) == 2:
        if num%10 == 0:
            wordlist = [tens[eval(str(num)[-2])-1]]
        elif 10 < eval(str(num)) < 20:
            wordlist = [teens[eval(str(num)[-1])-1]]
        else:
            wordlist = [tens[eval(str(num)[-2])-1],
                        units[eval(str(num)[-1])-1]]

    if len(str(num)) == 1:
        wordlist = [units[num-1]]
    return ' '.join(wordlist)

for i in range(1, 200):
    print i, num2word(i)


but when I let it run till i = 108, it gives me an invalid token error as
follows;

...
99 ninety nine
100 onehundred
101 one hundred and one
102 one hundred and two
103 one hundred and three
104 one hundred and four
105 one hundred and five
106 one hundred and six
107 one hundred and seven
108

Traceback (most recent call last):
  File "C:/Windows.old/Users/Abasiemeka/Abasiemeka/GOOGLE University/Python/Python Code/MyCode/Project Euler code/Project Euler answer 17.py", line 33, in <module>
    print i, num2word(i)
  File "C:/Windows.old/Users/Abasiemeka/Abasiemeka/GOOGLE University/Python/Python Code/MyCode/Project Euler code/Project Euler answer 17.py", line 18, in num2word
    wordlist = [units[eval(str(num)[-3])-1],'hundred', 'and',
num2word(eval(str(num)[-2:]))]
  File "<string>", line 1
    08
     ^
SyntaxError: invalid token



I am at a loss, please help.

gratefully,
Abasiemeka


Re: [Tutor] string to binary and back... Python 3

2012-07-19 Thread eryksun
On Thu, Jul 19, 2012 at 5:32 PM, Jordan wolfrage8...@gmail.com wrote:

 I am not sure how to answer that question because all files are binary,
 but the files that I will parse have an encoding that allows them to be
 read in a non-binary output. But my program will not use them in a
 non-binary way, that is why I plan to open them with the 'b' mode to
 open them as binary with no encoding assumed by python. I just have not
 tested this new technique that you gave me on a binary file yet as I was
 still implementing it for strings.

Reading from a file in binary mode returns a bytes object in Python 3.
Since iterating over bytes returns ints, you can cycle the key over
the plain text using zip and compute the XOR without having to convert
the entire message into a single big number in memory. Here's my
example from before, adapted for files:

>>> from itertools import cycle
>>> key = b'1234'
>>> kit = cycle(key)
>>> with open('temp.txt', 'rb') as f, open('cipher.txt', 'wb') as fo:
...     fit = iter(lambda: f.read(512), b'')
...     for text in fit:
...         fo.write(bytes(x^y for x,y in zip(text, kit)))

Since the input file could be arbitrarily large and lack newlines, I'm
using iter to create a special iterator that reads 512-byte chunks.
The iterator stops when read returns an empty bytes object (i.e.
b''). You could use a while loop instead.
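
For comparison, the while-loop variant might look roughly like the sketch below. The function name `xor_stream` is made up for the example, and it uses in-memory `io.BytesIO` objects in place of real files so that it runs on its own. Any objects opened in binary mode with `read` and `write` should behave the same way.

```python
import io
from itertools import cycle

def xor_stream(src, dst, key, chunk_size=512):
    """XOR everything readable from src with the repeating key into dst."""
    kit = cycle(key)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:            # read() returns b'' at end of file
            break
        # chunk comes first in zip, so zip stops without consuming
        # an extra key byte and the key stays aligned across chunks.
        dst.write(bytes(x ^ y for x, y in zip(chunk, kit)))

message = b'hello world' * 100           # longer than one 512-byte chunk
cipher = io.BytesIO()
xor_stream(io.BytesIO(message), cipher, b'1234')

# XOR with the same key a second time recovers the original bytes.
plain = io.BytesIO()
xor_stream(io.BytesIO(cipher.getvalue()), plain, b'1234')
print(plain.getvalue() == message)
```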

I assume here that the key is possibly shorter than the message (e.g.
encrypting 1 megabyte of text with a 128 byte key). If you're making a
one-time pad I think the key is the same length as the message. In
that case you wouldn't have to worry about cycling it. Anyway, I'm not
particularly interested in cryptography. I'm just trying to help with
the operations.


Re: [Tutor] Invalid Token Problem

2012-07-19 Thread Dave Angel
On 07/19/2012 08:36 PM, Osemeka Osuagwu wrote:
 snip...
 99 ninety nine
 100 onehundred
 101 one hundred and one
 102 one hundred and two
 103 one hundred and three
 104 one hundred and four
 105 one hundred and five
 106 one hundred and six
 107 one hundred and seven
 108

 Traceback (most recent call last):
   File "C:/Windows.old/Users/Abasiemeka/Abasiemeka/GOOGLE University/Python/Python Code/MyCode/Project Euler code/Project Euler answer 17.py", line 33, in <module>
     print i, num2word(i)
   File "C:/Windows.old/Users/Abasiemeka/Abasiemeka/GOOGLE University/Python/Python Code/MyCode/Project Euler code/Project Euler answer 17.py", line 18, in num2word
     wordlist = [units[eval(str(num)[-3])-1],'hundred', 'and',
 num2word(eval(str(num)[-2:]))]
   File "<string>", line 1
     08
      ^
 SyntaxError: invalid token


08 isn't a valid literal.  Remove the leading zero.  That says that the
following digits are to be interpreted as octal, and 8 isn't a valid
octal digit.

Much better would be to eliminate the unnecessary use of eval().  It's
dangerous, and sometimes doesn't do what you expect.
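
A minimal sketch of the eval-free route, written for Python 3 with illustrative variable names. `int()` parses a string of decimal digits and does not treat a leading zero as an octal prefix, and plain integer arithmetic avoids the string round trip altogether.

```python
num = 108

tail = str(num)[-2:]          # the string '08'
print(int(tail))              # int() accepts leading zeros in a decimal string

# The same digits without going through a string at all:
hundreds, rest = divmod(num, 100)
print(hundreds, rest)
```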




-- 

DaveA



Re: [Tutor] Invalid Token Problem

2012-07-19 Thread Ross Wilson

On 20/07/12 10:45, Dave Angel wrote:

On 07/19/2012 08:36 PM, Osemeka Osuagwu wrote:

snip...
99 ninety nine
100 onehundred
101 one hundred and one
102 one hundred and two
103 one hundred and three
104 one hundred and four
105 one hundred and five
106 one hundred and six
107 one hundred and seven
108

Traceback (most recent call last):
  File "C:/Windows.old/Users/Abasiemeka/Abasiemeka/GOOGLE University/Python/Python Code/MyCode/Project Euler code/Project Euler answer 17.py", line 33, in <module>
    print i, num2word(i)
  File "C:/Windows.old/Users/Abasiemeka/Abasiemeka/GOOGLE University/Python/Python Code/MyCode/Project Euler code/Project Euler answer 17.py", line 18, in num2word
    wordlist = [units[eval(str(num)[-3])-1],'hundred', 'and',
num2word(eval(str(num)[-2:]))]
  File "<string>", line 1
    08
     ^
SyntaxError: invalid token



08 isn't a valid literal.  Remove the leading zero.  That says that the
following digits are to be interpreted as octal, and 8 isn't a valid
octal digit.


Try to think of another way to convert an integer string into an integer 
value. hINT()



Much better would be to eliminate the unnecessary use of eval().  It's
dangerous, and sometimes doesn't do what you expect.


More specifically, eval() is dangerous if you try to evaluate a string 
supplied by someone else.  You really can't predict what will happen.


However, if you use eval() on strings that you create yourself, it can 
be a handy technique.  When you are starting out, it's best to ignore 
eval() until later.


Ross


[Tutor] Join email list

2012-07-19 Thread Lily Tran

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


[Tutor] string to binary and back... Python 3

2012-07-19 Thread wolfrage8...@gmail.com
  On Fri, Jul 20, 2012 at 12:33 AM, Dave Angel d...@davea.name wrote:

 On 07/19/2012 05:55 PM, Prasad, Ramit wrote:
  I am not sure how to answer that question because all files are binary,
  but the files that I will parse have an encoding that allows them to be
  read in a non-binary output. But my program will not use them in a
  non-binary way, that is why I plan to open them with the 'b' mode to
  open them as binary with no encoding assumed by python. I just have not
  tested this new technique that you gave me on a binary file yet as I was
  still implementing it for strings.
  As far as I know, even in binary mode, python will convert the
  binary data to read and write strings. So there is no reason
  this technique would not work for binary. Note, I was able to use
  the string representation of a PDF file to write another PDF file.
  So you do not need to worry about the conversion of binary to strings.
  All you need to do is convert the string to int, encrypt, decrypt,
  convert back to string, and write out again.
 
  Note Python3 being Unicode might change things a bit. Not sure if
  you will need to convert to bytes or some_string.decode('ascii').

 In Python 3, if you open the file with "b" (as Jordan has said), reading
 it creates a bytes object.  No use of strings needed or wanted.  And no
 assumptions of ascii, except for the output of the % operator on a hex
 conversion.


 myfile = open(filename, "rb")
 data = myfile.read(size)

 At that point, convert it to hex with:
     hexdata = binascii.hexlify(data)
 then convert that to an integer:
     numdata = int(hexdata, 16)

 At that point, it's ready to xor with the one-time key, which had better
 be the appropriate size to match the data length.

 newhexdata = bytes("%x" % numdata, "ascii")
 newdata = binascii.unhexlify(newhexdata)

 If the file is bigger than the key, you have to get a new key. If the
 keys are chosen with a range of 2**200, then you'd read and convert the
 file 25 bytes at a time.

Thanks, I will give this a try. Can you explain a little further for me what
exactly this line is doing?

newhexdata = bytes("%x" % numdata, "ascii")

I don't quite understand the use of the "%x" % on numdata.
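
To make the steps concrete, here is a small Python 3 walk-through of the round trip with a made-up three-byte value. `"%x" % numdata` is old-style string formatting: it renders the integer as lowercase hexadecimal text, and `bytes(..., "ascii")` then encodes that text as bytes, which is what `binascii.unhexlify` expects. The zero-padded variant at the end is an addition of mine, not part of the quoted code. It is there because `"%x"` drops leading zeros, which would produce an odd-length string that `unhexlify` rejects.

```python
import binascii

data = b'\x12\xab\xff'                      # sample bytes, chosen arbitrarily
hexdata = binascii.hexlify(data)            # b'12abff'
numdata = int(hexdata, 16)                  # the same value as one big integer

text = "%x" % numdata                       # integer -> hex digits as a str
newhexdata = bytes(text, "ascii")           # str -> bytes
newdata = binascii.unhexlify(newhexdata)    # hex digits -> raw bytes again
print(text, newdata == data)

# If the data starts with a small byte, plain "%x" loses the leading zero.
# Padding to two hex digits per byte keeps the length right.
small = b'\x01\x02'
n = int(binascii.hexlify(small), 16)
padded = "%0*x" % (2 * len(small), n)       # '0102' instead of '102'
restored = binascii.unhexlify(bytes(padded, "ascii"))
print(padded, restored == small)
```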



