Re: [Tutor] Start multiple threads from Python

2013-08-05 Thread Ryan Waples
Thanks, that may be just what I'm looking for.

-ryan


On Mon, Aug 5, 2013 at 12:26 PM, Chris Down  wrote:

> On 2013-08-05 12:17, Ryan Waples wrote:
> > Currently I am calling each analysis program one at a time with
> > subprocess.call(). This is working without a hitch, but as each analysis
> > can take a while to run, I want to try to speed things up.  I realize I
> > can start three different Python sessions to do this, but that just
> > raises the question: how do I do that from Python?
>
> subprocess.Popen does not block unless you explicitly tell it to (by using
> communicate()). Perhaps that's what you want.
>
> >>> import subprocess
> >>> x = subprocess.Popen([ "sleep", "60" ])
> >>> y = subprocess.Popen([ "sleep", "60" ])
> >>> x.pid
> 3035
> >>> y.pid
> 3036
>
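For the use case in the quoted question (three independent, read-only
analyses of one data file), a minimal sketch might look like the following.
The command names and file name are placeholders, not the actual programs:

import subprocess

# Hypothetical analysis commands; substitute the real programs.
commands = [
    ["analysis_one", "data.fastq"],
    ["analysis_two", "data.fastq"],
    ["analysis_three", "data.fastq"],
]

# Popen returns immediately, so all three run concurrently.
procs = [subprocess.Popen(cmd) for cmd in commands]

# Optionally block until every analysis finishes.
for proc in procs:
    proc.wait()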
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Start multiple threads from Python

2013-08-05 Thread Ryan Waples
Python 2.7.x on Windows 7.

I'm looking for a bit of advice, not sure how to proceed.

With Python I am generating a file with a bunch of data in it.  I want to
analyse the data in this file with three separate programs.  Each of these
programs is single threaded needs only read access to the data, and they do
not depend on each other.

Currently I am calling each analysis program one at a time with
subprocess.call(). This is working without a hitch, but as each analysis
can take a while to run, I want to try to speed things up.  I realize I can
start three different Python sessions to do this, but that just raises the
question: how do I do that from Python?

How can I start multiple independent programs (as in subprocess.call())
without waiting for them to finish?

-Thanks
Ryan
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] data analysis with python

2012-11-13 Thread Ryan Waples
Not sure how wedded you are to Python (I have no doubt it can tackle this),
but this is very much the sort of thing that R is *really* good at.
Just FYI.
Good luck
Ryan
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Using the set.difference method with an unknown number of input iterables

2012-10-18 Thread Ryan Waples
cheers, much appreciated

-ryan

On Thu, Oct 18, 2012 at 10:52 PM, Emile van Sebille  wrote:

> On 10/18/2012 10:38 AM, Ryan Waples wrote:
> > I'm struggling to understand how to accomplish the following:
> >
> > I have a set ("a" below) and a list of sets ("not_a"); how can I pass
> > the elements of "not_a" to set.difference() so that it understands I
> > want the difference between set "a" and all the rest?
> >
> > set.difference says "Changed in version 2.6: Accepts multiple input
> > iterables".
> > How can I give it multiple input iterables?
> >
> > I get different error msgs depending on what I try, but they just tell
> > me that there is something that I'm missing here.
> >
> > Thanks
> >
> > #Code below
> > a = set([1,2,3,4])
> > b = set([2,3,4,5])
> > c = set([3,4,5,6])
> > d = set([4,5,6,7])
> >
> > not_a = [b,c,d]
> > a.difference(not_a)
>
> Try this as
>
> a.difference(*not_a)
>
> The '*' expands the list to its individual items.
>
> HTH,
>
> Emile
>
>
>
> >
> > # I expect to return set([1]), the same as if I called:
> > a.difference(b,c,d)
> >
> >
> >
> >
> > ___
> > Tutor maillist  -  Tutor@python.org
> > To unsubscribe or change subscription options:
> > http://mail.python.org/mailman/listinfo/tutor
> >
>
>
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
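Emile's suggestion checks out in an interactive session; a quick sketch
using the sets from the original post:

>>> a = set([1, 2, 3, 4])
>>> not_a = [set([2, 3, 4, 5]), set([3, 4, 5, 6]), set([4, 5, 6, 7])]
>>> a.difference(*not_a)
set([1])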
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Using the set.difference method with an unknown number of input iterables

2012-10-18 Thread Ryan Waples
I'm struggling to understand how to accomplish the following:

I have a set ("a" below) and a list of sets ("not_a"); how can I pass the
elements of "not_a" to set.difference() so that it understands I want
the difference between set "a" and all the rest?

set.difference says "Changed in version 2.6: Accepts multiple input
iterables".
How can I give it multiple input iterables?

I get different error messages depending on what I try, but they just tell
me that there is something I'm missing here.

Thanks

#Code below
a = set([1,2,3,4])
b = set([2,3,4,5])
c = set([3,4,5,6])
d = set([4,5,6,7])

not_a = [b,c,d]
a.difference(not_a)

# I expect to return set([1]), the same as if I called:
a.difference(b,c,d)
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Genetic module recommendations?

2012-08-18 Thread Ryan Waples
Not sure if it meets your needs, but you should at least check out simuPOP.


On Sat, Aug 18, 2012 at 12:24 PM, Modulok  wrote:
> List,
>
> I'm looking for a good genetic module. (Stable, well documented, pythonic,
> etc.)
>
> I'm writing a breeding simulator where users select parent organisms to breed
> based on traits they favor, e.g: eye color, height, etc. The human player is
> the fitness function. I guess this is "artificial selection"? After breeding
> the user gets an offspring which carries traits from its parents.
>
> It's only a game and a crude approximation of nature at best. However, the
> algorithm has to be stable enough that this can go on for several hundred
> generations without entropy reducing to the point of halting evolution. On
> the other hand, I don't want so much entropy that it's reduced to a random
> search.
>
> Before I write my own, I thought I'd ask to see if there was a third party,
> de-facto standard Python genetic module. Or at least one that is highly
> recommended.
>
> Any suggestions?
>
> Thanks!
> -Modulok-
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
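For the crossover step itself, a library-free sketch is easy to write; this
one assumes traits are stored in a dict and uses a small mutation rate to
keep some entropy in the population (all names and values are illustrative):

import random

MUTATION_RATE = 0.01  # chance that an inherited trait mutates

def breed(parent_a, parent_b):
    # Each trait is inherited from one parent at random,
    # occasionally nudged by a Gaussian mutation.
    child = {}
    for trait in parent_a:
        child[trait] = random.choice([parent_a[trait], parent_b[trait]])
        if random.random() < MUTATION_RATE:
            child[trait] += random.gauss(0, 1)
    return child

mom = {"height": 170.0, "eye_hue": 0.3}
dad = {"height": 182.0, "eye_hue": 0.7}
child = breed(mom, dad)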
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-19 Thread Ryan Waples
> I count only 19 lines.

Yep, you are right.  My bad, I think I missed copy/pasting line 20.

>The first group has only three lines. See below.

Not so, the first group is actually the first four lines listed below.
Lines 1-4 serve as one group.  For what it's worth, line four should
have one character for each char in line two, while the first line is much
shorter, contains a space, and for this file always ends in either
"1:N:0:" (keep) or "1:Y:0:" (remove).   The EXAMPLE data is correctly
formatted as it should be; I'm just missing line 20.

> There is a blank line, which I take as NOT part of the input but just a
> spacer. Then:
>
> 1) Line starting with @
> 2) Line of bases CGCGT ...
> 3) Plus sign
> 4) Line starting with @@@
> 5) Line starting with @
> 6) Line of bases TTCTA ...
> 7) Plus sign
>
> and so on. There are TWO lines before the first +, and three before each
> of the others.

I think you are just reading one frame shifted; it's not a well-designed
format, because the required start character "@" can appear in other
places as well.


>
>
>> __EXAMPLE RAW DATA FILE REGION__
>>
>> @HWI-ST0747:167:B02DEACXX:8:1101:3182:167088 1:N:0:
>> CGCGTGTGCAGGTTTATAGAACCAGCTGCAGATTAGTAGCAGCGCACGGAGAGGTGTGTCTGTTTATTGTCCTCAGCAGGCAGACATGTTTGTGGTC
>> +
>> @@@DDADDHB9+2A;6(5@CDAC(5(5:5,(8?88?BC@#
>> @HWI-ST0747:167:B02DEACXX:8:1101:3134:167090 1:N:0:
>> TTCTAGTGCAGGGCGACAGCGTTGCGGAGCCGGTCCGAGTCTGCTGGGTCAGTCATGGCTAGTTGGTACTATAACGACACAGGGCGAGACCCAGATGCAAA
>> +
>> @CCFFFDFHJJIJHHIIIJHGHIJI@GFFDDDFDDCEEEDCCBDCCCCCB>>@C(4@ADCA>>?BBBDDABB055<>-?A
>> @HWI-ST0747:167:B02DEACXX:8:1101:3002:167092 1:N:0:
>> CTTTGCTGCAGGCTCATCCTGACATGACCCTCCAGCATGACAATGCCACCAGCCATACTGCTCGTTCTGTGTGTGATTTCCAGCAAGTAAATATGTA
>> +
>> CCCFHIJIEHIH@AHFAGHIGIIGGEIJGIJIIIGIIIGEHGEHIIJIEHH@FHGH@=ACEHHFBFFCE@AACCA>AD>BA
>> @HWI-ST0747:167:B02DEACXX:8:1101:3022:167094 1:N:0:
>> ATTCCGTGCAGGCCAACTCCCGACGGACATCCTTGCTCAGACTGCAGCGATAGTGGTCGATCAGGGCCCTGTTGTTCCATCCCACTCCGGCGACCAGGTTC
>> +
>> CCCFHIDHJIIHIIIJIJIIGGIIFHJIIIIEIFHFF>CBAECBDDDC:??B=AAACD?8@:>C@?8CBDDD@D99B@>3884>A
>> @HWI-ST0747:167:B02DEACXX:8:1101:3095:167100 1:N:0:
>> CGTGATTGCAGGGACGTTACAGAGACGTTACAGGGATGTTACAGGGACGTTACAGAGACGTTAAAGAGATGTTACAGGGATGTTACAGACAGAGACGTTAC
>> +


>
> Your code says that the first line in each group should start with an @
> sign. That is clearly not the case for the last two groups.
>
> I suggest that your data files have been corrupted.

I'm pretty sure that my raw IN files are all good; it's hard to be sure
with such a large file, but the very picky downstream analysis program
takes every single raw file just fine (30 of them), and chokes on my
filtered files at regions that don't conform to the correct
formatting.

>
>> __PYTHON CODE __
>
> I have re-written your code slightly, to be a little closer to "best
> practice", or at least modern practice. If there is anything you don't
> understand, please feel free to ask.
>
> I haven't tested this code, but it should run fine on Python 2.7.
>
> It will be interesting to see if you get different results with this.

--CODE REMOVED--

Thanks for the suggestions.  I've never really felt super comfortable
using objects at all, but it's what I want to learn next.  This will be
helpful and useful.

> for reads, lines in four_lines( INFILE ):
>     ID_Line_1, Seq_Line, ID_Line_2, Quality_Line = lines

Can you explain what is going on here, or point me in the right
direction?  I see that the parts of 'lines' get assigned, but I'm
missing how the file gets iterated over and how reads gets
incremented.
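For reference, the removed code presumably defined a generator along these
lines (a sketch only, since the original was cut; the names follow the
quoted snippet):

def four_lines(infile):
    # Yield (read_count, record) pairs, one FASTQ record (four
    # stripped lines) at a time; the counter lives inside the
    # generator, which is why 'reads' increments automatically.
    reads = 0
    while True:
        lines = [infile.readline() for _ in range(4)]
        if lines[3] == "":  # fewer than four lines left: end of file
            return
        reads += 1
        yield reads, [line.strip() for line in lines]

Each pass of the for loop resumes the generator, which reads the next four
lines before yielding again; that is where the file iteration happens.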

Do you have a reason why this approach might give a 'better' output?

Thanks again.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-18 Thread Ryan Waples
>>>
>>> If you copy those files to a different device (one that has just been 
>>> scrubbed and reformatted), then copy them back and get different results 
>>> with your application, you've found your problem.
>>>
>>> -Bill
>>
>> Thanks for the insistence,  I'll check this out.  If you have any
>> guidance on how to do so let me know.  I knew my system wasn't
>> particularly well suited to the task at hand, but I haven't seen how
>> it would actually cause problems.
>>
>> -Ryan
>> ___
> The last two lines in my MSG pretty much would be the test. Get another
> flash drive, format it as FAT-32 (I assume that's what you are using), then
> copy a couple of files to it.  Then copy them back to your current device
> and run your program again. If you get DIFFERENT, but still wrong results,
> you've found the problem. The largest value a 32-bit binary number can
> represent is 2^32 - 1, which works out to a 4 GB limit.  I'm no expert on
> Windows files, but I'd be very surprised if, when the FAT-32 file system
> was being designed, anyone considered the case where a single file could
> be that large.
>
> -Bill


The hard drive is formatted as NTFS because, as you say, I'm up against
the file size limit of FAT32.  Do you think this could still be the issue?
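One quick way to sanity-check whether a file crosses the 32-bit boundary
(a sketch; the path is hypothetical):

import os

FAT32_LIMIT = 2**32 - 1  # largest file size FAT-32 can index, in bytes
size = os.path.getsize('E:/PINK/Paired_End/raw/gzip/example.fastq')
print(size > FAT32_LIMIT)  # True means the file exceeds the FAT-32 limit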
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-18 Thread Ryan Waples
On Wed, Jul 18, 2012 at 8:23 PM, Lee Harr  wrote:
>
>>   grep ^TTCTGTGAGTGATTTCCTGCAAGACAGGAATGTCAGT$
>> with no results
>
> How about:
> grep TTCTGTGAGTGATTTCCTGCAAGACAGGAATGTCAGT outfile
> Just in case there is some non-printing character in there...

There are many instances of that sequence of characters in the RAW
input file, but that is what I would expect.




>
> Beyond that ... my guess would be that you are either not reading the file
> you think you are, or not writing the file you think you are  :o)
> out = each.replace('/gzip', '/rem_clusters2')
> Seems pretty bulletproof, but maybe just print each and out here to make
> sure...

Checked this multiple times

>
> Also, I'm curious... Reading your code, I sort of feel like when I am
> listening to a non-native speaker. I always get the urge to throw out the
> correct "Americanisms" for people -- to help them fit in better. So, I
> hope it does not make me a jerk, but ...
> infile = open(each, 'r') # I'd probably drop the 'r' also...

Working in science, I try to be as explicit as possible; I've come to
dislike Perl for this reason.

> while not check_for_end_of_file:
>     reads += 1
>     head, sep, tail = id_line_1.partition(' ')
>     # or, if I'm only using the one thing ...
>     _, _, meaningful_name = id_line_1.partition(' ')
>     # maybe call it "selector", then ...
>     if selector in ('1:N:0:', '2:N:0:'):
>

Points taken, thanks.
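Pulling Lee's suggestions together, the selection loop might read like this
(a sketch; 'each' and 'out' are the path variables from the original
script, and 'selector' is the assumed name):

with open(each) as infile, open(out, 'w') as outfile:
    while True:
        record = [infile.readline() for _ in range(4)]
        if record[3] == '':  # end of file
            break
        _, _, selector = record[0].strip().partition(' ')
        if selector in ('1:N:0:', '2:N:0:'):
            outfile.writelines(record)  # lines keep their newlines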


> Hope this helps.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-18 Thread Ryan Waples
On Wed, Jul 18, 2012 at 8:04 PM, William R. Wing (Bill Wing) wrote:
> On Jul 18, 2012, at 10:33 PM, Ryan Waples wrote:
>
>> Thanks for the replies, I'll try to address the questions raised and
>> spur further conversation.
>>
>>> "those numbers (4GB and 64M lines) look suspiciously close to the file and 
>>> record pointer limits to a 32-bit file system.  Are you sure you aren't 
>>> bumping into wrap around issues of some sort?"
>>
>> My understanding is that I am taking the files in a stream, one line
>> at a time and never loading them into memory all at once.  I would
>> like (and expect) my script to be able to handle files up to at least
>> 50GB.  If this would cause a problem, let me know.
>
> [Again, stripping out everything else…]
>
> I don't think you understood my concern.  The issue isn't whether or not the 
> files are being read as a stream, the issue is that at something like those 
> numbers a 32-bit file system can silently fail.  If the pointers that are 
> chaining allocation blocks together (or whatever Windows calls them) aren't 
> capable of indexing to sufficiently large numbers, then you WILL get garbage 
> included in the file stream.
>
> If you copy those files to a different device (one that has just been 
> scrubbed and reformatted), then copy them back and get different results with 
> your application, you've found your problem.
>
> -Bill

Thanks for the insistence,  I'll check this out.  If you have any
guidance on how to do so let me know.  I knew my system wasn't
particularly well suited to the task at hand, but I haven't seen how
it would actually cause problems.

-Ryan
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-18 Thread Ryan Waples
Thanks for the replies, I'll try to address the questions raised and
spur further conversation.

>"those numbers (4GB and 64M lines) look suspiciously close to the file and 
>record pointer limits to a 32-bit file system.  Are you sure you aren't 
>bumping into wrap around issues of some sort?"

My understanding is that I am taking the files in a stream, one line
at a time and never loading them into memory all at once.  I would
like (and expect) my script to be able to handle files up to at least
50GB.  If this would cause a problem, let me know.

> "my hunch is you might be having issues related to linux to dos  EOF char."

I don't think this is the issue.  99.99% of the lines come out OK
(see examples), and I do end up with an output file of some 50 million
lines.  I can confirm that my Python code as written and executed on
Win7 will convert the original file endings from Unix (LF) to Windows
(CRLF).  This shouldn't confuse the downstream analysis.
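If newline translation ever did become a concern, binary mode sidesteps it
entirely (an illustrative two-liner using the 'each' and 'out' names from
the script, not what the posted code does):

infile = open(each, 'rb')    # 'rb': no CRLF/LF translation on Windows
outfile = open(out, 'wb')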

> "What are you doing to test that they don't match the original?"

Those 2 pieces of example data have been grep'd (cygwin) out of the IN
and OUT files; they represent the output of a grep that pulls the 20
lines surrounding the line:
@HWI-ST0747:167:B02DEACXX:8:1101:3002:167092 1:N:0:
which is a unique line in each.

I have also grep'd the IN file for a line in the OUT:
  grep ^TTCTGTGAGTGATTTCCTGCAAGACAGGAATGTCAGT$
with no results

The python code posted has a (weak) check that mostly serves to
confirm that every fourth line of the IN file starts with an "@"; this
IS the case for the IN file, but is NOT the case for the OUT file.

I can run my analysis program on the raw IN file fine; it will
process all entries.  When the OUT file is supplied, it errors at the
reads in the pasted text.

> "Earlier, you stated that each record should be four lines. But your sample 
> data starts with a record of three lines."

I've checked again and they look to be four lines, so I'm not sure I understand.
Format:
1) ID line (must start with @) - contains filter criteria
2) Data line 1 - 101 chars
3) just a "+"
4) Data line 2 - 101 chars (may start with @)

> "Do they occur at random, or is this repeatable?"

When I'm back at work I'll confirm again that this is the case, I
should have a better answer here.  I can confirm that it seems to
happen to every (large) file I've tested, no files seem unaffected.

Thanks

__SHORTENED PYTHON CODE__

for each in my_in_files:
    out = each.replace('/gzip', '/rem_clusters2' )
    INFILE = open (each, 'r')
    OUTFILE = open (out , 'w')

    # Tracking Variables
    Reads = 0
    Writes = 0
    Check_For_End_Of_File = 0

    # Read FASTQ File by group of four lines
    while Check_For_End_Of_File == 0:

        ID_Line_1    = INFILE.readline()
        Seq_Line     = INFILE.readline()
        ID_Line_2    = INFILE.readline()
        Quality_Line = INFILE.readline()

        ID_Line_1    = ID_Line_1.strip()
        Seq_Line     = Seq_Line.strip()
        ID_Line_2    = ID_Line_2.strip()
        Quality_Line = Quality_Line.strip()

        Reads = Reads + 1

        # Check that I have not reached the end of file
        if Quality_Line == "":
            Check_For_End_Of_File = 1
            break

        # Check that ID_Line_1 starts with @
        if not ID_Line_1.startswith('@'):
            break

        # Select Reads that I want to keep
        ID = ID_Line_1.partition(' ')
        if (ID[2] == "1:N:0:" or ID[2] == "2:N:0:"):

            # Write to file, maintaining group of 4
            OUTFILE.write(ID_Line_1 + "\n")
            OUTFILE.write(Seq_Line + "\n")
            OUTFILE.write(ID_Line_2 + "\n")
            OUTFILE.write(Quality_Line + "\n")
            Writes = Writes + 1

    INFILE.close()
    OUTFILE.close()
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Problem When Iterating Over Large Test Files

2012-07-18 Thread Ryan Waples
I'm seeing some unexpected output when I use a script (included at
end) to iterate over large text files.  I am unsure of the source of
the unexpected output and any help would be much appreciated.

Background
Python v 2.7.1
Windows 7 32bit
Reading and writing to an external USB hard drive

Data files are ~4GB text (.fastq) files that have been uncompressed
(gzip).  Each file has no errors or formatting problems and seems to
have uncompressed just fine: 64M lines, with each 'entry' split across
4 consecutive lines, for 16M entries.

My python script iterates over the data files 4 lines at a time, and
selects and writes groups of four lines to the output file.  I will end
up selecting roughly 85% of the entries.

In my output I am seeing lines that don't occur in the original file,
and that don't match any lines in the original file.  The incidences
of badly formatted lines don't seem to match up with any patterns in
the data file, and occur across multiple different data files.

I've included 20 consecutive lines of input and output.  Each of these
5 'records' should have been selected and printed to the output file,
but there is a problem with the 4th and 5th entries in the output, and
the output no longer matches the input as expected.  For example, the
line:
never occurs in the original data.

Sorry for the large block of text below.
Other pertinent info: I've tried a related Perl script and ran into
similar issues, but not in the same places.

Any help or insight would be appreciated.

Thanks


__EXAMPLE RAW DATA FILE REGION__

@HWI-ST0747:167:B02DEACXX:8:1101:3182:167088 1:N:0:
CGCGTGTGCAGGTTTATAGAACCAGCTGCAGATTAGTAGCAGCGCACGGAGAGGTGTGTCTGTTTATTGTCCTCAGCAGGCAGACATGTTTGTGGTC
+
@@@DDADDHB9+2A;6(5@CDAC(5(5:5,(8?88?BC@#
@HWI-ST0747:167:B02DEACXX:8:1101:3134:167090 1:N:0:
TTCTAGTGCAGGGCGACAGCGTTGCGGAGCCGGTCCGAGTCTGCTGGGTCAGTCATGGCTAGTTGGTACTATAACGACACAGGGCGAGACCCAGATGCAAA
+
@CCFFFDFHJJIJHHIIIJHGHIJI@GFFDDDFDDCEEEDCCBDCCCCCB>>@C(4@ADCA>>?BBBDDABB055<>-?A
@HWI-ST0747:167:B02DEACXX:8:1101:3002:167092 1:N:0:
CTTTGCTGCAGGCTCATCCTGACATGACCCTCCAGCATGACAATGCCACCAGCCATACTGCTCGTTCTGTGTGTGATTTCCAGCAAGTAAATATGTA
+
CCCFHIJIEHIH@AHFAGHIGIIGGEIJGIJIIIGIIIGEHGEHIIJIEHH@FHGH@=ACEHHFBFFCE@AACCA>AD>BA
@HWI-ST0747:167:B02DEACXX:8:1101:3022:167094 1:N:0:
ATTCCGTGCAGGCCAACTCCCGACGGACATCCTTGCTCAGACTGCAGCGATAGTGGTCGATCAGGGCCCTGTTGTTCCATCCCACTCCGGCGACCAGGTTC
+
CCCFHIDHJIIHIIIJIJIIGGIIFHJIIIIEIFHFF>CBAECBDDDC:??B=AAACD?8@:>C@?8CBDDD@D99B@>3884>A
@HWI-ST0747:167:B02DEACXX:8:1101:3095:167100 1:N:0:
CGTGATTGCAGGGACGTTACAGAGACGTTACAGGGATGTTACAGGGACGTTACAGAGACGTTAAAGAGATGTTACAGGGATGTTACAGACAGAGACGTTAC
+


__EXAMPLE PROBLEMATIC OUTPUT FILE REGION__

@HWI-ST0747:167:B02DEACXX:8:1101:3182:167088 1:N:0:
CGCGTGTGCAGGTTTATAGAACCAGCTGCAGATTAGTAGCAGCGCACGGAGAGGTGTGTCTGTTTATTGTCCTCAGCAGGCAGACATGTTTGTGGTC
+
@@@DDADDHB9+2A;6(5@CDAC(5(5:5,(8?88?BC@#
@HWI-ST0747:167:B02DEACXX:8:1101:3134:167090 1:N:0:
TTCTAGTGCAGGGCGACAGCGTTGCGGAGCCGGTCCGAGTCTGCTGGGTCAGTCATGGCTAGTTGGTACTATAACGACACAGGGCGAGACCCAGATGCAAA
+
@CCFFFDFHJJIJHHIIIJHGHIJI@GFFDDDFDDCEEEDCCBDCCCCCB>>@C(4@ADCA>>?BBBDDABB055<>-?A
@HWI-ST0747:167:B02DEACXX:8:1101:3002:167092 1:N:0:
CTTTGCTGCAGGCTCATCCTGACATGACCCTCCAGCATGACAATGCCACCAGCCATACTGCTCGTTCTGTGTGTGATTTCCAGCAAGTAAATATGTA
+
CCCFHIJIEHIH@AHFAGHIGIIGGEIJGIJIIIGIIIGEHGEHIIJIEHH@FHGH@=ACEHHFBFFCE@AACCA>AD>BA
TTCTGTGAGTGATTTCCTGCAAGACAGGAATGTCAGT
+
BCCFFDFFFIJIJJHIFGGGGIGGIJIJIGIGIGIGHHIGIIJGJJJIIJIIEHIHHHFFFB@>CCE@BEDCDDAC?CC?ACC??>ADDD
@HWI-ST0747:167:B02DEACXX:8:1304:19473:44548 1:N:0:
CTACAGTGCAGGCACCCGGCCCGCCACAATGAGTCGCTAGAGCGCAATGAGACAAGTAAAGCTGACCAAACCCTTAACCCGGACGATGCTGGG
+
BCCFHIJEHJJIIGIJIGIJIDHDGIGIGGED@CCDDC>C>BBD?BDBAABDDD@BCD@?@BDBDDDBDCCC2




__PYTHON CODE __


import glob

my_in_files = glob.glob ('E:/PINK/Paired_End/raw/gzip/*.fastq')

for each in my_in_files:
    #print(each)
    out = each.replace('/gzip', '/rem_clusters2' )
    #print (out)
    INFILE = open (each, 'r')
    OUTFILE = open (out , 'w')

    # Tracking Variables
    Reads = 0
    Writes = 0
    Check_For_End_Of_File = 0

    # Updates
    print ("Reading File: " + each)
    print ("Writing File: " + out)

    # Read FASTQ File by group of four lines
    while Check_For_End_Of_File == 0:

        # Read the next four lines from the FASTQ file
        ID_Line_1    = INFILE.readline()
        Seq_Line     = INFILE.readline()
        ID_Line_2    = INFILE.readline()
        Quality_Line = INFILE.readline()

        # Strip off leading and trailing whitespace characters
        ID_Line_1    = ID_Line_1.strip()
        Seq_Line     = Seq_Line.strip()
        ID_Line_2    = ID_Line_2.strip()
        Quality_Line = Quality_Line.strip()

        Reads = Reads + 1

        # Check that I have not reached the end of file
        if Quality_Line == "":
            # End of file reached, print update
            print ("Saw " + str(Reads) + " reads")
            print ("Wrote " + str(Writes) + " reads")
            Check_For_End_Of_File = 1
            break