[Tutor] plotting several datasets and calling data from afar

2012-03-26 Thread Elaina Ann Hyde
Hi everyone,
   I am trying to set up a code to do some plotting, and before I get too
far I wanted to ask some structure questions.  Basically I want to tell
Python to read 2 datasets, plot them on the same scale on the same x-y
axis, read a third dataset and match the name from the first dataset, then
label certain values from the third... complicating matters is that all
these data are part of much, much larger sets in separate files; the paths
look like:
pathway1/namered.dat
pathway2/nameblue.dat
matchingfile.txt

So I open the matchingfile, read it with asciitable, and then I have
a column in that file called 'name' and a column called 'M'; I sort the
file, return a subset that is interesting, and get name1, name2, etc. for
every subset.  I want to make a plot that looks like:

plot pathway1/namered.dat and pathway2/nameblue.dat with label 'M' for
every value in the subset name1; each row[i] I need to assign to a separate
window so that I get a multiplot with a shared x-axis, stacking my
plots up the y-axis.  I do have multiplot working and I know how to plot
'M' for each subset.
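
A minimal sketch of the read / sort / subset step described above, assuming
matchingfile.txt has a header row naming the columns and using a purely
illustrative cut on 'M':

import asciitable
import numpy as np

# Read the matching file; asciitable returns a NumPy structured array,
# so columns can be pulled out by name.
match = asciitable.read('matchingfile.txt')

# Sort on 'M' and keep an interesting subset (the cut here is made up).
match = match[np.argsort(match['M'])]
subset = match[match['M'] > 0.0]

subset_names = [str(n) for n in subset['name']]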

The conceptual trouble has come in: how do I match the 'name' variable of my
subset 'name1' with the plot I want to do for pathway1/namered.dat and
pathway2/nameblue.dat?  The key feature that is the same is the 'name'
variable, but in one instance I have to match 'name'+'red.dat' and in
the other 'name'+'blue.dat'.

Any ideas would be appreciated, thanks!
~Elaina Hyde

-- 
PhD Candidate
Department of Physics and Astronomy
Faculty of Science
Macquarie University
North Ryde, NSW 2109, Australia


Re: [Tutor] plotting several datasets and calling data from afar

2012-03-26 Thread Evert Rol
  Hi Elaina,


 Hi everyone,
I am trying to set up a code to do some plotting and before I get too far 
 I wanted to ask some structure questions.  Basically I want to tell python to 
 read 2 datasets, plot them on the same scale on the same x-y axis , read a 
 third dataset and match the name from the first dataset, then label certain 
 values from the third... complicating matters is that all these data are part 
 of much, much larger sets in seperate files, the paths look like:
 pathway1/namered.dat
 pathway2/nameblue.dat
 matchingfile.txt
 
 so I do fopen on the matchingfile, read it with asciitable, and then I have a 
 column in that file called 'name' and a column called 'M', I sort the file, 
 return a subset that is interesting, and get name1, name2, etc for every 
 subset.  I want to make a plot that looks like:
 
 plot pathway1/namered.dat and pathway2/nameblue.dat with label 'M' for every 
 value in the subset name1, each row[i] I need to assign to a seperate window 
 so that I get a multiplot with a shared x-axis, and stacking my plots up the 
 y-axis.  I do have multiplot working and I know how to plot 'M' for each 
 subset.
 
 The conceptual trouble has come in, how do I match 'name' variable of my 
 subset 'name1' with the plot I want to do for pathway1/namered.dat and 
 pathway2/nameblue.dat... the key feature that is the same is the 'name' 
 variable, but in one instance I have to match the 'name'+'red.dat' and in the 
 other the 'name'+'blue.dat'

It's not 100% clear to me what you precisely want to do, but here are a few
possibilities:

- use a dictionary. Assign each dataset to a dictionary with the name as the
key (or the name + color). E.g., dataset['name1red'], dataset['name1blue'],
dataset['name2red'], etc. Each value in this dictionary is a dataset read by
asciitable (see the sketch after the second option below).
  The (big) disadvantage is that you would read every single *.dat file
beforehand.
- simply open the files with the filename deduced from the name. So, in a
loop, that would be something like:

    for name in subset_names:
        with open(name + 'red.dat') as datafile:
            pass  # read data and plot
        with open(name + 'blue.dat') as datafile:
            pass  # read data and plot
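
For the first (dictionary) option, a minimal sketch, assuming the directory
layout from the original mail and that asciitable can read each .dat file
directly:

import glob
import os
import asciitable

# Read every *.dat file up front and key it on its base name, so a dataset
# can later be fetched as dataset['name1red'], dataset['name1blue'], ...
dataset = {}
for path in glob.glob('pathway1/*red.dat') + glob.glob('pathway2/*blue.dat'):
    key = os.path.splitext(os.path.basename(path))[0]   # e.g. 'name1red'
    dataset[key] = asciitable.read(path)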

But perhaps I misunderstand your problem. I can't really tell if the problem is
in opening the selected data files, or plotting them, or creating the labels
for the plot.
This may actually be where some code comes in handy. Provided the plotting
isn't the problem, you could make a very simple Python script that shows the
concept and how you attempt to solve it. The simpler the script, the better
the eventual implementation will probably be, so even writing such a simple
script is incredibly useful. (Example data sets of, say, just 2 lines each can
also help with that.)
From that, the problem may become clearer and we can help you improve the
script.

Of course, someone else may actually understand the problem properly and have a
better suggestion.

Cheers,

  Evert

(yes, small world ;-)


 Any ideas would be appreciated, thanks!
 ~Elaina Hyde
 
 -- 
 PhD Candidate
 Department of Physics and Astronomy
 Faculty of Science
 Macquarie University
 North Ryde, NSW 2109, Australia


[Tutor] concurrent file reading using python

2012-03-26 Thread Abhishek Pratap
Hi Guys


I want to utilize the power of the cores on my server and read big files
(> 50 GB) simultaneously by seeking to N locations. Process each
separate chunk and merge the output. Very similar to the MapReduce
concept.

What I want to know is the best way to read a file concurrently. I
have read about file-handle.seek() and os.lseek(), but I'm not sure if that's
the way to go. Any use cases would be of help.
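
For what it's worth, a bare-bones sketch of the seek-based chunking described
here; the file name, chunk count and the "processing" are placeholders, and
whether this actually helps is discussed in the replies below:

import os
from multiprocessing import Pool

def process_chunk(args):
    # Each worker opens the file itself, seeks to the start of its chunk and
    # streams it in fixed-size blocks (byte offsets ignore line boundaries).
    path, start, length = args
    seen = 0
    with open(path, 'rb') as f:
        f.seek(start)
        while seen < length:
            block = f.read(min(1024 * 1024, length - seen))
            if not block:
                break
            seen += len(block)   # placeholder "processing": just count bytes
    return seen

def run(path, n_chunks=4):
    size = os.path.getsize(path)
    step = size // n_chunks
    chunks = [(path, i * step, step if i < n_chunks - 1 else size - i * step)
              for i in range(n_chunks)]
    pool = Pool(n_chunks)
    try:
        results = pool.map(process_chunk, chunks)
    finally:
        pool.close()
        pool.join()
    return results            # the merge step would go here

if __name__ == '__main__':
    print(run('big_file.dat'))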

PS: I did find some links on Stack Overflow, but it was not clear to me
whether I found the right solution.


Thanks!
-Abhi


Re: [Tutor] concurrent file reading using python

2012-03-26 Thread Prasad, Ramit
 I want to utilize the power of cores on my server and read big files
 (> 50 GB) simultaneously by seeking to N locations. Process each
 separate chunk and merge the output. Very similar to MapReduce
 concept.
 
 What I want to know is the best way to read a file concurrently. I
 have read about file-handle.seek(),  os.lseek() but not sure if thats
 the way to go. Any used cases would be of help.
 
 PS: did find some links on stackoverflow but it was not clear to me if
 I found the right solution.


Have you done any testing in this space? I would assume
you would be memory/IO bound and not CPU bound. Using
multiple cores would not help non-CPU-bound tasks.

I would try to write an initial program that does what
you want without attempting to optimize, and then do some
profiling to see whether you are waiting on the CPU
or whether you are (as I suspect) waiting on the hard disk / memory.
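
One crude way to check that (read_and_process and the file name are
hypothetical; on POSIX systems os.times() reports the CPU seconds actually
used by the process):

import os
import time

def measure(func, *args):
    # If CPU seconds are far below wall-clock seconds, the task is I/O bound.
    wall0 = time.time()
    cpu0 = os.times()
    result = func(*args)
    cpu1 = os.times()
    cpu = (cpu1[0] - cpu0[0]) + (cpu1[1] - cpu0[1])   # user + system time
    print('wall %.2f s, cpu %.2f s' % (time.time() - wall0, cpu))
    return result

# measure(read_and_process, 'big_file.dat')   # hypothetical call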

Actually, if you only need small chunks of the file at
a time and you iterate over the file (for line in file_handle:)
instead of using file_handle.readlines(), you will
probably only be IO bound due to the way Python file
handling works.
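
In sketch form, the difference (the file name is hypothetical):

# readlines() pulls the whole file into a list first; memory grows with file size.
with open('big_file.dat') as f:
    for line in f.readlines():
        pass  # process line

# Iterating over the file object reads lazily, one buffered line at a time.
with open('big_file.dat') as f:
    for line in f:
        pass  # process line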

But either way, test first then optimize. :)

Ramit


Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
712 Main Street | Houston, TX 77002
work phone: 713 - 216 - 5423

--
This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.  


Re: [Tutor] concurrent file reading using python

2012-03-26 Thread Steven D'Aprano

Abhishek Pratap wrote:

Hi Guys


I want to utilize the power of cores on my server and read big files
(> 50 GB) simultaneously by seeking to N locations.


Yes, you have many cores on the server. But how many hard drives is each file 
on? If all the files are on one disk, then you will *kill* performance dead by 
forcing the drive to seek backwards and forwards:


seek to 12345678
read a block
seek to 9947500
read a block
seek to 5891124
read a block
seek back to 12345678 + 1 block
read another block
seek back to 9947500 + 1 block
read another block
...

The drive will spend most of its time seeking instead of reading.

Even if you have multiple hard drives in a RAID array, performance will depend
strongly on the details of how it is configured (RAID1, RAID0, software RAID,
hardware RAID, etc.) and how smart the controller is.


Chances are, though, that the controller won't be smart enough. Particularly 
if you have hardware RAID, which in my experience tends to be more expensive 
and less useful than software RAID (at least for Linux).


And what are you planning on doing with the files once you have read them? I
don't know how much memory your server has got, but I'd be very surprised if
you can fit the entire > 50 GB file in RAM at once. So you're going to read
the files and merge the output... by writing them to the disk. Now you have
the drive trying to read *and* write simultaneously.


TL; DR:

Tasks which are limited by disk IO are not made faster by using a faster CPU, 
since the bottleneck is disk access, not CPU speed.


Back in the Ancient Days when tape was the only storage medium, there were a 
lot of programs optimised for slow IO. Unfortunately this is pretty much a 
lost art -- although disk access is thousands or tens of thousands of times 
slower than memory access, it is so much faster than tape that people don't 
seem to care much about optimising disk access.




What I want to know is the best way to read a file concurrently. I
have read about file-handle.seek(),  os.lseek() but not sure if thats
the way to go. Any used cases would be of help.


Optimising concurrent disk access is a specialist field. You may be better off 
asking for help on the main Python list, comp.lang.python or 
python-l...@python.org, and hope somebody has some experience with this. But 
chances are very high that you will need to search the web for forums 
dedicated to concurrent disk access, and translate from whatever language(s) 
they are using to Python.



--
Steven



Re: [Tutor] concurrent file reading using python

2012-03-26 Thread Abhishek Pratap
Thanks Walter and Steven for the insight. I guess I will post my
question to the main Python mailing list and see if people have anything
to say.

-Abhi

On Mon, Mar 26, 2012 at 3:28 PM, Walter Prins wpr...@gmail.com wrote:
 Abhi,

 On 26 March 2012 19:05, Abhishek Pratap abhishek@gmail.com wrote:
 I want to utilize the power of cores on my server and read big files
 (> 50 GB) simultaneously by seeking to N locations. Process each
 separate chunk and merge the output. Very similar to MapReduce
 concept.

 What I want to know is the best way to read a file concurrently. I
 have read about file-handle.seek(),  os.lseek() but not sure if thats
 the way to go. Any used cases would be of help.

 Your idea won't work.  Reading from disk is not a CPU-bound process,
 it's an I/O bound process.  Meaning, the speed by which you can read
 from a conventional mechanical hard disk drive is not constrained by
 how fast your CPU is, but generally by how fast your disk(s) can read
 data from the disk surface, which is limited by the rotation speed and
 areal density of the data on the disk (and the seek time), and by how
 fast it can shovel the data down its I/O bus.  And *that* speed is
 still orders of magnitude slower than your RAM and your CPU.  So, in
 reality even just one of your cores will spend the vast majority of
 its time waiting for the disk when reading your 50GB file.  There's
 therefore __no__ way to make your file reading faster by increasing
 your __CPU cores__ -- the only way is by improving your disk I/O
 throughput.  You can for example stripe several hard disks together in
 RAID0 (but that increases the risk of data loss due to data being
 spread over multiple drives) and/or ensure you use a faster I/O
 subsystem (move to SATA3 if you're currently using SATA2 for example),
 and/or use faster hard disks (use 10,000 or 15,000 RPM instead of
 7,200, or switch to SSD [solid state] disks.)  Most of these options
 will cost you a fair bit of money though, so consider these thoughts
 in that light.

 Walter


Re: [Tutor] plotting several datasets and calling data from afar

2012-03-26 Thread Elaina Ann Hyde
On Mon, Mar 26, 2012 at 11:14 PM, Evert Rol evert@gmail.com wrote:

  Hi Elaina,


  Hi everyone,
 I am trying to set up a code to do some plotting and before I get too
 far I wanted to ask some structure questions.  Basically I want to tell
 python to read 2 datasets, plot them on the same scale on the same x-y axis
 , read a third dataset and match the name from the first dataset, then
 label certain values from the third... complicating matters is that all
 these data are part of much, much larger sets in seperate files, the paths
 look like:
  pathway1/namered.dat
  pathway2/nameblue.dat
  matchingfile.txt
 
  so I do fopen on the matchingfile, read it with asciitable, and then I
 have a column in that file called 'name' and a column called 'M', I sort
 the file, return a subset that is interesting, and get name1, name2, etc
 for every subset.  I want to make a plot that looks like:
 
  plot pathway1/namered.dat and pathway2/nameblue.dat with label 'M' for
 every value in the subset name1, each row[i] I need to assign to a seperate
 window so that I get a multiplot with a shared x-axis, and stacking my
 plots up the y-axis.  I do have multiplot working and I know how to plot
 'M' for each subset.
 
  The conceptual trouble has come in, how do I match 'name' variable of my
 subset 'name1' with the plot I want to do for pathway1/namered.dat and
 pathway2/nameblue.dat... the key feature that is the same is the 'name'
 variable, but in one instance I have to match the 'name'+'red.dat' and in
 the other the 'name'+'blue.dat'

 It's not 100% clear to me what you precisely want to do, but here are a
 few possibilites:

 - use a dictionary. Assign each dataset to a dictionary with the name as
 the key (or the name + color). Eg, dataset['name1red'],
 dataset['name1blue'], dataset['name2red'] etc. Each value in this
 dictionary is a dataset read by asciitable.
  the (big) disadvantage is that you would read every single *.dat file
 beforehand
 - simply fopen the files with the filename deduced from the name. So, in a
 loop, that would be something like
   for name in subset_names:
  with fopen(name + 'red.dat') as datafile:
  # read data and plot
  with fopen(name + 'blue.dat') as datafile:
  # read data and plot

 But perhaps I misunderstand your problem. I can't really tell if the
 problem is in opening the selected data files, or plotting them, or
 creating the labels for the plot.
 This may actually be where some code comes in handy. Provided the plotting
 isn't the problem, you could make a very simple Python script that shows
 the concept and how you attempt to solve it. The simpler the script, the
 better probably the implementation, so even make such a simple script is
 incredibly useful. (Any example data sets of, say, just 2 lines each can
 also help with that.)
 From that, the problem may become clearer and we can help you improving
 the script.

 Of course, someone else may actually understand the problem properly and
 has a better suggestion.

 Cheers,

  Evert

 (yes, small world ;-)


  Any ideas would be appreciated, thanks!
  ~Elaina Hyde
 
  --
  PhD Candidate
  Department of Physics and Astronomy
  Faculty of Science
  Macquarie University
  North Ryde, NSW 2109, Australia



Thanks Evert and Tino,
   The dictionaries look a bit like the right idea, but in the end I was
able to manipulate my input table so they aren't quite necessary.  The code
I have now, which partially works, is as follows:

#!/usr/bin/python

# import modules used here
import sys
import asciitable
import matplotlib
import matplotlib.path as mpath
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure, show, axis
from matplotlib.patches import Ellipse
import scipy
import numpy as np
from numpy import *
import math
import pylab
import random
from pylab import *
import astropysics
import astropysics.obstools
import astropysics.coords
import string
from astropysics.coords import ICRSCoordinates,GalacticCoordinates

#File from Read_All
x=open('LowZ_joinAll')

dat=asciitable.read(x,Reader=asciitable.NoHeader,
fill_values=['--','-999.99'])
#gives dat file where filenames are first two columns

###
bluefilename1=dat['col1']
filename1=dat['col2']

#other stuff I need

#Ra/Dec in decimal radians
Radeg1=dat['col6']*180./math.pi    # RA: decimal radians converted to degrees
Decdeg1=dat['col7']*180./math.pi   # Dec: decimal radians converted to degrees
Vmag=dat['col8']
Mag=dat['col15']
EW1=dat['col16']
EW2=dat['col17']
EW3=dat['col18']
#Horizontal Branch Estimate
VHB=18.0
EWn = (0.5*abs(EW1)) + abs(EW2) + (0.6*abs(EW3))
# NEED ABS VALUE FOR FORMULA
FEHn = -2.66 + 0.42*(EWn + 0.64*(Vmag - VHB))
EW1_G=dat['col23']
EW2_G=dat['col24']
EW3_G=dat['col25']
EWg = (0.5*abs(EW1_G)) + 

[Tutor] (no subject)

2012-03-26 Thread thao nguyen
Dear Support Team,

I have built a function (enclosed here) to merge many files (in this
example, 2 files: a1.txt and a2.txt) line by line. The output file
is called final_file. However, I could not get it to run successfully.

Content of a1.txt:
1
3
5


Content of a2.txt:
2
4
6


Content of final_file.txt will be like:
1
2
3
4
5
6


In Python, I called the just-written module:

import argument
reload(argument)
argument.test(2, 'C:/a1.txt', 'C:/a2.txt')

and get the error as below:

Traceback (most recent call last):
  File "c:\append.py", line 5, in <module>
    argument.test(2, 'C:/a1.txt', 'C:/a2.txt')
  File "c:\argument.py", line 28, in test
    for line_data in f:
ValueError: I/O operation on closed file

Could you please advise the resolution for this?


Thank you


append.py
Description: Binary data
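
The attached code is not shown in the archive, but a minimal sketch of the
line-by-line merge described above, which keeps every input file open for the
whole loop (closing a handle before iterating over it is the usual cause of
"ValueError: I/O operation on closed file"), might look like this:

try:
    from itertools import izip_longest as zip_longest   # Python 2
except ImportError:
    from itertools import zip_longest                    # Python 3

def merge_files(out_path, *in_paths):
    # Interleave the input files line by line: line 1 of each file, then
    # line 2 of each file, and so on, writing the result to out_path.
    handles = [open(p) for p in in_paths]
    try:
        with open(out_path, 'w') as out:
            for row in zip_longest(*handles, fillvalue=None):
                for line in row:
                    if line is not None:
                        out.write(line if line.endswith('\n') else line + '\n')
    finally:
        for h in handles:
            h.close()

# merge_files('final_file.txt', 'C:/a1.txt', 'C:/a2.txt')   # hypothetical call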


[Tutor] Permissions Error

2012-03-26 Thread Michael Lewis

 Date: Mon, 26 Mar 2012 10:52:19 +1100
 From: Steven D'Aprano st...@pearwood.info
 To: tutor@python.org
 Subject: Re: [Tutor] Permissions Error

 Michael Lewis wrote:
  Hi everyone,
 
  If I've created a folder, why would I receive a permissions error when
  trying to copy the file. My source code is here:
  http://pastebin.com/1iX7pGDw

 The usual answer to why would I receive a permissions error is that you
 don't actually have permission to access the file.

 What is the actual error you get?


Traceback (most recent call last):
  File C:\Python27\Utilities\copyfiles.py, line 47, in module
copyfiles(srcdir, dstdir)
  File C:\Python27\Utilities\copyfiles.py, line 42, in copyfiles
shutil.copy(srcfile, dstfile)
  File C:\Python27\lib\shutil.py, line 116, in copy
copyfile(src, dst)
  File C:\Python27\lib\shutil.py, line 81, in copyfile
with open(src, 'rb') as fsrc:
IOError: [Errno 13] Permission denied: 'C:\\Users\\Chief
Ninja\\Pictures\\testdir'

I've noticed that the code runs if I use shutil.copyfile instead of
shutil.copy. Do you know why?

I've also noticed that if I have a sub-directory, I receive a permission
error. However, if I use os.walk, then my code runs even if I have a
sub-directory in my source directory. My problem then becomes that os.walk
doesn't actually move the directory, but instead just moves the files
within the sub-directory. Oddly, this code runs without a permission error
when I use shutil.copy, unlike the piece above, which raises an error when I
use shutil.copy. However, if I use shutil.copyfile, I get the error below
(do you know why?):

Traceback (most recent call last):
  File C:/Python27/Homework/oswalk.py, line 41, in module
copyfiles(srcdir, dstdir)
  File C:/Python27/Homework/oswalk.py, line 36, in copyfiles
shutil.copyfile(srcfile, dstdir)
  File C:\Python27\lib\shutil.py, line 82, in copyfile
with open(dst, 'wb') as fdst:
IOError: [Errno 13] Permission denied: 'C:\\newtest'

The code where I use os.walk is here:
http://pastebin.com/1hchnmM1
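
Both tracebacks show a directory path reaching a call that expects a plain
file (open(src, 'rb') on ...\Pictures\testdir, open(dst, 'wb') on
C:\newtest). A minimal sketch of a copy loop that avoids that, recursing
into sub-directories and always handing shutil.copy a file path (the
directory names in the usage line are hypothetical):

import os
import shutil

def copyfiles(srcdir, dstdir):
    # Copy regular files and recurse into sub-directories, so a directory
    # path is never passed to shutil.copy / open() directly.
    if not os.path.isdir(dstdir):
        os.makedirs(dstdir)
    for name in os.listdir(srcdir):
        src = os.path.join(srcdir, name)
        dst = os.path.join(dstdir, name)
        if os.path.isdir(src):
            copyfiles(src, dst)
        else:
            shutil.copy(src, dst)

# copyfiles('C:/Users/Chief Ninja/Pictures', 'C:/newtest')   # hypothetical call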



  When I check the properties/security of the file in question, the system
  says I have full control.

 This does not sound like a Python problem, but an operating system problem.
 What OS are you using?


I am using Windows XP


 You should check the permissions on the folder, not just the file. Also, if
 your OS supports it, check any extended permissions and ACLs that might
 apply.
 Can you copy the file using another language, e.g. using powershell, bash
 or
 applescript? Also check that you are running the Python script as the same
 user you used when creating the file.


I've checked the permissions on the folder and I have full control. I
haven't yet checked extended permissions, ACLs, etc., but will do so.

Thanks!




 --
 Steven
