Re: [Tutor] just what does read() return?

2010-09-30 Thread Alex Hall
On 9/30/10, Steven D'Aprano wrote:
> On Fri, 1 Oct 2010 08:32:40 am Alex Hall wrote:
>
>> I fully expected to see txt be an array of strings since I figured
>> self.original would have been split on one or more new lines. It
>> turns out, though, that I get this instead:
>> ['l\nvx vy z\nvx vy z']
>
> There's no need to call str() on something that already is a string.
> Admittedly it doesn't do much harm, but it is confusing for the person
> reading, who may be fooled into thinking that perhaps the argument
> wasn't a string in the first place.
Agreed. I was having some (unrelated) trouble and was desperate enough
to start forcing things to the data type I needed, just in case.
>
> The string split method doesn't interpret its argument as a regular
> expression. r'\n+' has no special meaning here. It's just three literal
> characters: backslash, the letter n, and the plus sign. split() tries to
> split on that substring, and since your data doesn't include that
> combination anywhere, returns a list containing a single item:
>
> >>> "abcde".split("ZZZ")
> ['abcde']
Yes, that makes sense.
>
>> How is it that txt is not an array of the lines in the file, but
>> instead still holds \n characters? I thought the manual said read()
>> returns a string:
>
> It does return a string. It is a string including the newline
> characters.
>
>
> [...]
>> I know I can use f.readline(), and I was doing that before and it all
>> worked fine. However, I saw that I was reading the file twice and, in
>> the interest of good practice if I ever have this sort of project
>> with a huge file, I thought I would try to be more efficient and read
>> it once.
>
> You think that keeping a huge file in memory *all the time* is more
> efficient?
Ah, I see what you mean now. I work with the data later, so you are
saying that it would be better to just read the file as necessary,
and then, when I need the file's data later, just read it again.
> It's the other way around -- when dealing with *small* files
> you can afford to keep it in memory. When dealing with huge files, you
> need to re-write your program to deal with the file a piece at a time.
> (This is often a good strategy for small files as well, but it is
> essential for huge ones.)
>
> Of course, "small" and "huge" are relative to the technology of the day.
> I remember when 1MB was huge. These days, huge would mean gigabytes.
> Small would be anything under a few tens of megabytes.
>
>
> --
> Steven D'Aprano
> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>


-- 
Have a great day,
Alex (msg sent from GMail website)
mehg...@gmail.com; http://www.facebook.com/mehgcap


Re: [Tutor] just what does read() return?

2010-09-30 Thread Steven D'Aprano
On Fri, 1 Oct 2010 08:32:40 am Alex Hall wrote:

> I fully expected to see txt be an array of strings since I figured
> self.original would have been split on one or more new lines. It
> turns out, though, that I get this instead:
> ['l\nvx vy z\nvx vy z']

There's no need to call str() on something that already is a string. 
Admittedly it doesn't do much harm, but it is confusing for the person 
reading, who may be fooled into thinking that perhaps the argument 
wasn't a string in the first place.

The string split method doesn't interpret its argument as a regular 
expression. r'\n+' has no special meaning here. It's just three literal 
characters: backslash, the letter n, and the plus sign. split() tries to 
split on that substring, and since your data doesn't include that 
combination anywhere, returns a list containing a single item:

>>> "abcde".split("ZZZ")
['abcde']
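
As a rough illustration of the point above, here is a small sketch that feeds the same pattern to str.split() and to re.split(). The sample text just mirrors the file layout from the original post, and the variable names are made up for the example.

```python
import re

# Sample text in the layout described in the original post (illustrative only)
data = "l\nvx vy z\nvx vy z"

pattern = r"\n+"
# The raw string holds three ordinary characters: backslash, n, plus
print(len(pattern))

# str.split() searches for that literal three-character substring, finds
# nothing, and so returns the whole string as the only list item
literal_result = data.split(pattern)

# re.split() compiles the same text as a regular expression meaning
# "one or more newline characters"
regex_result = re.split(pattern, data)

print(literal_result)
print(regex_result)
```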



> How is it that txt is not an array of the lines in the file, but
> instead still holds \n characters? I thought the manual said read()
> returns a string:

It does return a string. It is a string including the newline 
characters.
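
To make that concrete, here is a minimal sketch comparing read() with the calls that do hand back a list of lines. It assumes Python 3 and uses io.StringIO as a stand-in for a real file object, so the contents are invented for the example.

```python
import io

# io.StringIO stands in for a file opened with open(path, "r")
f = io.StringIO("D\n1 2 3\n4 5 6\n")

# read() returns one string with the newline characters still in it
whole = f.read()

# splitlines() turns that string into a list and drops the newlines
lines = whole.splitlines()

# readlines() also returns a list, but each item keeps its trailing "\n"
f.seek(0)
raw_lines = f.readlines()

print(repr(whole))
print(lines)
print(raw_lines)
```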


[...]
> I know I can use f.readline(), and I was doing that before and it all
> worked fine. However, I saw that I was reading the file twice and, in
> the interest of good practice if I ever have this sort of project
> with a huge file, I thought I would try to be more efficient and read
> it once.

You think that keeping a huge file in memory *all the time* is more 
efficient? It's the other way around -- when dealing with *small* files 
you can afford to keep it in memory. When dealing with huge files, you 
need to re-write your program to deal with the file a piece at a time. 
(This is often a good strategy for small files as well, but it is 
essential for huge ones.)
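
One common way to handle a file a piece at a time is to loop over the file object itself, which yields one line per iteration instead of loading everything at once. The sketch below assumes a whitespace-separated format like the one in the original post; iter_records is a hypothetical helper name, and the temporary file exists only so the example can run on its own.

```python
import os
import tempfile


def iter_records(path):
    """Yield the fields of each non-empty line, one line at a time."""
    with open(path, "r") as f:
        for line in f:                  # only one line is held in memory here
            line = line.rstrip("\n")
            if line:                    # skip blank lines
                yield line.split()      # split on runs of whitespace


# Build a small throwaway file so the example is self-contained
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("D\n1 2 3\n\n4 5 6\n")

records = list(iter_records(path))
os.remove(path)

print(records)
```

For a genuinely huge file you would process each record inside the loop rather than collecting them all with list(), which is done here only to show the result.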

Of course, "small" and "huge" are relative to the technology of the day. 
I remember when 1MB was huge. These days, huge would mean gigabytes. 
Small would be anything under a few tens of megabytes.


-- 
Steven D'Aprano


Re: [Tutor] just what does read() return?

2010-09-30 Thread Steven D'Aprano
On Fri, 1 Oct 2010 08:49:31 am Alex Hall wrote:

> Ah-ha!!
> re.split(r"\n+", self.original)
> That did it, and my program once again runs as expected. Thanks!

There is no need to crack that tiny peanut with the 40 lb sledgehammer 
of a regular expression.

list_of_lines = string.split('\n')

Much faster, simpler, and does the job. To get rid of empty lines:

list_of_lines = filter(None, string.split('\n'))
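
A note for anyone reading this on Python 3: there filter() returns a lazy iterator rather than a list, so it needs wrapping in list(), or a list comprehension does the same job. A minimal sketch, with sample text invented for the example:

```python
# Sample text with a blank line in the middle and a trailing newline
text = "l\nvx vy z\n\nvx vy z\n"

# Plain split keeps the empty strings produced by the blank line
# and by the final newline
parts = text.split("\n")

# A list comprehension drops the empty strings
non_empty = [line for line in parts if line]

# filter(None, ...) does the same; list() is needed on Python 3
same_result = list(filter(None, parts))

print(parts)
print(non_empty)
```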




-- 
Steven D'Aprano


Re: [Tutor] just what does read() return?

2010-09-30 Thread Steve Willoughby

On 30-Sep-10 15:49, Alex Hall wrote:

> re.split(r"\n+", self.original)
> That did it, and my program once again runs as expected. Thanks!


If you don't need blank lines stripped out (r'\n+' considers multiple 
consecutive \n characters to be a single record separator), you can 
avoid the regex and just use the normal string split:


list_of_lines = giant_string.split('\n')
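
Here is a short sketch of how the two approaches differ on input that contains a blank line. The sample string is made up; note that when the text ends with a newline, both calls still leave one empty string at the end of the list.

```python
import re

# Two records separated by a blank line, with a trailing newline
giant_string = "a\n\nb\n"

# Plain split: every "\n" is a separator, so the blank line shows up
# as an empty string in the middle
plain = giant_string.split("\n")

# Regex split: a run of newlines counts as a single separator
collapsed = re.split(r"\n+", giant_string)

print(plain)
print(collapsed)
```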


Re: [Tutor] just what does read() return?

2010-09-30 Thread Alex Hall
On 9/30/10, Walter Prins wrote:
> On 30 September 2010 23:32, Alex Hall wrote:
>
>> txt=str(self.original).split(r"\n+") #create an array where elements
>>
>
> OK, consider this Python shell session:
>
> >>> s = "line1\nline2"
> >>> s.split()
> ['line1', 'line2']
> >>> s.split(r"\n+")
> ['line1\nline2']
>
> Hmm, so split doesn't like that separator.
>
> Taking a step back -- It looks like you're trying to specify a regular
> expression as a split string.  A string object's split method doesn't
> support regular expressions.  The split function in the "re" module however
> does.
Ah-ha!!
re.split(r"\n+", self.original)
That did it, and my program once again runs as expected. Thanks!
>
> HTH
Very much.
>
> Walter
>


[Tutor] just what does read() return?

2010-09-30 Thread Alex Hall
Hi all,
I have a parser class which is supposed to take a text file and parse
it. I will then do more with the resulting data. The file is in a
particular format, specified by my professor, though this is not
homework (it will be used to do homework later). The file is in the
format:
l
vx vy z
vx vy z

where l is either D or U and x, y, and z are numbers. Anyway, I have
the following lines:
  f = open(self.file, "r")
  # I thought self.original would now be a string of all the data in self.file
  self.original = f.read()
  # create an array where elements are lines in the file
  txt = str(self.original).split(r"\n+")
  print txt

I fully expected to see txt be an array of strings since I figured
self.original would have been split on one or more new lines. It turns
out, though, that I get this instead:
['l\nvx vy z\nvx vy z']

How is it that txt is not an array of the lines in the file, but
instead still holds \n characters? I thought the manual said read()
returns a string:

"To read a file's contents, call f.read(size), which reads some
quantity of data and returns it as a string. size is an optional
numeric argument. When size is omitted or negative, the entire
contents of the file will be read and returned; it's your problem if
the file is twice as large as your machine's memory. Otherwise, at
most size bytes are read and returned. If the end of the file has been
reached, f.read() will return an empty string ("")."

I know I can use f.readline(), and I was doing that before and it all
worked fine. However, I saw that I was reading the file twice and, in
the interest of good practice if I ever have this sort of project with
a huge file, I thought I would try to be more efficient and read it
once. I will use self.original later again, so I need it either way,
and I figured I could use it since I had already read the file to get
it. TIA.


