Bugs item #1744752, was opened at 2007-06-28 13:23
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1744752&group_id=5470

Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Rune Devik (runedevik)
Assigned to: Nobody/Anonymous (nobody)
Summary: Newline skipped in "for line in file"

Initial Comment:
Creating a new ticket for the bug described here, since that ticket was closed
(and I was not able to reopen it):
http://sourceforge.net/tracker/index.php?func=detail&aid=1636950&group_id=5470&atid=105470

The problem is that when you open a huge file on Windows in "r" mode, two lines
are sometimes merged into one. As I said in the ticket above (where my comment
was probably ignored because I added it to a closed ticket):

Hi

I have the same problem with a huge file (8 GB) containing long lines. Sometimes
two lines are merged into one, and when I rerun the test script that reads the
file it is always the same lines that are merged. The merging also seems to
happen more frequently towards the end of the file. I tried to reproduce it with
a smaller data set (the 10 lines before the two merged lines, the two merged
lines themselves, and the 10 lines after them), but I was not able to reproduce
it on this smaller data set. However, if you open the huge file in "rb" mode
instead of "r" mode, everything works as it should and no lines are merged at
all! If I copy the file over to Linux and rerun the test script, no lines are
merged (regardless of whether the mode is "r" or "rb"), so this is
Windows-specific and might have something to do with the adding of \r\n if only
\n is found when you open the file in "r" mode, maybe? I have also reproduced it
with both Python 2.3.5 and 2.5c1, on both Windows XP and Windows 2003.
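
One way the merged lines could be located is to compare the lines seen in "r"
mode against the lines seen in "rb" mode. The following is only a sketch under
stated assumptions, not the test script used for this report: find_merged is a
hypothetical helper, both inputs are assumed to be sequences of lines with
their line endings already stripped, and the observed lines are assumed to be
the reference lines with some neighbours joined together.

```python
def find_merged(reference, observed):
    """Return 1-based positions in `reference` where `observed` joined lines.

    `reference` holds the correct lines (for example, as read in "rb" mode);
    `observed` holds the lines as seen by the reader under suspicion.
    """
    ref = iter(reference)
    merged_at = []
    ref_no = 0
    for line in observed:
        ref_no += 1
        joined = next(ref)
        start = ref_no
        # If a newline was skipped, the observed line equals several
        # consecutive reference lines concatenated together.
        while joined != line and len(joined) < len(line):
            joined += next(ref)
            ref_no += 1
        if ref_no > start:
            merged_at.append(start)
    return merged_at


# Reference lines 2 and 3 appear as one observed line.
print(find_merged(["a", "b", "c", "d"], ["a", "bc", "d"]))  # prints [2]
```

The function reports the number of the first reference line in each merged
group, so the same positions should come back on every run if the merging is
deterministic.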

More stats on the input file in both "r" mode and "rb" mode below:

Input file size: 8 695 828 KB

open(file, "r"):
  - total number of lines read:  668909
  - length of the longest line:  13179792
  - length of the shortest line: 89
  - 56 lines contain the content of two lines
  - It is always exactly two lines that are merged into one!
  - The same lines are always merged when rerunning the test on the same file.

open(file, "rb"):
  - total number of lines read:  668965
  - length of the longest line:  13179793
  - length of the shortest line: 90
  - no lines merged
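
Statistics like the ones above could be gathered with a script along these
lines. This is a minimal sketch and not the original test script: line_stats
and write_sample are hypothetical names, and the code is written for Python 3,
where "r" mode translates \r\n to \n while "rb" mode returns the bytes
unchanged.

```python
import os
import tempfile


def line_stats(path, mode):
    """Return (line_count, longest, shortest) for `path` opened in `mode`."""
    count = 0
    longest = 0
    shortest = None
    with open(path, mode) as fp:
        for line in fp:
            count += 1
            n = len(line)  # length includes the line ending
            longest = max(longest, n)
            shortest = n if shortest is None else min(shortest, n)
    return count, longest, shortest


def write_sample(lines):
    """Write `lines` with \\r\\n endings to a temporary file; return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as fp:
        for text in lines:
            fp.write(text.encode("ascii") + b"\r\n")
    return path
```

Calling line_stats(path, "r") and line_stats(path, "rb") on the same file and
comparing the line counts is enough to show whether any lines were merged. A
one-character difference in line length between the two modes is expected from
the \r\n translation alone.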

Regards,
Rune Devik

_______________________________________________
Python-bugs-list mailing list 