Re: how to do reading of binary files?

2007-06-08 Thread Diez B. Roggisch
jvdb wrote:
 Hi all,
 
 I need some help on the following issue. I can't seem to solve it.
 
 I have a binary (PCL) file.
 In this file I want to search for specific codes (like 0C). I have
 tried to solve it by reading the file character by character, but this
 is very slow, especially with large files (10 MB), where it takes
 quite some time.
 Does anyone have a hint/clue/solution for this?

What does the searching have to do with the reading? 10 MB easily fits
into the main memory of a decent PC, so just do


contents = open(file).read() # yes I know I should close the file...

print contents.find('\x0c')
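For readers on Python 3, here is a minimal sketch of the same idea, assuming the file fits in memory. The helper name read_and_search is made up for illustration. The with-block closes the file, and because the file is opened in binary mode the needle has to be a bytes literal (b"\x0c").

```python
def read_and_search(path, needle=b"\x0c"):
    """Return (offset of the first needle or -1, number of needles).

    Hypothetical helper: reads the whole file into memory at once.
    """
    # "rb" returns the raw bytes: no decoding and no newline translation.
    # The with-block closes the file even if read() raises.
    with open(path, "rb") as f:
        contents = f.read()
    return contents.find(needle), contents.count(needle)
```

Since each 0C is a page break here, the second value is the page count Jeroen is after (ignoring any trailing data after the last form feed).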

Diez
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to do reading of binary files?

2007-06-08 Thread jvdb
On 8 jun, 14:07, Diez B. Roggisch [EMAIL PROTECTED] wrote:
 jvdb wrote:
..
 What does the searching have to do with the reading? 10 MB easily fits
 into the main memory of a decent PC, so just do

 contents = open(file).read() # yes I know I should close the file...

 print contents.find('\x0c')

 Diez

True. But there is another issue attached to the one I wrote.
When I know how often this occurs, I know the number of pages in the
file. After that I would like to be able to extract a given range of
data: if file x contains 20 0Cs, then, for example, I would like to
extract from instance 5 to instance 12 of the file.
The reason I want to do this: 0C stands for a page break in the PCL
language. This way I would be able to extract a certain range of
pages from the file.





how to do reading of binary files?

2007-06-08 Thread jvdb
Hi all,

I need some help on the following issue. I can't seem to solve it.

I have a binary (PCL) file.
In this file I want to search for specific codes (like 0C). I have
tried to solve it by reading the file character by character, but this
is very slow, especially with large files (10 MB), where it takes
quite some time.
Does anyone have a hint/clue/solution for this?
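A minimal Python 3 sketch of the usual fix, using only the standard library: read large blocks (1 MiB here, an arbitrary size) instead of single characters, and let bytes.find() do the scanning inside the interpreter rather than in a Python-level loop. The helper name is hypothetical. Because 0C is a single byte, a match can never be split across two blocks.

```python
def form_feed_offsets_chunked(path, chunk_size=1 << 20, needle=b"\x0c"):
    """Return the file offsets of every needle byte, reading block by block.

    Assumes a one-byte needle, so a match can never straddle two blocks.
    """
    offsets = []
    base = 0  # file offset of the current block's first byte
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            pos = block.find(needle)
            while pos != -1:
                offsets.append(base + pos)
                pos = block.find(needle, pos + 1)
            base += len(block)
    return offsets
```

For a 10 MB file this makes about ten read() calls instead of ten million, which is where the character-by-character version loses its time.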

Thanks in advance!

Jeroen



Re: how to do reading of binary files?

2007-06-08 Thread jvdb
On 8 jun, 15:19, Marc 'BlackJack' Rintsch [EMAIL PROTECTED] wrote:
 In [EMAIL PROTECTED], Diez B. Roggisch wrote:



  jvdb wrote:
  True. But there is another issue attached to the one I wrote.
  When I know how often this occurs, I know the number of pages in the
  file. After that I would like to be able to extract a given range of
  data: if file x contains 20 0Cs, then, for example, I would like to
  extract from instance 5 to instance 12 of the file.
  The reason I want to do this: 0C stands for a page break in the PCL
  language. This way I would be able to extract a certain range of
  pages from the file.

  And? Finding the respective indices by using

  needle = '\x0c'  # form feed: the PCL page break
  last_needle_position = -1  # start at -1 so a match at offset 0 is not skipped
  positions = []
  while True:
      last_needle_position = contents.find(needle, last_needle_position + 1)
      if last_needle_position == -1:
          break
      positions.append(last_needle_position)

  will find all the page breaks. Then just slice contents appropriately.
  Did you read the Python tutorial?

 Maybe splitting at '\x0c', selecting/slicing the wanted pages and joining
 them again is enough, depending on the size of the files and the amount
 of memory, of course.

 One problem I see is that '\x0c' may not always be the page end. It may
 also occur in raster image data, I guess.

 Ciao,
 Marc 'BlackJack' Rintsch

Hi,

Your last comment is also something I have noticed. There are a number
of cases where this happens, and I will have to deal with that too.
I will dive into this on Monday, after this hot weekend.

cheers,
Jeroen



Re: how to do reading of binary files?

2007-06-08 Thread Diez B. Roggisch
jvdb wrote:
 On 8 jun, 14:07, Diez B. Roggisch [EMAIL PROTECTED] wrote:
 jvdb wrote:
 ..
 What does the searching have to do with the reading? 10 MB easily fits
 into the main memory of a decent PC, so just do

 contents = open(file).read() # yes I know I should close the file...

 print contents.find('\x0c')

 Diez
 
 True. But there is another issue attached to the one I wrote.
 When I know how often this occurs, I know the number of pages in the
 file. After that I would like to be able to extract a given range of
 data: if file x contains 20 0Cs, then, for example, I would like to
 extract from instance 5 to instance 12 of the file.
 The reason I want to do this: 0C stands for a page break in the PCL
 language. This way I would be able to extract a certain range of
 pages from the file.

And? Finding the respective indices by using

needle = '\x0c'  # form feed: the PCL page break
last_needle_position = -1  # start at -1 so a match at offset 0 is not skipped
positions = []
while True:
    last_needle_position = contents.find(needle, last_needle_position + 1)
    if last_needle_position == -1:
        break
    positions.append(last_needle_position)


will find all the page breaks. Then just slice contents appropriately.
Did you read the Python tutorial?
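A sketch of the slicing step, assuming that each page ends with its form feed and that any bytes after the last form feed form a final page (both are assumptions, not something the thread settles). The function name extract_pages is hypothetical; it works on bytes, so on Python 3 the contents should come from a file opened with 'rb'.

```python
def extract_pages(contents, first, last, needle=b"\x0c"):
    """Return pages first..last (1-based, inclusive) as one bytes object."""
    # Offsets of every form feed, found with a find() loop.
    positions = []
    pos = contents.find(needle)
    while pos != -1:
        positions.append(pos)
        pos = contents.find(needle, pos + 1)

    # contents[bounds[k - 1]:bounds[k]] is page k, including its form feed.
    bounds = [0] + [p + len(needle) for p in positions]
    if bounds[-1] != len(contents):
        bounds.append(len(contents))  # trailing data without a final form feed
    if not 1 <= first <= last < len(bounds):
        raise ValueError("page range out of bounds")
    return contents[bounds[first - 1]:bounds[last]]
```

Under that page numbering, extract_pages(contents, 5, 12) returns pages 5 through 12 with their form feeds.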

diez


Re: how to do reading of binary files?

2007-06-08 Thread Marc 'BlackJack' Rintsch
In [EMAIL PROTECTED], Diez B. Roggisch wrote:

 jvdb wrote:
 True. But there is another issue attached to the one I wrote.
 When I know how often this occurs, I know the number of pages in the
 file. After that I would like to be able to extract a given range of
 data: if file x contains 20 0Cs, then, for example, I would like to
 extract from instance 5 to instance 12 of the file.
 The reason I want to do this: 0C stands for a page break in the PCL
 language. This way I would be able to extract a certain range of
 pages from the file.
 
 And? Finding the respective indices by using
 
 needle = '\x0c'  # form feed: the PCL page break
 last_needle_position = -1  # start at -1 so a match at offset 0 is not skipped
 positions = []
 while True:
     last_needle_position = contents.find(needle, last_needle_position + 1)
     if last_needle_position == -1:
         break
     positions.append(last_needle_position)
 
 
 will find all the page breaks. Then just slice contents appropriately.
 Did you read the Python tutorial?

Maybe splitting at '\x0c', selecting/slicing the wanted pages and joining
them again is enough, depending on the size of the files and the amount
of memory, of course.
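The split-and-join variant as a minimal Python 3 sketch (the function name is hypothetical). split() drops the separators, so the sketch puts them back; it also builds a list holding a copy of every page, which is the memory cost mentioned above. Out-of-range page numbers are silently clipped by the slice rather than reported.

```python
def extract_pages_split(contents, first, last, sep=b"\x0c"):
    """Return pages first..last (1-based, inclusive), separators restored."""
    pages = contents.split(sep)
    # Every element except the last one was followed by a separator.
    selected = sep.join(pages[first - 1:last])
    if last < len(pages):
        selected += sep
    return selected
```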

One problem I see is that '\x0c' may not always be the page end. It may
also occur in raster image data, I guess.
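One way to reduce such false positives is to skip the payload of commands that carry binary data. The sketch below rests on a simplifying assumption and is not a PCL parser: it only recognises the raster transfer command Esc * b <count> W (optionally preceded by combined lowercase parameters such as Esc * b 2 m 120 W) and treats every other 0C byte as a page break. Font downloads, macros and other binary-carrying commands are not handled; the names TOKEN and page_break_offsets are hypothetical.

```python
import re

# Simplifying assumption: the only binary payloads in the file are raster
# rows sent with "Esc * b <count> W", optionally with combined lowercase
# parameters in front (e.g. "Esc * b 2 m 120 W"). Other PCL commands that
# carry binary data are NOT recognised by this sketch.
TOKEN = re.compile(rb"\x0c|\x1b\*b(?:\d+[a-z])*(\d+)W")


def page_break_offsets(data):
    """Return offsets of 0x0C bytes that are not inside raster payloads."""
    offsets = []
    pos = 0
    while True:
        m = TOKEN.search(data, pos)
        if m is None:
            return offsets
        if m.group(1) is None:
            # A bare form feed outside any skipped payload.
            offsets.append(m.start())
            pos = m.end()
        else:
            # Raster transfer: jump over the <count> bytes of row data.
            pos = m.end() + int(m.group(1))
```

The scan has to be sequential (search from the current position) rather than a single finditer() pass, because bytes inside a skipped payload could themselves look like a command.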

Ciao,
Marc 'BlackJack' Rintsch


Re: how to do reading of binary files?

2007-06-08 Thread Grant Edwards
On 2007-06-08, jvdb [EMAIL PROTECTED] wrote:

 I have a binary (PCL) file.
 In this file I want to search for specific codes (like 0C). I have
 tried to solve it by reading the file character by character, but this
 is very slow, especially with large files (10 MB), where it takes
 quite some time.
 Does anyone have a hint/clue/solution for this?

I'd memmap the file.

http://docs.python.org/lib/module-mmap.html
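A minimal Python 3 sketch of the mmap approach (the helper name is hypothetical). The mapping behaves like a read-only bytes object whose contents the OS pages in on demand, so find() works without first copying the file into a Python object. An empty file cannot be mapped, hence the size check.

```python
import mmap
import os


def form_feed_offsets_mmap(path, needle=b"\x0c"):
    """Return the offsets of every needle in the file, using a memory map."""
    offsets = []
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return offsets  # an empty file cannot be mapped
        # Length 0 maps the whole file; ACCESS_READ makes the map read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos != -1:
                offsets.append(pos)
                pos = mm.find(needle, pos + 1)
    return offsets
```

The mapping also supports slicing (mm[start:end] returns bytes), so the same object can be used to cut out a page range once the offsets are known.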

If you prefer it to appear as an array of bytes instead of a
string, the various numeric/array packages can do that.

Numarray:  
http://stsdas.stsci.edu/numarray/numarray-1.5.html/module-numarray.memmap.html
Vmaps: http://snafu.freedom.org/Vmaps/Vmaps.html
Numpy: documentation is not free

Since I can't point you to Numpy docs, here's a link to a
newsgroup thread with an example for numpy:

http://groups.google.com/group/comp.lang.python/browse_frm/thread/c63c3e281df99897/2336baa98386d5e7

-- 
Grant Edwards (grante at visi.com)
Yow! I like your SNOOPY POSTER!!


Re: how to do reading of binary files?

2007-06-08 Thread Roger Miller
On Jun 8, 2:07 am, Diez B. Roggisch [EMAIL PROTECTED] wrote:

 ...

 What does the searching have to do with the reading? 10 MB easily fits
 into the main memory of a decent PC, so just do

 contents = open(file).read() # yes I know I should close the file...

 print contents.find('\x0c')

 Diez

Better make that open(file, 'rb').
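Why the mode matters, as a small Python 3 demonstration with made-up sample data: in text mode the default universal-newline handling turns \r\n into \n (and the bytes must be decoded, here with latin-1 so that any byte value decodes), so offsets found in the text no longer match offsets in the file. Binary mode returns the bytes untouched.

```python
import os
import tempfile

# Made-up sample: a CR LF pair followed by a form feed.
sample = b"line\r\n\x0cnext"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    with open(path, "rb") as f:
        raw = f.read()            # bytes, exactly as stored on disk
    with open(path, "r", encoding="latin-1") as f:
        text = f.read()           # str, with "\r\n" translated to "\n"
finally:
    os.remove(path)

print(raw.find(b"\x0c"))   # 6
print(text.find("\x0c"))   # 5
```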
