Re: Noob question: proper way to read binary files byte by byte

2019-02-27 Thread cblake
A short reply like this may be inadequate to explain virtual memory mechanisms 
if you have never heard of them before. That said, if you heard of them in the 
past and have forgotten, this may help.

`newMemMapFileStream` calls `memfiles.open` with default flags. With default 
flags, the OS typically just looks up the size of the file and creates an 
address range in your process. When the virtual memory hardware page faults on 
an address in that range, the OS kernel's "page fault handler" (transparently 
to your process) populates the corresponding 4 KiB (or possibly larger) "page" 
of memory with the file contents for that spot, on demand. This is all fairly 
portable behavior.
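To make the page granularity concrete, here is a minimal sketch, assuming a 4096-byte page size (a common default, not guaranteed on every platform) and a hypothetical temp file. It reads one byte per page of a `MemFile`; the first access to each page is what typically makes the kernel fault that page in, so a file spanning four pages needs only four touches to be fully loaded:

```nim
import std/[memfiles, os]

const assumedPageSize = 4096   # assumption: common base page size, not universal

let path = getTempDir() / "pages_demo.bin"   # hypothetical test file
writeFile(path, newString(3 * assumedPageSize + 10))   # zero bytes, spans 4 pages

var mf = memfiles.open(path)   # maps the file; no data is read yet
let data = cast[ptr UncheckedArray[uint8]](mf.mem)

# Touch one byte per page. The first access to each page is what triggers
# a page fault and makes the kernel load that page from the file.
var touched = 0
var checksum = 0
var i = 0
while i < mf.size:
  checksum += int(data[i])
  inc touched
  i += assumedPageSize
echo "pages touched: ", touched
mf.close()
```

If the real page size is larger than the assumed one, some touches simply land on pages that are already loaded.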

So, at the beginning of your loop, nothing will be "loaded". By the end of the 
loop, as much of the file will be loaded as fits in your machine's RAM. Whether 
any given part is actually "loaded" at a given moment depends on the sometimes 
highly dynamic competition for physical memory among all the programs on the 
system. This is generally also true of any buffered IO mechanism when swap 
files, page files, or swap partitions are enabled. Each page has to be loaded 
at least briefly or your program cannot make progress, but by the time you 
reach the end of your loop, if the file is larger than RAM or some other 
process is demanding a lot of RAM, the beginning of the file may no longer be 
"resident" in RAM.

Certain operating systems let you "tune" this on-demand loading behavior with 
"flag" arguments to the API that sets up the "auto-loading" mechanism. For 
example, Linux lets you specify `MAP_POPULATE`, which will, in effect, pre-load 
the whole file into RAM **before** your loop runs, without your program making 
the CPU dereference any of those file data addresses. You may want to do this 
if, for example, the persistent backing store is a spinning magnetic disk, the 
file is small, and you want to avoid "seeking" the disk head around. Similarly, 
Unix systems provide the `madvise`/`posix_madvise` interfaces, which let a 
program advise the OS that memory accesses are likely to be sequential (your 
case) or random, or even mark certain ranges as candidates for preloading. 
These little tweaks tend to be very non-portable, though, and the default 
behavior probably does what you want.
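A hedged sketch of both tweaks from Nim, assuming a recent Nim whose `memfiles.open` accepts a `mapFlags` argument and whose `posix` module exports `MAP_POPULATE` (Linux-only), `posix_madvise`, and `POSIX_MADV_SEQUENTIAL`. The temp file is hypothetical. Neither call changes what the program reads, only when pages get loaded:

```nim
import std/[memfiles, os]
when defined(posix):
  import std/posix

let path = getTempDir() / "tuning_demo.bin"   # hypothetical test file
writeFile(path, "hello, world")

when defined(linux):
  # Linux-only: MAP_POPULATE asks mmap to pre-fault the whole file up front.
  # Assumption: this Nim's posix module exports MAP_POPULATE.
  var mf = memfiles.open(path, mode = fmRead,
                         mapFlags = MAP_SHARED or cint(MAP_POPULATE))
else:
  var mf = memfiles.open(path, mode = fmRead)

when defined(posix):
  # Purely advisory: tell the kernel we will scan the mapping front to back.
  if posix_madvise(mf.mem, mf.size, POSIX_MADV_SEQUENTIAL) != 0:
    echo "posix_madvise hint rejected; continuing with default behavior"

let data = cast[ptr UncheckedArray[char]](mf.mem)
var commas = 0
for i in 0 ..< mf.size:
  if data[i] == ',': inc commas
echo "commas: ", commas
mf.close()
```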

If it does not do what you want, note that `MemMapFileStream` does not 
(presently) support passing "flags" to the OS mapping calls. I did recently 
improve the `memfiles.open` interface to allow just that, and you might like 
the non-stream API better anyway. You can `cast[ptr UncheckedArray[char]]` the 
`MemFile.mem` pointer and just use the file as an array of bytes if you like; 
you do have to be careful not to overrun the end of the file. Another recent 
addition I got in allows passing such memory to Nim procs expecting 
`openArray[char]` parameters, in the style 
`toOpenArray(cast[ptr UncheckedArray[char]](ThePointer), 0, TheFileSize-1)`.
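A minimal sketch of the non-stream approach, assuming a hypothetical temp file and a hypothetical helper `countNulBytes`: the mapping is viewed as a `ptr UncheckedArray[char]` and then handed to an ordinary `openArray[char]` proc via `toOpenArray`, without copying the file into a string:

```nim
import std/[memfiles, os]

# Hypothetical helper: any proc taking openArray[char] also accepts
# strings and seqs, and (via toOpenArray) memory-mapped bytes.
proc countNulBytes(buf: openArray[char]): int =
  for c in buf:
    if c == '\0': inc result

let path = getTempDir() / "openarray_demo.bin"   # hypothetical test file
writeFile(path, "ab\0cd\0\0")

var mf = memfiles.open(path)
# UncheckedArray has no bounds checks: valid indices are 0 ..< mf.size.
let bytes = cast[ptr UncheckedArray[char]](mf.mem)
echo "first byte: ", bytes[0]
# Pass the whole file to the proc without copying it into a string.
let nuls = countNulBytes(toOpenArray(bytes, 0, mf.size - 1))
echo "NUL bytes: ", nuls
mf.close()
```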


Re: Noob question: proper way to read binary files byte by byte

2019-02-27 Thread federico3
No, memfiles use the memory mapping mechanism provided by the OS (e.g. mmap). 
[https://nim-lang.org/docs/memfiles.html](https://nim-lang.org/docs/memfiles.html)
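For illustration, a rough POSIX-only sketch of the kind of `mmap` call a read-only mapping boils down to, written against Nim's `posix` module. It is not the actual `memfiles` source, and the temp file name is hypothetical:

```nim
import std/[os, posix]

let path = getTempDir() / "mmap_demo.bin"   # hypothetical test file
writeFile(path, "mmap!")

let fd = posix.open(path.cstring, O_RDONLY)
if fd < 0: raiseOSError(osLastError())

var st: Stat
if fstat(fd, st) < 0: raiseOSError(osLastError())
let size = int(st.st_size)

# Read-only, private mapping of the whole file; nothing is loaded yet.
let p = mmap(nil, size, PROT_READ, MAP_PRIVATE, fd, Off(0))
if p == MAP_FAILED: raiseOSError(osLastError())
discard posix.close(fd)   # the mapping stays valid after closing the fd

# The first touch of each page below triggers an on-demand page fault.
let bytes = cast[ptr UncheckedArray[char]](p)
var s = newString(size)
for i in 0 ..< size:
  s[i] = bytes[i]
echo s

discard munmap(p, size)
```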


Re: Noob question: proper way to read binary files byte by byte

2019-02-27 Thread mashingan
memfiles basically loads all the content into memory, cmiiw (correct me if I'm wrong)


Re: Noob question: proper way to read binary files byte by byte

2019-02-26 Thread vimal73700
Thank you for your wonderful suggestions. I ended up using memfiles, as 
suggested by @cblake, which reduced the runtime significantly. If I may ask, 
though: does the code below load the entire file content into memory?


```nim
import os, streams, memfiles

var fs = newMemMapFileStream(paramStr(1), fmRead)
while not fs.atEnd:
  echo fs.readChar()
fs.close()
```


Re: Noob question: proper way to read binary files byte by byte

2019-02-24 Thread cblake
You can also use `memfiles`. There, reading and writing are the same as 
accessing memory. Besides possibly being simpler, since it presents an "as if 
you had already read the whole file into a buffer" view, it may also be much 
more efficient, especially for byte-at-a-time operation, where other APIs might 
do a lot of behind-the-scenes work on a per-IO basis. Of course, to be usable 
as a `MemFile`, the data must be random access (e.g., a file on disk as opposed 
to a network socket, pipe, or some other unseekable input).
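A minimal sketch of the "reading/writing is the same as accessing memory" point, assuming a hypothetical temp file: with `fmReadWrite`, an assignment into the mapped array changes the file itself, with no explicit write call:

```nim
import std/[memfiles, os]

let path = getTempDir() / "rw_demo.bin"   # hypothetical test file
writeFile(path, "hello")

var mf = memfiles.open(path, mode = fmReadWrite)
let data = cast[ptr UncheckedArray[char]](mf.mem)
# Reading and writing are plain memory accesses on the mapping.
if data[0] == 'h':
  data[0] = 'j'
mf.close()             # unmaps the file

echo readFile(path)    # the edit landed in the file itself
```

By default `memfiles` uses a shared mapping (on POSIX, `MAP_SHARED`), which is why the change is visible to `readFile` afterwards.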


Re: Noob question: proper way to read binary files byte by byte

2019-02-24 Thread r3c
[check this 
out](https://bitbucket.org/DraganJanushevski/q3bsp/src/cee3bf04b30414672939c0ffde25011ec026a822/src/coreBSP/bspfile.nim#lines-18)


Re: Noob question: proper way to read binary files byte by byte

2019-02-24 Thread mashingan
Use [atEnd](https://nim-lang.org/docs/streams.html#atEnd%2CStream) to check 
whether the stream has ended, and 
[getPosition](https://nim-lang.org/docs/streams.html#getPosition%2CStream) to 
get its current position.


```nim
import os, streams

var fs = newFileStream(paramStr(1), fmRead)
if fs.isNil:
  quit("cannot open " & paramStr(1))

while not fs.atEnd:
  var one_char = fs.readChar()
  echo one_char
fs.close()
```
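Applied to the original question, a minimal sketch that checks `atEnd` before every read (so a `'\0'` from `readChar` is always a real NUL byte) and uses `getPosition` to record where each NUL byte sits. The temp file is hypothetical, so the sketch runs without command-line arguments:

```nim
import std/[os, streams]

let path = getTempDir() / "stream_demo.bin"   # hypothetical test file
writeFile(path, "ab\0c\0")

var fs = newFileStream(path, fmRead)
if fs.isNil:
  quit("cannot open " & path)

var nulOffsets: seq[int]
while not fs.atEnd:
  let pos = fs.getPosition()   # offset of the byte about to be read
  let c = fs.readChar()
  # atEnd was false just before this read, so '\0' is a real NUL byte.
  if c == '\0':
    nulOffsets.add pos
fs.close()
echo "NUL bytes at offsets: ", nulOffsets
```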


Noob question: proper way to read binary files byte by byte

2019-02-24 Thread vimal73700
Hi all,

What is the best way to read a binary file byte by byte?


```nim
import os, streams

var fs_pos = 0
var fs = newFileStream(paramStr(1), fmRead)

while true:
  var one_char = fs.readChar()
  echo one_char
  if (one_char == '\0'):
    echo "breaking at " & $fs_pos
    break
  fs_pos += 1
```

`streams.readChar()` returns the same `'\0'` for a null byte as for EOF. Please 
advise.
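One more way to sidestep the ambiguity, sketched with the lower-level `File` API instead of streams: `readBuffer` returns how many bytes it actually read, so EOF shows up as a count of `0` while a NUL byte is just an ordinary value. The temp file name is hypothetical:

```nim
import std/os

let path = getTempDir() / "readbuffer_demo.bin"   # hypothetical test file
writeFile(path, "x\0y")

var f: File
if not open(f, path, fmRead):
  quit("cannot open " & path)

var b: uint8
var values: seq[uint8]
# readBuffer returns the number of bytes read; 0 means end of file.
while f.readBuffer(addr b, 1) == 1:
  values.add b
f.close()
echo values
```

Calling `readBuffer` one byte at a time is simple but pays a per-call cost for every byte.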