ftell, fgetpos, etc.

2007-12-19 Thread Hendrik Boom
I need to write code that creates, reads, and writes a random-access binary
file, said binary file to be readable and writable on several machines,
which may have different byte sex, but will certainly have different
native word size (32 vs 64 bit).  Addresses of positions in the file
*will* have to be written into the file.

The machines on which this software will have to run presently use
Debian or Debian-derived Linux distributions. (386, AMD64, maemo).

Now I know how to handle different byte sex (use shifts and masks to
decompose data and recompose it in the chosen file-format -- anyone have a
better method?).
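
A minimal sketch of the shift-and-mask approach, assuming the file format
fixes 64-bit fields stored most significant byte first.  The names
put_u64_be and get_u64_be are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v into buf[0..7], most significant byte first (big-endian),
   independent of the host's byte order. */
static void put_u64_be(unsigned char buf[8], uint64_t v)
{
    int i;

    assert(buf != NULL);
    for (i = 7; i >= 0; i--) {
        buf[i] = (unsigned char)(v & 0xff);  /* mask off the low byte */
        v >>= 8;                             /* shift the next byte down */
    }
}

/* Rebuild the value from buf[0..7]; the inverse of put_u64_be. */
static uint64_t get_u64_be(const unsigned char buf[8])
{
    uint64_t v = 0;
    int i;

    assert(buf != NULL);
    for (i = 0; i < 8; i++)
        v = (v << 8) | buf[i];
    return v;
}
```

Because only arithmetic on values is used, never a cast of a pointer to
a wider type, the same bytes come out on any host.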

What I don't know is how to seek around the file in a machine-independent
manner, and avoid future headaches.

I can certainly hack up something that works for now, and will have to be
replaced if the files to be handled ever get huge.  But I'd like to know
if there's a recommended way of doing it.

As far as I can tell, the two regimes available are

(a) use fgetpos and fsetpos
  This will presumably do random access to anything the machine's file
  system will handle, but the disk addresses I get from fgetpos are
  unlikely to be usable on another system.
(b) use ftell and fseek
  Now these will solve the problem as long as my files stay small.
  They provide byte counts from the start of the file, which are
  semantically independent of the platform, but are just long int, which,
  last I heard, was 32 bits almost everywhere (and, because of the sign
  bit are limited to 31 bits in practise).
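
Regime (b) could be sketched roughly as follows.  The helper names
append_record and read_record are hypothetical, and the code assumes a
binary stream opened for update on a POSIX system.  Note that the offset
type is long, which is 32 bits on i386 Linux and 64 bits on amd64 Linux:

```c
#include <assert.h>
#include <stdio.h>

/* Append len bytes at the end of the file and return the byte offset
   at which they start, or -1L on error. */
static long append_record(FILE *f, const void *data, size_t len)
{
    long pos;

    assert(f != NULL);
    if (fseek(f, 0L, SEEK_END) != 0)
        return -1L;
    pos = ftell(f);              /* byte count from the start of the file */
    if (pos < 0)
        return -1L;
    if (fwrite(data, 1, len, f) != len)
        return -1L;
    return pos;
}

/* Read len bytes starting at byte offset pos; returns 0 on success,
   -1 on error or short read. */
static int read_record(FILE *f, long pos, void *data, size_t len)
{
    assert(f != NULL);
    if (fseek(f, pos, SEEK_SET) != 0)
        return -1;
    return fread(data, 1, len, f) == len ? 0 : -1;
}
```

The value returned by append_record is a plain byte count, so it can be
written into the file itself; the fpos_t filled in by fgetpos is opaque
and cannot portably be stored that way.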

Is there something else available?  Is there another way to use the tools
I have already mentioned?  Is there a clean way to move to 64-bit
relatively system-independent disk addresses?  Is there a standard way?
 
-- hendrik


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED] 
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: ftell, fgetpos, etc.

2007-12-19 Thread Douglas A. Tutty
On Wed, Dec 19, 2007 at 06:09:54PM +, Hendrik Boom wrote:
> I need to write code that creates, reads, and writes a random-access binary
> file, said binary file to be readable and writable on several machines,
> which may have different byte sex, but will certainly have different
> native word size (32 vs 64 bit).  Addresses of positions in the file
> *will* have to be written into the file.
> 
> The machines on which this software will have to run presently use
> Debian or Debian-derived Linux distributions. (386, AMD64, maemo).
> 
> Now I know how to handle different byte sex (use shifts and masks to
> decompose data and recompose it in the chosen file-format -- anyone have a
> better method?).
> 
> What I don't know is how to seek around the file in a machine-independent
> manner, and avoid future headaches.
> 
> I can certainly hack up something that works for now, and will have to be
> replaced if the files to be handled ever get huge.  But I'd like to know
> if there's a recommended way of doing it.
> 
> As far as I can tell, the two regimes available are
> 
> (a) use fgetpos and fsetpos
>   This will presumably do random access to anything the machine's file
>   system will handle, but the disk addresses I get from fgetpos are
>   unlikely to be usable on another system.
> (b) use ftell and fseek
>   Now these will solve the problem as long as my files stay small.
>   They provide byte counts from the start of the file, which are
>   semantically independent of the platform, but are just long int, which,
>   last I heard, was 32 bits almost everywhere (and, because of the sign
>   bit are limited to 31 bits in practise).
> 
> Is there something else available?  Is there another way to use the tools
> I have already mentioned?  Is there a clean way to move to 64-bit
> relatively system-independent disk addresses?  Is there a standard way?
>  

To me, a huge file is one which is too big to just load into memory
to facilitate the random access.  To do random access on a huge file,
the speed limit will be the drive access rather than any algorithm you
choose, or language for that matter.  

To be machine independent yet have a pointer always longer than 32 bits,
you'll have to write or import a multi-integer data type.  For example,
if you decide that you need a 128-bit integer (for future growth), you
need functions that handle it; the file seek sections then work on that
type, using your imported library to do any math required.
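
As a rough sketch of what such a multi-word integer type could look like
in C, assuming two 64-bit halves (the u128 type and its functions are
hypothetical names, not taken from any particular library):

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit unsigned value held as two 64-bit halves. */
typedef struct {
    uint64_t hi;   /* most significant 64 bits */
    uint64_t lo;   /* least significant 64 bits */
} u128;

/* Add two 128-bit values, propagating the carry out of the low half.
   Overflow past 128 bits wraps around. */
static u128 u128_add(u128 a, u128 b)
{
    u128 r;

    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* r.lo < a.lo means a carry */
    return r;
}

/* Narrow to 64 bits for handing to the OS seek call; the caller must
   ensure the value fits. */
static uint64_t u128_to_u64(u128 a)
{
    assert(a.hi == 0);
    return a.lo;
}
```

The file format would store all 128 bits, while the seek code narrows
the value to whatever the operating system actually accepts.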

However, for current OSs, I think file offsets are limited to 64 bits
on both 32-bit and 64-bit versions (at least in Linux for ext2/3).

In any event, this is trivial to set up with a language that is more
machine independent than standard C.  If I were you, I'd prototype it in
Python and if it wasn't computationally fast enough, I'd re-implement it
in Ada.

My answer is vague since your info is a bit vague.  What is the purpose
of this, and what are the parameters?

Doug.






Re: ftell, fgetpos, etc.

2007-12-19 Thread Alex Samad
On Wed, Dec 19, 2007 at 11:39:58PM -0500, Douglas A. Tutty wrote:
> On Wed, Dec 19, 2007 at 06:09:54PM +, Hendrik Boom wrote:
> > I need to write code that creates, reads, and writes a random-access binary
> > file, said binary file to be readable and writable on several machines,
> > which may have different byte sex, but will certainly have different
> > native word size (32 vs 64 bit).  Addresses of positions in the file
> > *will* have to be written into the file.
> > 
> > The machines on which this software will have to run presently use
> > Debian or Debian-derived Linux distributions. (386, AMD64, maemo).
> > 
> > Now I know how to handle different byte sex (use shifts and masks to
> > decompose data and recompose it in the chosen file-format -- anyone have a
> > better method?).
> > 
> > What I don't know is how to seek around the file in a machine-independent
> > manner, and avoid future headaches.
> > 
> > I can certainly hack up something that works for now, and will have to be
> > replaced if the files to be handled ever get huge.  But I'd like to know
> > if there's a recommended way of doing it.
> > 
> > As far as I can tell, the two regimes available are
> > 
> > (a) use fgetpos and fsetpos
> >   This will presumably do random access to anything the machine's file
> >   system will handle, but the disk addresses I get from fgetpos are
> >   unlikely to be usable on another system.
> > (b) use ftell and fseek
> >   Now these will solve the problem as long as my files stay small.
> >   They provide byte counts from the start of the file, which are
> >   semantically independent of the platform, but are just long int, which,
> >   last I heard, was 32 bits almost everywhere (and, because of the sign
> >   bit are limited to 31 bits in practise).
> > 
> > Is there something else available?  Is there another way to use the tools
> > I have already mentioned?  Is there a clean way to move to 64-bit
> > relatively system-independent disk addresses?  Is there a standard way?
Not knowing the full requirements for this, why not create a server app
that sits on an amd64 machine and clients that can run on any machine,
then define the protocol and transfer info via TCP/UDP?

With multiple machines accessing the same file, you are going to have
contention and concurrency problems as well.





Re: ftell, fgetpos, etc.

2007-12-21 Thread Hendrik Boom
On Thu, 20 Dec 2007 15:48:34 +1100, Alex Samad wrote:

> On Wed, Dec 19, 2007 at 11:39:58PM -0500, Douglas A. Tutty wrote:
>> On Wed, Dec 19, 2007 at 06:09:54PM +, Hendrik Boom wrote:
>> > I need to write code that creates, reads, and writes a random-access binary
>> > file, said binary file to be readable and writable on several machines,
>> > which may have different byte sex, but will certainly have different
>> > native word size (32 vs 64 bit).  Addresses of positions in the file
>> > *will* have to be written into the file.
>> > 
>> > The machines on which this software will have to run presently use
>> > Debian or Debian-derived Linux distributions. (386, AMD64, maemo).
>> > 
>> > Now I know how to handle different byte sex (use shifts and masks to
>> > decompose data and recompose it in the chosen file-format -- anyone have a
>> > better method?).
>> > 
>> > What I don't know is how to seek around the file in a machine-independent
>> > manner, and avoid future headaches.
>> > 
>> > I can certainly hack up something that works for now, and will have to be
>> > replaced if the files to be handled ever get huge.  But I'd like to know
>> > if there's a recommended way of doing it.
>> > 
>> > As far as I can tell, the two regimes available are
>> > 
>> > (a) use fgetpos and fsetpos
>> >   This will presumably do random access to anything the machine's file
>> >   system will handle, but the disk addresses I get from fgetpos are
>> >   unlikely to be usable on another system.
>> > (b) use ftell and fseek
>> >   Now these will solve the problem as long as my files stay small.
>> >   They provide byte counts from the start of the file, which are
>> >   semantically independent of the platform, but are just long int, which,
>> >   last I heard, was 32 bits almost everywhere (and, because of the sign
>> >   bit are limited to 31 bits in practise).
>> > 
>> > Is there something else available?  Is there another way to use the tools
>> > I have already mentioned?  Is there a clean way to move to 64-bit
>> > relatively system-independent disk addresses?  Is there a standard way?
> Not knowing the full requirements for this, why not create a server app
> that sits on an amd64 machine and clients that can run on any machine,
> then define the protocol and transfer info via TCP/UDP?
>
> With multiple machines accessing the same file, you are going to have
> contention and concurrency problems as well.
> 

The problem I'm asking about is a lot more low-level than that.  I'm
interested in what standard (or at least common) library functions I can
use that behave transparently in 32- or 64-bit mode depending on the
machine they are compiled and run on.  Well, I suppose 64-bit mode is OK
on 32-bit machines, since long long is pretty widely implemented nowadays,
and the length of disk addresses is more important than the length of
efficient integers.

If necessary I'll use #ifdef to choose code depending on the platform, but
that's ugly if the masters of C and Unix have defined a neat way to do it.

-- hendrik

P.S.  I am planning to implement protocols on top of this stuff -- sort of
a mix between distributed revision control and databases.  Disconnected
operation with later sync will be essential.  But that's not what I'm
asking about here.




Re: ftell, fgetpos, etc.

2007-12-21 Thread Scott Gifford
Hendrik Boom <[EMAIL PROTECTED]> writes:

[...]

> What I don't know is how to seek around the file in a machine-independent
> manner, and avoid future headaches.

[...]

> (a) use fgetpos and fsetpos
>   This will presumably do random access to anything the machine's file
>   system will handle, but the disk addresses I get from fgetpos are
>   unlikely to be usable on another system.
> (b) use ftell and fseek
>   Now these will solve the problem as long as my files stay small.
>   They provide byte counts from the start of the file, which are
>   semantically independent of the platform, but are just long int, which,
>   last I heard, was 32 bits almost everywhere (and, because of the sign
>   bit are limited to 31 bits in practise).
>
> Is there something else available?  Is there another way to use the tools
> I have already mentioned?  Is there a clean way to move to 64-bit
> relatively system-independent disk addresses?  Is there a standard way?

The very low-level function you are looking for is lseek or lseek64.
That won't play well with the stdio library, unfortunately, but you
may find fseeko() and ftello() will work.  They use a file position
that is of type "off_t", which is usually 64-bit on newer Unixes
(or at least can be with the right compilation options).  See:

http://www.unix.org/version2/whatsnew/lfs20mar.html

and the appropriate manpages, and I think that will set you on the
right track.
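
A minimal sketch of the fseeko()/ftello() route, assuming glibc on
Linux, where defining _FILE_OFFSET_BITS as 64 before any #include (or
passing -D_FILE_OFFSET_BITS=64 to the compiler) selects a 64-bit off_t
even on 32-bit machines.  The helper names seek_to and current_offset
are made up for illustration:

```c
#define _FILE_OFFSET_BITS 64      /* must come before every #include */
#define _POSIX_C_SOURCE 200809L   /* makes fseeko/ftello visible in strict modes */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Refuse to compile if off_t is narrower than 64 bits. */
typedef char off_t_is_64_bits[sizeof(off_t) >= 8 ? 1 : -1];

/* Seek to an absolute byte offset; returns 0 on success, -1 on error. */
static int seek_to(FILE *f, off_t pos)
{
    assert(f != NULL);
    return fseeko(f, pos, SEEK_SET);
}

/* Return the current byte offset, or (off_t)-1 on error. */
static off_t current_offset(FILE *f)
{
    assert(f != NULL);
    return ftello(f);
}
```

The offsets are still plain byte counts from the start of the file, so
they can be written into the file in a fixed 64-bit layout and read back
on a machine with a different word size.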

Good luck!

Scott.





Re: ftell, fgetpos, etc.

2007-12-23 Thread hendrik
On Sat, Dec 22, 2007 at 01:16:48AM -0500, Scott Gifford wrote:
> Hendrik Boom <[EMAIL PROTECTED]> writes:
> >
> > Is there something else available?  Is there another way to use the tools
> > I have already mentioned?  Is there a clean way to move to 64-bit
> > relatively system-independent disk addresses?  Is there a standard way?
> 
> The very low-level function you are looking for is lseek or lseek64.
> That won't play well with the stdio library, unfortunately, but you
> may find fseeko() and ftello() will work.  They use a file position
> that is of type "off_t", which is usually 64-bit on newer Unixes
> (or at least can be with the right compilation options).  See:
> 
> http://www.unix.org/version2/whatsnew/lfs20mar.html
> 
> and the appropriate manpages, and I think that will set you on the
> right track.


Thanks.  That looks like the information I'm looking for.

-- hendrik

