Re: [ql-users] efficient buffer size

2003-06-25 Thread Peter S Tillier

Robert Newson wrote:
[...]

 When testing out the various versions of grep (grep, egrep, fgrep) on
 Unix, I ran the timings at least twice - first to ensure that the
 command and data were in memory & cached, and ignored that time, and
 then a second (or more) time(s) for the actual timing.  Including the
 first run would make the comparison skewed, as the data for the second
 program would be cached after testing the first program; plus I
 preferred to do several timings and take the mean (the first timing
 would be larger due to the program not being cached). [If you're
 interested, I found egrep to be the fastest for the grepping I do.]

Perhaps that is because the egrep you tried uses a DFA RE engine. DFA
engines are generally slower to compile the REs, but faster in
operation, than the NFA engines used in many grep implementations (the
problem is especially noticeable with POSIX NFA engines because they
must try all possible matches).
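
For anyone curious, the trade-off is easy to measure with the POSIX
regex API.  The sketch below - pattern, test string and repeat count
are all arbitrary - times regcomp() separately from repeated regexec()
calls; a DFA engine tends to load the cost onto the first number, a
backtracking NFA onto the second:

#define _POSIX_C_SOURCE 200112L
#include <regex.h>
#include <stdio.h>
#include <time.h>

static double secs(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
    regex_t re;
    struct timespec t0, t1, t2;
    const char *text = "export -f l path substr sus wd awd susp";

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (regcomp(&re, "su[a-z]*p", REG_EXTENDED) != 0)   /* compile once */
        return 1;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    for (int i = 0; i < 100000; i++)    /* repeat so match time is visible */
        regexec(&re, text, 0, NULL, 0);
    clock_gettime(CLOCK_MONOTONIC, &t2);

    printf("compile: %gs, 100000 matches: %gs\n", secs(t0, t1), secs(t1, t2));
    regfree(&re);
    return 0;
}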

Interestingly, I've often found fgrep to be slower than egrep run with
the same parameters, which is odd given that it is named f(ast)grep
and was clearly intended to be used for pure string searches.  In
theory a pure, i.e., non-RE, string search ought to be faster than an
RE search, but clearly not by much in this case:

$ time egrep susp .bash_profile
susp() ## print the POSIX man page for $1 to $1.posix
export -f l path substr sus wd awd susp

real    0m1.487s
user    0m0.000s
sys     0m0.000s

$ time fgrep susp .bash_profile
susp() ## print the POSIX man page for $1 to $1.posix
export -f l path substr sus wd awd susp

real    0m1.269s
user    0m0.000s
sys     0m0.000s

$

As you say, a lot depends on the type of searching that you are doing
and the system that you are using.

--
Peter S Tillier
Who needs perl when you can write dc, sokoban,
arkanoid and an unlambda interpreter in sed?



Re: [ql-users] efficient buffer size

2003-06-25 Thread Peter S Tillier

P Witte wrote:
 Robert Newson writes:

 
   [If you're interested, I found egrep to be the fastest for the
 grepping I do.]

 I am interested. Was it you I corresponded with about a multi-file
 version of grep? The current versions for the QL will only allow one
 file at a time which makes it rather inefficient for scanning a whole
 directory or disk.

 Per

Really?  I'm surprised.  I posted a port of gsed2.05 that runs under
Adrian Ives' shell on the QL and from the Basic command line.  AFAICR it
accepts wildcard characters from either, using the wildcard expansion
code that Dave Walker's C68 provides where necessary.  I ported this
version because the one supplied with C68 did not do all that I wanted.  I
also ported Brian Kernighan's awk because there was no port available
for the QL.

Peter S Tillier
Who needs perl when you can write dc, sokoban,
arkanoid and an unlambda interpreter in sed?



RE: [ql-users] efficient buffer size

2003-06-23 Thread Norman Dunbar

History indeed!

I used to use the 8K space between 8192 and 16384 as storage space. It was a
ROM shadow area, but never gave me any problems. I even managed to run
machine code programs from there as well.

Cheers,
Norman.

-
Norman Dunbar
Database/Unix administrator
Lynx Financial Systems Ltd.
mailto:[EMAIL PROTECTED]
Tel: 0113 289 6265
Fax: 0113 289 3146
URL: http://www.Lynx-FS.com
-


-Original Message-
From: Marcel Kilgus [mailto:[EMAIL PROTECTED]
Sent: Monday, June 23, 2003 12:50 PM
To: ql-users
Subject: Re: [ql-users] efficient buffer size



Norman Dunbar wrote:
 In my day it was a ZX-81 with 1KB of memory - every byte counted then !!

Despite my age this is where I started, too. But that's just history
now.

Marcel



Re: [ql-users] efficient buffer size

2003-06-23 Thread Marcel Kilgus

Lau wrote:
 Sounds like fun. I guess it's a little warmer than home...

We currently have 38°C (100°F, 311 K) in southern Germany, it can't
possibly be much hotter than here ;-)

 Q. What's the longest monosyllabic word? (Clue: ryrira yrggref)

Has it something to do with a certain small wood creature (note that
wood is not capitalised here ;-))?
That's the only one I found that fits your clue.

Marcel



RE: [ql-users] efficient buffer size

2003-06-23 Thread Claude Mourier 00

I think yesterday we got 41°C in France ... 

-Original Message-
From: Marcel Kilgus [mailto:[EMAIL PROTECTED]
Sent: Monday, 23 June 2003 15:23
To: ql-users
Subject: Re: [ql-users] efficient buffer size



Lau wrote:
 Sounds like fun. I guess it's a little warmer than home...

We currently have 38°C (100°F, 311 K) in southern Germany, it can't
possibly be much hotter than here ;-)



Re: [ql-users] efficient buffer size

2003-06-23 Thread Gerhard Plavec

False: every bit was precious :)

CU


 In my day it was a ZX-81 with 1KB of memory - every byte counted then !!

 :o)

 -
 Norman Dunbar
 Database/Unix administrator
 Lynx Financial Systems Ltd.
 mailto:[EMAIL PROTECTED]
 Tel: 0113 289 6265
 Fax: 0113 289 3146
 URL: http://www.Lynx-FS.com
 -





Re: [ql-users] efficient buffer size

2003-06-23 Thread paul holmgren

Marcel Kilgus wrote:

 Norman Dunbar wrote:
  In my day it was a ZX-81 with 1KB of memory - every byte counted then !!

 Despite my age this is where I started, too. But that's just history
 now.

 Marcel

My very first working true computer setup at home was strictly
experimental: a Motorola 6800 and 256 bytes.  Yup, 256 bytes.  The
standard setup came with 128 bytes of cache and 128 bytes of RAM; I
added the second RAM chip to bring it up to 256 bytes.  Strictly a
machine language programming effort.

-- 
Paul Holmgren
Hoosier Corps #33, L-6
2 57 300-C's in Indy


Re: [ql-users] efficient buffer size

2003-06-23 Thread Dilwyn Jones

  How about a puzzle thread on here?
  Q. What's the longest monosyllabic word? (Clue: ryrira yrggref)

 We don't all speak Welsh you know :o)
You'd have a hope. As W and Y are vowels in Welsh, you can't even use
them to make your monosyllables longer!

Now where did I put Geoff's Solvit Plus...

--
Dilwyn Jones



Re: [ql-users] efficient buffer size

2003-06-22 Thread Lau
ZN wrote:
On 6/21/2003 at 11:35 PM Lau wrote:


Back to my earlier mention of caching... hard drives and their 
controllers do caching as well. I'm not certain if they do read-ahead 
caching.


In short, yes.
Ta. I'll add a little proviso. The hardware can't know what the next 
logical sector (allocation unit) for a file is, hence the need for a file 
system to avoid fragmentation.

Unix (et alia) actually also does (did?) do software read-ahead. It 
assumed that when you read from a file, you would be wanting to read 
more of the same, so it initiated the read of the next sector (or 
whatever) as soon as it gave you the current one.
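
On systems with posix_fadvise() you can even request that behaviour
explicitly instead of relying on the kernel's heuristic.  A minimal
sketch (the 32k buffer merely echoes the size suggested elsewhere in
this thread; the exact read-ahead behaviour is implementation-defined):

#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Hint: we intend to read the whole file sequentially. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    static char buf[32768];
    ssize_t n;
    long total = 0;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;

    printf("read %ld bytes\n", total);
    close(fd);
    return 0;
}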

--
Lau
http://www.bergbland.info
Get a domain from http://oneandone.co.uk/xml/init?k_id=5165217 and I'll 
get the commission!



Re: [ql-users] efficient buffer size

2003-06-22 Thread Robert Newson
P Witte wrote:

...
That was the significance of

2^n  size   no.   s  remarks
---  ----   ---  --  -------
 x:   xxx   xxx  xx  Primer run
 ;)


A primer run to ensure that any caching was already done before the 
measured runs, so that all further runs started from a similar state?

When testing out the various versions of grep (grep, egrep, fgrep) on Unix, 
I ran the timings at least twice - first to ensure that the command and data 
were in memory & cached, and ignored that time, and then a second (or more) 
time(s) for the actual timing.  Including the first run would make the 
comparison skewed, as the data for the second program would be cached after 
testing the first program; plus I preferred to do several timings and take the 
mean (the first timing would be larger due to the program not being cached). 
 [If you're interested, I found egrep to be the fastest for the grepping I do.]
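
That methodology is simple to automate.  A rough sketch along those
lines - the command string is only an example, and system() is just
the lazy way to run it:

#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    const char *cmd = "egrep susp .bash_profile > /dev/null";
    const int runs = 5;
    double total = 0.0;
    struct timespec a, b;

    system(cmd);                 /* primer run: warms the cache, not timed */

    for (int i = 0; i < runs; i++) {
        clock_gettime(CLOCK_MONOTONIC, &a);
        system(cmd);
        clock_gettime(CLOCK_MONOTONIC, &b);
        total += (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }
    printf("mean of %d runs: %gs\n", runs, total / runs);
    return 0;
}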




Re: [ql-users] efficient buffer size

2003-06-22 Thread Marcel Kilgus

P Witte wrote:
 Your explanation reminded me that a considerable amount of buffering
 is already going on (the hard disk, Windoze, and Smsq). iof.load is possibly
 not much more efficient under those circumstances than iob.fmul.

Yes, it's not that much of a difference anymore. In the past iof.load
was much faster on machines with a lot of RAM because it didn't invoke
the slaving mechanism. Now that slaving is limited to only a small RAM
area, the data is effectively just copied around one more time in the
case of iob.fmul.

 But I thought that PIO mode hard disks, the current norm, actually pushed
 the data into memory with barely any intervention from the CPU.

PIO mode is the slow stuff where the CPU has to fetch the data. Ultra
DMA writes the data directly to the location where it is needed.

 Hehe, you're probably right, though I think I'll rely on my test results in
 this particular case. I suppose my real question was whether there is some
 sweet buffer size pertaining to Qdos/Smsq that minimises fiddly
 edge conditions and the like.

It mostly depends on what you're doing. But I'd use a nice 16k or 32k
buffer for most purposes.
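
In C terms, the sort of test being discussed might look like the
sketch below (the file name is made up, and remember the primer-run
caveat from earlier in the thread: only passes after the first are
comparable):

#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <time.h>

int main(void)
{
    static char buf[65536];
    const size_t sizes[] = { 512, 1024, 4096, 16384, 32768, 65536 };

    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        FILE *f = fopen("testfile_dat", "rb");
        if (!f) { perror("fopen"); return 1; }

        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        while (fread(buf, 1, sizes[i], f) == sizes[i])
            ;                    /* just pull the data through the buffer */
        clock_gettime(CLOCK_MONOTONIC, &b);
        fclose(f);

        printf("%6zu bytes: %gs\n", sizes[i],
               (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9);
    }
    return 0;
}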

Marcel



Re: [ql-users] efficient buffer size

2003-06-22 Thread Tony Firshman
On Mon, 23 Jun 2003 at 00:43:09, Lau wrote:
(ref: [EMAIL PROTECTED])

My suggestion of one byte buffers was a little facetious (one of the 
two words was the five vowels in order - there's one with them 
reversed).
'was' - 'with'
Ah that is worth remembering as it helps spell the damn word (8-)
I have to be beardless as I can't find hash on this Malaysian machine.
I am in Kuala Lumpur setting up hardware for worldnews.com.
There is no web server on the Windows machine with my email program.  I 
am using VNC to get at the windows machine via the worldnews VPN using 
192.168 addressing.  Pretty good.  Last night I was also listening to 
the Archers while working on my home machine.  Almost as good as being 
home (8-)

Still, food is cheaper here - 55p for a large lunch of chicken, broth, 
green veg etc (what we might call soup here), a second bowl of noodles and 
other things on the side.  They have a 1 ringgit note (15p).

subcontinental
uncomplimentary
duoliteral
abstemious
arterious
annelidous
Do I win a prize?  Lau - remember the BBC puzzle panel where I came up 
with more answers than they expected as well (8-)

 and you won the following week.
How about the geometric mean of our two responses - that would be 128 
bytes - but even better would be 124 bytes, which will have long 
alignment going for it and will certainly save some code (you can do it 
with MOVEQ).
Great to see you still thinking this way Lau. You are the master of code 
saving.  I will never forget your use of JMP instead of JSR in bp.init.

How many bytes was your Forth (and a good implementation so you said) - 
4k?

--
 QBBS (QL fido BBS 2:252/67) +44(0)1442-828255
 tony@surname.co.uk  http://www.firshman.co.uk
   Voice: +44(0)1442-828254   Fax: +44(0)1442-828255
TF Services, 29 Longfield Road, TRING, Herts, HP23 4DG


Re: [ql-users] efficient buffer size

2003-06-21 Thread Wolfgang Lenerz

On 21 Jun 2003, at 2:41, P Witte wrote:

(...)
 
 Yes, that is understood. It is situations where the whole file cannot be
 read at once that I'm thinking about. (Besides, on a multitasking machine it
 is probably not very polite to grab huge buffers ;)

(...)

Oh well, if you start worrying about being polite to other programs... 
:-)

I'd still simply grab just as much memory as I can use.
If speed is of the essence, as you said in your requirements, then 
the user will probably also know to leave the machine alone (tell him!) 
and not have too many other progs trying to get memory at the 
same time. If not, then speed is not that essential, after all.

So I'd still go for as much memory as I can get and read in the 
entire file.

If that can't be done (not enough space):

Ultimately, it will then be the read operations that slow everything 
down.

Now, considering that iob.fmul & io.fstrg use D2 to indicate how many 
bytes they should get, and since D2 can only be word-sized, you can, 
at most, read $ffff bytes in one go.

If nothing else, I'd use that as my buffer size.
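
In portable C the same idea might look like the sketch below; plain
read() merely stands in for the QDOS trap, and read_capped() is a
made-up helper name:

#include <unistd.h>

#define D2_MAX 0xFFFFL          /* largest count a word-sized D2 can carry */

long read_capped(int fd, char *buf, long want)
{
    long got = 0;
    while (got < want) {
        long ask = want - got;
        if (ask > D2_MAX)
            ask = D2_MAX;       /* cap each request at the word limit */
        ssize_t n = read(fd, buf + got, (size_t)ask);
        if (n <= 0)
            break;              /* EOF or error: return what we have */
        got += n;
    }
    return got;
}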
Wolfgang
-
www.wlenerz.com


Re: [ql-users] efficient buffer size

2003-06-21 Thread Lau
P Witte wrote:
snip
As far as I know, nothing my program does should be affected by the size
of the buffer, apart from filling it in the first place. So my findings
would seem to indicate that a buffer size of between 256 bytes(!) and 1k is
optimal for this kind of thing. This is strange enough, considering that
iob.fmul is called more frequently the smaller the buffer. What surprises
me is why we're not seeing the benefits of iof.load in this (or at least I
don't). Anyone got a theory?
I was watching this thread with interest. Maybe I should have commented 
earlier... the results you have obtained are exactly what I was 
expecting.

If you had run your test on a microdrive... you would have found that 
scatter load for reading in files in their entirety may have had some 
effect (on a 68008, where the cost of copying the data might have a 
noticeable effect).

I'm not sure that scatter load is implemented in any floppy driver, 
although it could be done. It could easily cut the time to load a file 
by a factor of three, on a standard sort of floppy. That's because of 
the interleave. (The logical-to-physical sector mapping uses every third 
sector... the idea being that you get two sectors' worth of time after 
writing one sector to get your data sorted for writing the next sector. 
A consideration which is pretty much wasted when reading a file!)

You mention doing unspeakable things to the contents of the files. If 
that means doing an *enormous* amount of processing, then the buffer 
size and disk transfer times will always be irrelevant. Indeed, even if 
you only did a byte search through the data for a single fixed value 
byte, using Basic, that would swamp the buffer copy by a factor of fifty, 
at least, at a guess.
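
Even in compiled code the ratio is easy to get a feel for (interpreted
Basic would be far more lopsided still).  A throwaway sketch, with the
buffer size and search byte chosen arbitrarily:

#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <string.h>
#include <time.h>

#define SIZE (1 << 20)

static char src[SIZE], dst[SIZE];

int main(void)
{
    struct timespec a, b, c;
    long hits = 0;

    clock_gettime(CLOCK_MONOTONIC, &a);
    memcpy(dst, src, SIZE);                    /* the "buffer copy" */
    clock_gettime(CLOCK_MONOTONIC, &b);
    for (long i = 0; i < SIZE; i++)            /* the "processing" */
        if (src[i] == 'x')
            hits++;
    clock_gettime(CLOCK_MONOTONIC, &c);

    printf("copy: %gs, scan: %gs (%ld hits)\n",
           (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9,
           (c.tv_sec - b.tv_sec) + (c.tv_nsec - b.tv_nsec) / 1e9,
           hits);
    return 0;
}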

Then there's caching... if you had enough memory, your test would 
produce some curious results... whichever buffer size you ran first 
would seem to be rather slow compared to the rest. As you ran your tests 
on your hard disk, the caching effect doesn't come into it, but I 
thought I'd mention it.

Finally, it all depends on how fast your processor and hard disk are. 
These days, processors tend to have a lot of cycles available per byte 
transferred from (or to) disk. That's the reason why compressed hard disk 
partitions are actually *faster* to handle than uncompressed ones. The 
processor has lots of spare time on its hands to perform the 
(de)compression. Overall time to process a file becomes tied mainly to 
the raw disk access time to get all the data on/off the disk. A 
compressed file occupies about half the physical space, so can be 
handled in half the overall time.

Back to my earlier mention of caching... hard drives and their 
controllers do caching as well. I'm not certain if they do read-ahead 
caching.

In summary, I believe your results are just giving the sheer time it 
takes the drive to spin on its axis (tracks * cylinders * interleave 
times), and have very little to do with any CPU processing.

Answer to your original question: How big a buffer should I use?... 
one byte?

--
Lau
http://www.bergbland.info
Get a domain from http://oneandone.co.uk/xml/init?k_id=5165217 and I'll 
get the commission!



Re: [ql-users] efficient buffer size

2003-06-21 Thread ZN

On 6/21/2003 at 11:35 PM Lau wrote:

Back to my earlier mention of caching... hard drives and their 
controllers do caching as well. I'm not certain if they do read-ahead 
caching.

In short, yes. Even older IDE drives with sufficient buffer memory at least
attempt to always read in the whole track if given time (no requests for
sectors from another track within the time needed for a full revolution).
However, the definition of a 'track' can vary - logical tracks (as
addressed by the CPU) have very little to do with the physical actuality,
as most drives since the era of 40MB drives have constant linear bit density
- i.e. outer tracks, being of a larger circumference, have more sectors.
The drive does the translation into a uniform sectors-per-track topology.
Most drives do read ahead in terms of physical tracks, but in some cases
(such as small buffers or odd translation schemes) will work in terms of
logical tracks. In any case, the actual mechanism is hardly important and
the implementation is left to the drive manufacturer.



Re: [ql-users] efficient buffer size

2003-06-20 Thread P Witte

Wolfgang writes:

  A question: A program uses io.fstrg/iob.fmul to load files in
  smaller chunks for scanning. The files could be of any size on
  any media (first of all hard disks). What, theoretically, is the
  smallest efficient buffer size to use? (I'm thinking *speed* here.)
  Eg 512 bytes, as a whole sector can be loaded in at once? Or
  allocation unit size? Or any arbitrary size that best suits my
  program?

 If you're thinking speed, then the larger the buffer, the
 better - reading the data in small chunks will always cost more
 time. If at all possible use a buffer for the entire file & (scatter)
 read it in.

Yes, that is understood. It is situations where the whole file cannot be
read at once that I'm thinking about. (Besides, on a multitasking machine it
is probably not very polite to grab huge buffers ;)

 Sector (512 bytes) sized buffers don't make that much sense
 IMHO, since the file data doesn't occupy the whole of the first
 sector (there's the file header), so reading the first 512 bytes from
 a file will read from 2 separate sectors.

This brings us to the heart of the question: What would be a sensible size?
First one block of 512-64 = 448 bytes and then subsequent blocks of 512
bytes (or multiples thereof)?
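
In other words, something like the sketch below, with plain fread()
standing in for iob.fmul, the file name made up, and the 64-byte QDOS
file header assumed to occupy the front of the file's first sector:

#include <stdio.h>

#define SECTOR  512
#define HEADER   64     /* file header at the front of the first sector */

int main(void)
{
    static char buf[SECTOR * 32];   /* 16k working buffer */
    FILE *f = fopen("testfile_dat", "rb");
    size_t n;

    if (!f) { perror("fopen"); return 1; }

    /* First, the odd-sized chunk that re-aligns us with the sectors. */
    n = fread(buf, 1, SECTOR - HEADER, f);

    /* From here on, every request is a whole number of sectors. */
    while (n > 0)
        n = fread(buf, 1, sizeof buf, f);

    fclose(f);
    return 0;
}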

Per