Re: Want a function that determines a double or float given its 80-bit IEEE 754 SANE (big endian) representation

2023-08-23 Thread dan via Digitalmars-d-learn

On Wednesday, 23 August 2023 at 03:24:49 UTC, z wrote:

On Tuesday, 22 August 2023 at 22:38:23 UTC, dan wrote:

Hi,

I'm parsing some files, each containing (among other things) 
10 bytes said to represent an IEEE 754 extended floating point 
number, in SANE (Standard Apple Numerical Environment) form, 
as SANE existed in the early 1990s (so, big endian).


Note that the number actually stored will probably be a 
positive even integer less than 100,000, so a better format 
would have been to store a two-byte ushort rather than a 
10-byte float.  However the spec chose to have an encoded 
float there.


I would like to have a function of the form

public bool ubytes_to_double( ubyte[10] u, out double d ) 
{ /* stuff */ }


which would set d to the value encoded provided that the value 
is a number and is sane, and otherwise just return false.


So my plan is just to do this: examine the first 2 bytes to 
check the sign and see how big the number is, and if it is 
reasonable, convert the remaining 8 bytes to a fractional 
part, perhaps ignoring the last 2 or 3 as not being 
significant.


But --- it seems like this kind of task may be something that 
D already does, maybe with some constructor of a double or 
something.


Thanks in advance for any suggestions.

dan


On 32-bit x86, an endianness swap and a pointer cast to `real` 
should be enough (it seems to be the same format, but I could be 
wrong).
Otherwise (afaik `real` on 64-bit x86 is just `double`?), you can 
always isolate the sign, mantissa and exponent into three separate 
`double` values (cast from integer to `double`) and 
recalculate the floating-point number 
(`sign*mantissa*(2^^exponent)` according to Wikipedia). Since the 
values are mostly integers, precision loss probably won't be a problem.
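A minimal sketch of the "isolate sign, mantissa and exponent" approach, assuming the usual 80-bit extended layout (1 sign bit, 15 exponent bits with bias 16383, then a 64-bit significand whose top bit is the explicit integer bit). The name `ubytesToDouble` is made up for illustration, and only infinities and NaNs are rejected here; a real parser would probably add its own range check.

```d
import std.math : ldexp;

/// Decode ten big-endian bytes of an 80-bit extended float into a double.
/// Returns false (leaving d as NaN) for infinities and NaNs.
bool ubytesToDouble(const ubyte[10] u, out double d)
{
    // Byte 0 holds the sign bit and the upper 7 exponent bits.
    immutable bool negative = (u[0] & 0x80) != 0;
    immutable int exponent = ((u[0] & 0x7F) << 8) | u[1];

    // Bytes 2..9 are the 64-bit significand, most significant byte first.
    ulong mantissa = 0;
    foreach (b; u[2 .. 10])
        mantissa = (mantissa << 8) | b;

    if (exponent == 0x7FFF) // all-ones exponent: infinity or NaN
        return false;

    // value = mantissa * 2^(exponent - bias - 63); the 63 accounts for the
    // significand being an integer with the binary point after its top bit.
    // Values too small for a double simply underflow to zero.
    immutable double magnitude = ldexp(cast(double) mantissa, exponent - 16383 - 63);
    d = negative ? -magnitude : magnitude;
    return true;
}
```

For example, the bytes `40 0E AC 44 00 00 00 00 00 00` decode to 44100 with this function. Converting the 64-bit significand to `double` rounds it to 53 bits, which is harmless for small integers.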


Thank you z.

My machine is 64-bit and is little-endian but the method you 
suggest actually gives the right answer in my case.


More exactly, a real on my machine is 16-bytes (128 bits), 
quadruple precision, and it has a sign bit with 15 bits of 
exponent.  But the 80-bit format also has a sign bit with 15 bits 
of exponent.


So all i have to do is declare a real y, cast its address to ubyte*, 
and copy the 10 ubytes from the file over its first 10 bytes (but 
backwards).  Then the sign bit and exponent exactly match in 
position.  (The remaining 6 ubytes are left in their initial 
state because they're way out to the least significant part of 
the number.)
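A rough sketch of that byte-reversal trick, under the assumption that `real` is the x87 80-bit extended type (64-bit significand, little-endian, often padded to 12 or 16 bytes in memory). The function name `bytesToReal` is hypothetical. The `static if` makes it report failure instead of silently producing garbage where `real` has some other layout.

```d
/// Copy ten big-endian bytes, reversed, over the first ten bytes of a real.
/// Returns false when real is not the 80-bit x87 format on this target.
bool bytesToReal(const ubyte[10] u, out real y)
{
    static if (real.mant_dig == 64)
    {
        y = 0;                       // start from a known value
        auto p = cast(ubyte*) &y;    // view the real as raw bytes
        foreach (i; 0 .. 10)
            p[i] = u[9 - i];         // big-endian file order -> little-endian memory
        return true;
    }
    else
    {
        return false;                // different layout: fall back to decoding by hand
    }
}
```

Any padding bytes beyond the first ten are not part of the value in that format, so they can be left alone.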


Now, for my final code, i'm not actually doing this because the 
size of real may be different on another machine, or the 
exponents may get different sizes due to a different layout, or 
some other problem.


So i just do it by hand (although i'm ignoring the last 4 ubytes 
since for my usage, ultimately it gets boiled down to a 32-bit 
integer anyway).  And "by hand" is pretty close to what you also 
mention in your mantissa*2^^exponent expression.


Thanks again.


Re: parallel threads stalls until all thread batches are finished.

2023-08-23 Thread Sergey via Digitalmars-d-learn

On Wednesday, 23 August 2023 at 13:03:36 UTC, Joe wrote:

I use

foreach(s; taskPool.parallel(files, numParallel))
{ L(s); } // L(s) represents the work to be done.


If you make that L function return, for example, “ok” when a file 
has downloaded successfully, you can try TaskPool.amap.


Another option is probably std.concurrency.
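One thing that may be worth checking first: in std.parallelism, the second argument of `parallel` is the work unit size (how many consecutive elements a worker claims at once), not the number of threads. The sketch below assumes that is what is happening with `numParallel`; it sets the thread count on a dedicated `TaskPool` and uses a work unit size of 1. The names `processAll`, `nThreads` and `work` are made up for illustration.

```d
import std.parallelism : TaskPool;

/// Call `work` on every element of `items` using a pool of `nThreads`
/// worker threads, handing out one element at a time.
void processAll(T)(T[] items, size_t nThreads, void delegate(T) work)
{
    auto pool = new TaskPool(nThreads);
    scope (exit) pool.finish(); // let the worker threads terminate afterwards

    // Work unit size 1: an idle worker picks up the next element right away
    // instead of being tied to a block of consecutive elements.
    foreach (item; pool.parallel(items, 1))
        work(item);
}
```

Usage would look like `processAll(files, 8, (string s) { L(s); });`. Note that the thread running the `foreach` also takes part in the work, so the number of simultaneous downloads can be one more than `nThreads`.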




parallel threads stalls until all thread batches are finished.

2023-08-23 Thread Joe--- via Digitalmars-d-learn

I use

foreach(s; taskPool.parallel(files, numParallel))
{ L(s); } // L(s) represents the work to be done.

to download files from the internet.

Everything works. The issue is this:

the foreach will download 8 files at once. BUT it will not start 
the next batch of 8 *until* ALL of the previous 8 are done. It 
seems that taskPool.parallel will not immediately start a new 
thread once a task is done.


E.g., I get

L(s1);
L(s2);
...
L(s8);
--- // nothing below is executed until all L(s1) through L(s8) are finished.

L(s9);
L(s10);
...

My expectation is that when the first task completes, say 
L(s4), L(s9) is then executed.


The reason why this causes me problems is that the downloaded 
files, which are cached to a temporary file, stick around and do 
not free up space (think of it just as using memory) and this can 
cause some problems some of the time. Also, the point of parallel 
tasks is to allow parallelism, but the way the code is working is 
that it starts the tasks in parallel but then essentially stalls 
the parallelism a large portion of the time. E.g.,


If there are a bunch of small downloads but one large one, then 
that one large download stalls everything. E.g., say L(s5) is 
a very long download while all others are very quick. Then L(s5) 
will prevent downloading anything afterwards until it is 
finished (I'll get L(s1) through L(s8) but nothing else until 
L(s5) is finished).


What's going on, and how can I fix it?








Re: File size

2023-08-23 Thread FeepingCreature via Digitalmars-d-learn

On Tuesday, 22 August 2023 at 16:22:52 UTC, harakim wrote:
On Monday, 21 August 2023 at 11:05:36 UTC, FeepingCreature 
wrote:
Can you print some of the wrong sizes? D's DirEntry iteration 
code just calls `FindFirstFileW`/`FindNextFileW`, so this 
*shouldn't* be a D-specific issue, and it should be possible 
to reproduce this in C.


Thanks for the suggestion. I was working on getting the list 
for you when I decided to first try and reproduce this on 
Linux. I was not able to do so. Then I opened the Linux File 
Explorer and went to one of the files. There were two files by 
that name, with names differing only by case.


In Windows, I only saw one, because Windows Explorer only 
supports one file with an identical case-insensitive name per 
directory. Unsurprisingly, that is also the one that was 
selected by getSize(filename). The underlying Windows functions 
must ignore case as well and select the same way as Explorer 
(which makes sense). That explains why Windows Explorer 
reported the same size as getSize(filename) in every case, while 
DirEntry.size would match for the file with the same case as 
Windows recognized and not for the file with a different case. 
I was able to get into this state because I copied the files 
(merged directories) in Linux.
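A small sketch of how such name clashes could be spotted ahead of time. It assumes simple lower-casing via `std.uni.toLower` is a good enough stand-in for the file system's case-insensitive comparison, which may not hold for every name. The helpers `caseCollisions` and `namesIn` are hypothetical.

```d
import std.file : dirEntries, SpanMode;
import std.path : baseName;
import std.uni : toLower;

/// Return the groups of names that differ only by case.
string[][] caseCollisions(string[] names)
{
    // Bucket the names by their lower-cased form.
    string[][string] groups;
    foreach (name; names)
        groups[name.toLower] ~= name;

    // Keep only the buckets holding more than one spelling.
    string[][] result;
    foreach (group; groups)
        if (group.length > 1)
            result ~= group;
    return result;
}

/// Collect the base names of the entries directly inside `dir`.
string[] namesIn(string dir)
{
    string[] names;
    foreach (entry; dirEntries(dir, SpanMode.shallow))
        names ~= baseName(entry.name);
    return names;
}
```

Running `caseCollisions(namesIn(someDir))` on Linux would list the files that Windows tools are likely to treat as a single name.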


It was interesting to look into. It seems everything is working 
as designed. It shouldn't be an issue for me going forward 
either as I move more and more towards Linux.


That's hilarious! I'm happy you found it.