On 1/3/18 9:42 AM, Steven Schveighoffer wrote:
> On 1/3/18 2:47 AM, Christian Köstlin wrote:
>> On 02.01.18 21:13, Steven Schveighoffer wrote:
>>> Well, you don't need to use appender for that (and doing so is copying a
>>> lot of the data an extra time). All you need is to extend the pipe until
>>> there isn't any more new data, and it will all be in the buffer.
>>>
>>> // almost the same line from your current version
>>> auto mypipe = openDev("../out/nist/2011.json.gz")
>>>                   .bufd.unzip(CompressionFormat.gzip);
>>>
>>> // This line here will work with the current release (0.0.2):
>>> while(mypipe.extend(0) != 0) {}
>>
>> Thanks for this input, I updated the program to make use of this method
>> and compare it to the appender thing as well.
>
> Hm.. the numbers are worse! I would have expected to be at least
> comparable. I'll have to look into it. Thanks for posting this.

Alright. I have spent some time examining the issues, and here are my
findings:
1. The major differentiator between the C and D algorithms is the use of
C realloc. This one thing saves the most time. I'm going to update
iopipe so you can use realloc there too (stand by). I will also be
examining how to simulate realloc in iopipe when not using C malloc. I
think it's the copying of data to the new buffer that is causing issues.
2. extend(0) always attempts to read 8k more bytes. The buffer grows by
8k each time by extending into another couple of pages. Somehow this is
choking the algorithm. I think the cost of extending pages actually ends
up hurting performance (something we need to look at in the GC in general).
3. extendElems should allow extending all the elements in the optimal
fashion. With that fixed, the iopipe-only version performs as well as the
Appender/iopipe version. This fix is coming out as well.
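
To make point 1 a bit more concrete, here is a minimal sketch of a buffer grown with C realloc from D. It is not iopipe code: ReallocBuffer and its members are hypothetical names made up for illustration, and the doubling policy is my assumption, not something measured in this thread. The idea is only that realloc can sometimes extend a block in place, so the existing data need not be copied on every growth.

```d
import core.stdc.stdlib : free, realloc;

/// Hypothetical helper (not part of iopipe): a byte buffer grown with C realloc.
struct ReallocBuffer
{
    ubyte* ptr;       // start of the C heap block, null while empty
    size_t length;    // bytes currently in use
    size_t capacity;  // bytes currently allocated

    /// Make room for at least `extra` more bytes.
    /// Assumption: capacity starts at 8k and doubles until the request fits.
    void reserve(size_t extra)
    {
        if (length + extra <= capacity)
            return;
        size_t newCap = capacity ? capacity : 8192;
        while (newCap < length + extra)
            newCap *= 2;
        // realloc(null, n) acts like malloc(n); otherwise it may grow in place.
        auto p = cast(ubyte*) realloc(ptr, newCap);
        assert(p !is null, "realloc failed");
        ptr = p;
        capacity = newCap;
    }

    /// Append `data` to the end of the buffer.
    void put(const(ubyte)[] data)
    {
        reserve(data.length);
        ptr[length .. length + data.length] = data[];
        length += data.length;
    }

    /// Everything stored so far, as a slice over the C heap block.
    ubyte[] window() { return ptr[0 .. length]; }

    /// Give the memory back to the C heap.
    void dispose()
    {
        free(ptr);
        ptr = null;
        length = capacity = 0;
    }
}

void main()
{
    ReallocBuffer buf;
    ubyte[8192] chunk = 42;      // stand-in for one 8k read
    foreach (i; 0 .. 100)
        buf.put(chunk[]);
    assert(buf.window.length == 100 * 8192);
    buf.dispose();
}
```

With doubling, 100 appends of 8k each trigger only 8 calls to realloc, whereas growing by a fixed 8k would mean a growth step on every append. Whether each of those steps actually copies depends on the allocator, so treat this as an illustration of the idea rather than a benchmark.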
Stay tuned, there will be updates to iopipe to hopefully make it as fast
in this microbenchmark as the C version :)
-Steve