Re: try find the fastest way to convert a group string into index?

2018-09-16 Thread learnfirst1 via Digitalmars-d-learn

On Sunday, 16 September 2018 at 11:51:24 UTC, learnfirst1 wrote:

On Sunday, 16 September 2018 at 11:30:11 UTC, learnfirst1 wrote:
On Sunday, 16 September 2018 at 10:14:24 UTC, Vladimir 
Panteleev wrote:



I will test pcre solution vs mpfc for benchmark.  the pcre is 
easy to deal with low/up case.


The PCRE:

/^(?:Accept(*:0)|Accept\-Charset(*:1)|Accept\-Encoding(*:2)|Accept\-Language(*:3)|Access\-Control\-Allow\-Credentials(*:4)|Access\-Control\-Allow\-Origin(*:5)|Access\-Control\-Allow\-Methods(*:6)|Access\-Control\-Allow\-Headers(*:7)|Access\-Control\-Max\-Age(*:8)|Access\-Control\-Expose\-Headers(*:9)|Access\-Control\-Request\-Method(*:10)|Access\-Control\-Request\-Headers(*:11)|Age(*:12)|Allow(*:13)|Authorization(*:14)|Cache\-Control(*:15)|Connection(*:16)|Content\-Encoding(*:17)|Content\-Language(*:18)|Content\-Length(*:19)|Content\-Location(*:20)|Content\-Range(*:21)|Content\-Security\-Policy(*:22)|Content\-Type(*:23)|Cookie(*:24)|Date(*:25)|ETag(*:26)|Expect(*:27)|Expires(*:28)|Host(*:29)|If\-Match(*:30)|If\-Modified\-Since(*:31)|If\-None\-Match(*:32)|If\-Range(*:33)|If\-Unmodified\-Since(*:34)|Last\-Modified(*:35)|Location(*:36)|Origin(*:37)|Pragma(*:38)|Proxy\-Authenticate(*:39)|Proxy\-Authorization(*:40)|Range(*:41)|Referer(*:42)|Retry\-After(*:43)|Sec\-Websocket\-Exte!
nsions(*:44)|Sec\-Websocket\-Key(*:45)|Sec\-Websocket\-Origin(*:46)|Sec\-Websocket\-Protocol(*:47)|Sec\-Websocket\-Version(*:48)|Server(*:49)|Set\-Cookie(*:50)|Strict\-Transport\-Security(*:51)|Transfer\-Encoding(*:52)|Upgrade(*:53)|User\-Agent(*:54)|Vary(*:55)|Via(*:56)|WWW\-Authenticate(*:57))$/i


Re: try find the fastest way to convert a group string into index?

2018-09-16 Thread learnfirst1 via Digitalmars-d-learn

On Sunday, 16 September 2018 at 11:30:11 UTC, learnfirst1 wrote:
On Sunday, 16 September 2018 at 10:14:24 UTC, Vladimir 
Panteleev wrote:


It has to be case Case Insensitive,  so before I run mpfh for 
each new request headers,  I need to convert the keys into low or 
up case.  this value can be storage on a stack local tmp cache.   
 and I has to run it on the get(string key) method again.





Re: try find the fastest way to convert a group string into index?

2018-09-16 Thread learnfirst1 via Digitalmars-d-learn
On Sunday, 16 September 2018 at 10:14:24 UTC, Vladimir Panteleev 
wrote:

On Sunday, 16 September 2018 at 10:04:09 UTC, learnfirst1 wrote:
how to make this more fast like with one loop and get the 
results.



thanks for reply, minimal perfect hashing seems good for this 
task, I will do more test for confirm.


I try to storage the value into a static array, so when  diff 
middleware try access the key it will reused the array offset 
without need to run hash on key string. (all middleware can 
access the key by static enum name).


with Perfect Hash Function, I can translate the hash number into 
static enum and use them.


The code also need provide string get(string key) method, in this 
case the header key may not on the static preset arrays,  in this 
case it should search a smaller array (save on the parse 
process).   with this solution I has to run hash for each of 
request headers , with less than 127 items (maybe around 16 at 
max) I am not sure it is fast then direct string compare. for 
example:


foreach(int header_index, ref header; new_request.headers) {
   int hash = mpfh(header.key);
   auto enum_index = static_eum[ hash ];
   new_request.header[enum_index] = header_index  ;
}




the simple static switch is not the best way as I know here.   
with compiled PCRE match it will "Context-Type", 
"Context-Encoding" in once ( and maybe there is some SSE4 to pass 
16 byte for each loop).



Diederik de Groot


please ignore the Header.data assign part. which just for the 
demo. in the real app it will only parse once and storage in pre 
-alloc request/response object.





Re: try find the fastest way to convert a group string into index?

2018-09-16 Thread Diederik de Groot via Digitalmars-d-learn

Homework ?

I think it would also be better to only walk/match the 
Header.data and header_keys only once (using the generated switch 
statement that Vladimir suggested). So your Headers struct should 
have a parse() function, which will match all of the headers to 
the keys and store them locally in a 'matches[Type] = string' 
array (aka map). Then the get() function can simply  return 
matches[type] without any work.


Other improvements/Bonus Points:
- Because the parse() function is only called/initialize once, 
you could make the matches array immutable/const if.
- Your Type enum and header_keys contain the same data, and if 
you are going to use Vlafimir's suggestion of using a mixin, you 
can stringify the enum names an use them instead. This way you 
don't have to keep anything  in sync.
- If these Headers are coming from a client over the internet, it 
might be wise to set and check some (arbitrary) bounds for 
header-length and content. You are currently blindly copying it 
and expecting it is 'clean/correct'.
- Why copy the data to Header.data, instead of just providing it 
by ref to the get()/parse() function. You are copying it to then 
later indexing into Header.data using the map(). Why not 
copy/move piecemeal and assign them to an indexed array after 
inspecting/matching them. That  could make for cleaner code and 
nicer API. (With two caveats: we could loose the original headers 
if we don't save them somewhere else and unmatched/unknown Header 
Types will not be copied).




Re: try find the fastest way to convert a group string into index?

2018-09-16 Thread Vladimir Panteleev via Digitalmars-d-learn

On Sunday, 16 September 2018 at 10:04:09 UTC, learnfirst1 wrote:
how to make this more fast like with one loop and get the 
results.


This is a more general problem than any specific programming 
language; you may want to look into perfect hashing:


https://en.wikipedia.org/wiki/Perfect_hash_function

There exists software to generate perfect hash functions, e.g. 
GNU gperf:


https://www.gnu.org/software/gperf/

It looks like Basile wrote one for D:

https://github.com/BBasile/IsItThere

If you don't want to go that far, you can generate a string 
switch using string mixins:


void main()
{
static immutable string[] keywords = ["foo", "bar", "baz"];
enum Keyword { foo, bar, baz }

Keyword parseKeyword(string s)
{
switch (s)
{
mixin({
string code;
foreach (keyword; keywords)
	code ~= `case "` ~ keyword ~ `": return Keyword.` ~ keyword 
~ `;`;

return code;
}());
default:
throw new Exception("No such keyword!");
}
}
assert(parseKeyword("bar") == Keyword.bar);
}



try find the fastest way to convert a group string into index?

2018-09-16 Thread learnfirst1 via Digitalmars-d-learn

The use case is translate http header key into enum.

this is the current code : https://run.dlang.io/is/lpw29w

In this fake code I only list a few headers,  it should be more.  
but less then 128 and only include the a few codes.


how to make this more fast like with one loop and get the results.

I am think of use PCRE compile for this, but not sure how.

It should also save the non default headers in to a new array, so 
when search non default headers will reduce the total size.


I am plan to write a more fast http server with D betterC compare 
to vibed.