Re: about binary protocol porting

2022-01-03 Thread Geoffrey Broadwell
Just from skimming some of the relevant docs (not having written a 
driver for Apache Ignite before), some thoughts:


 * It does indeed look like there is enough info, both as documentation
   and example code, to write codecs and drivers for Ignite
 * The formats and protocols look rather baroque, with significant
   historical baggage -- it's going to take quite a bit of work to get
   a fully compliant driver, though it does look like a smaller subset
   could be built to match just a particular need
 * There is a strong Java flavor to everything; there is some impedance
   mismatch with Raku (such as the Char array
   <https://ignite.apache.org/docs/latest/binary-client-protocol/data-format#char-array>
   type, which is an array of UTF-16 code units that don't
   necessarily form valid, decodable text)
 * There seems to be a tension in the design between the desire to
   support a schema-less/plain-data mode and a schema/object mode; Raku
   easily has the metaobject protocol chops to make the latter possible
   without invoking truly deep magic, but it does require somewhat more
   advanced knowledge to write
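To make the Char array point concrete, here is a small sketch. It is in Python only so it can run standalone; it is not Raku and not an Ignite client. It assumes the payload is a little-endian int32 element count followed by that many little-endian 16-bit code units. That layout and both function names are my assumptions, so check them against the data-format page linked above before relying on them. The decoder returns raw code units. Turning them into a string is a separate step that can fail, because a lone surrogate is not decodable text.

```python
import struct


def decode_char_array(buf: bytes) -> list[int]:
    """Return the raw UTF-16 code units of a count-prefixed char array.

    Assumed layout (hypothetical, check the protocol docs): a little-endian
    int32 element count, then that many little-endian uint16 code units.
    """
    if len(buf) < 4:
        raise ValueError("missing element count")
    (count,) = struct.unpack_from("<i", buf, 0)
    if count < 0 or len(buf) < 4 + 2 * count:
        raise ValueError("truncated or malformed char array")
    # Keep the code units as integers; they are not guaranteed to be text.
    return list(struct.unpack_from(f"<{count}H", buf, 4))


def code_units_to_str(units: list[int]) -> str:
    """Explicit, fallible step: raises UnicodeDecodeError on lone surrogates."""
    return struct.pack(f"<{len(units)}H", *units).decode("utf-16-le")


# Usage: a well-formed payload ("Hi") and one holding a lone high surrogate.
ok = struct.pack("<i2H", 2, 0x48, 0x69)
odd = struct.pack("<i1H", 1, 0xD800)
```

Here `code_units_to_str(decode_char_array(ok))` gives `"Hi"`. `decode_char_array(odd)` still succeeds and returns `[0xD800]`, but converting that result to a string raises `UnicodeDecodeError`. That gap between "array of code units" and "text" is exactly the mismatch a Raku port would have to decide how to expose.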

So in short: it looks doable, but it's a fair chunk of work depending 
on how complete you need it to be, and some decisions need to be made 
about how pedantically to support their Java-flavored APIs.



On 1/3/22 7:39 PM, Piper H wrote:

Glad to hear these suggestions, @Geoffrey.
I also have a question: this product has a clear binary protocol. Do 
you know how to port it to Perl or Perl 6?

https://ignite.apache.org/docs/latest/binary-client-protocol/binary-client-protocol
I was using their Python/Ruby clients, but there isn't a Perl version.

Thanks.
Piper

On Tue, Jan 4, 2022 at 11:15 AM Geoffrey Broadwell <g...@sonic.net> wrote:



Re: about binary protocol porting

2022-01-03 Thread Geoffrey Broadwell
I love doing binary codecs for Raku[1]!  How you approach this really 
depends on what formats and protocols you want to create Raku modules for.


The first thing you need to be able to do is test if your codec is 
correct.  It is notoriously easy to make a tiny mistake in a protocol 
implementation and (especially for binary protocols) miss it entirely 
because it only happens in certain edge cases.


If the format or protocol in question is open and has one or more public 
test suites, you're in good shape.  Raku gives a lot of power for 
refactoring tests to be very clean, and I've had good success doing this 
with several formats.


If there is no public test suite, but you can find RFCs or other 
detailed specs, you can often bootstrap a bespoke test suite from the 
examples in the spec documents.  Failing that, sometimes you can find 
sites (even Wikipedia, for the most common formats) that have 
known-correct examples to start with, or have published reverse 
engineering of files or captured data.
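As an illustration of bootstrapping from a spec, RFC 8949 (CBOR) Appendix A lists example values alongside their encodings. The Python sketch below copies a few of the integer examples into a table-driven check against a deliberately tiny encoder. `encode_int` and `failing_examples` are hypothetical names for this sketch, not the API of any existing CBOR module.

```python
# Integer examples taken from RFC 8949, Appendix A: (value, expected hex).
SPEC_EXAMPLES = [
    (0, "00"),
    (23, "17"),
    (24, "1818"),
    (100, "1864"),
    (1000, "1903e8"),
    (1000000, "1a000f4240"),
    (18446744073709551615, "1bffffffffffffffff"),
    (-1, "20"),
    (-10, "29"),
    (-100, "3863"),
    (-1000, "3903e7"),
    (-18446744073709551616, "3bffffffffffffffff"),
]


def encode_int(n: int) -> bytes:
    """Minimal CBOR integer encoder: major type 0 (n >= 0) or 1 (n < 0)."""
    major, value = (0, n) if n >= 0 else (1, -1 - n)
    if value >= 2**64:
        raise OverflowError("needs a bignum tag; out of scope for this sketch")
    head = major << 5
    if value < 24:
        return bytes([head | value])  # value fits in the initial byte
    # Additional info 24/25/26/27 means a 1/2/4/8-byte big-endian argument.
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 2 ** (8 * size):
            return bytes([head | info]) + value.to_bytes(size, "big")
    raise AssertionError("unreachable")


def failing_examples() -> list[str]:
    """Describe every spec example the encoder gets wrong (empty = all pass)."""
    failures = []
    for value, expected in SPEC_EXAMPLES:
        got = encode_int(value).hex()
        if got != expected:
            failures.append(f"{value}: expected {expected}, got {got}")
    return failures
```

Keeping the examples as plain data means the table can grow as you transcribe more of the spec, without touching the checking code.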


If the format is truly proprietary, you'll be getting lots of reverse 
engineering practice of your own. 


Now that you have some way of testing correctness, you'll want to be 
able to diagnose the incorrect bits.  Make sure you have some way of 
presenting easily-readable text expansions of the binary format, because 
just comparing raw buffer contents can be rather tedious (though I admit 
to having found bugs in a public test suite by spending so much time 
staring at the buffers I could tell they'd messed up a translation in a 
way that made the test always pass).  If the format or protocol has an 
official text translation/diagnostic/debug format -- CBOR, BSON, 
Protobuf, etc. all have these -- so much the better; you should support 
that format as soon as practical.
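To give a sense of what supporting a diagnostic format involves, here is a Python sketch that renders a small subset of CBOR in RFC 8949 diagnostic notation. The subset covers integers, text strings, and definite-length arrays. It is a toy built on assumptions, not a complete implementation. It rejects the other major types, floats, tags, and indefinite lengths, and it uses JSON escaping in place of diagnostic-notation string escaping.

```python
import json


def diagnose(buf: bytes) -> str:
    """Render one CBOR item (ints, text strings, arrays only) as diagnostic text."""
    text, end = _item(buf, 0)
    if end != len(buf):
        raise ValueError(f"trailing bytes after offset {end}")
    return text


def _head(buf: bytes, pos: int) -> tuple[int, int, int]:
    """Decode an item head; return (major type, argument, offset after head)."""
    if pos >= len(buf):
        raise ValueError("truncated item")
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    if info < 24:
        return major, info, pos + 1
    size = {24: 1, 25: 2, 26: 4, 27: 8}.get(info)
    if size is None or pos + 1 + size > len(buf):
        raise ValueError(f"unsupported or truncated head at offset {pos}")
    return major, int.from_bytes(buf[pos + 1:pos + 1 + size], "big"), pos + 1 + size


def _item(buf: bytes, pos: int) -> tuple[str, int]:
    """Render the item starting at pos; return (text, offset after the item)."""
    major, arg, pos = _head(buf, pos)
    if major == 0:
        return str(arg), pos
    if major == 1:
        return str(-1 - arg), pos
    if major == 3:
        if pos + arg > len(buf):
            raise ValueError("truncated text string")
        # JSON string escaping is close to diagnostic notation for text strings.
        return json.dumps(buf[pos:pos + arg].decode("utf-8")), pos + arg
    if major == 4:
        parts = []
        for _ in range(arg):
            part, pos = _item(buf, pos)
            parts.append(part)
        return "[" + ", ".join(parts) + "]", pos
    raise ValueError(f"major type {major} is not handled by this sketch")
```

For example, `diagnose(bytes.fromhex("8301820203820405"))` gives `[1, [2, 3], [4, 5]]`, which matches the corresponding RFC 8949 example. Output like that is far easier to compare by eye than the raw bytes.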


Once you get down to the nitty-gritty of writing the codec, I find it is 
very important to make it work before making it fast. There is a lot of 
room for tuning Raku code, but it is WAY easier to get things going in 
the right direction by starting off with idiomatic Raku -- given/when, 
treating the data buffer as if it was a normal Array (Positional 
really), and so on.


Make sure that with every protocol feature you add, you make tests 
newly pass, and (I find, at least) that you write the encoding and 
decoding bits at the same time, so you can check that you can round-trip 
data successfully.  For the love of all that is good, don't implement 
any obtuse features before the core features are rock solid and pass the 
test suite with nary a hiccup.
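Here is a minimal Python sketch of writing both directions together and checking round-trips. It uses a protobuf-style unsigned varint as the example format. The edge-case list is my own choice. It covers the boundaries where the encoded length changes, plus the 64-bit maximum.

```python
def encode_varint(n: int) -> bytes:
    """Unsigned protobuf-style varint: 7 bits per byte, low group first."""
    if n < 0:
        raise ValueError("unsigned varints only")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next offset); raise on a truncated varint."""
    result = shift = 0
    while pos < len(buf):
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
    raise ValueError("truncated varint")


EDGE_CASES = [0, 1, 127, 128, 150, 300, 16383, 16384, 2**32 - 1, 2**63, 2**64 - 1]


def failed_round_trips() -> list[int]:
    """Return every value that does not survive encode -> decode unchanged."""
    bad = []
    for n in EDGE_CASES:
        encoded = encode_varint(n)
        value, end = decode_varint(encoded)
        if value != n or end != len(encoded):
            bad.append(n)
    return bad
```

The round-trip check catches an encoder and decoder that disagree. It cannot catch the case where both sides make the same mistake, so it complements the known-answer vectors rather than replacing them. For example, `encode_varint(300)` should be the bytes `ac 02`, as in the Protocol Buffers encoding guide.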


After that, when you think you're ready to optimize, write performance 
/tests/ first.  Make sure you test with data that will both use your 
codec in a typical manner, and also test out all the odd corners.  
You're looking for things that seem weirdly slow; this usually indicates 
a thinko like copying the entire buffer each time you read a byte from 
it, or somesuch.
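Here is a Python sketch of exactly that thinko and its fix, along with a crude best-of-N timer. Both functions compute the same sum. The first one slices off each byte it consumes, which copies the rest of the buffer every time. Exact numbers depend on the machine. As the input grows, though, the copying version should fall further and further behind, while the offset-based one scales roughly linearly.

```python
import time


def sum_bytes_copying(buf: bytes) -> int:
    """Thinko version: slicing off the consumed byte copies the rest (O(n^2))."""
    total = 0
    while buf:
        total += buf[0]
        buf = buf[1:]  # copies len(buf) - 1 bytes on every iteration
    return total


def sum_bytes_offset(buf: bytes) -> int:
    """Fixed version: advance an index instead of copying (O(n))."""
    total = 0
    pos = 0
    while pos < len(buf):
        total += buf[pos]
        pos += 1
    return total


def best_time(fn, buf: bytes, repeat: int = 3) -> float:
    """Best-of-N wall-clock time in seconds for fn(buf)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(buf)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    for size in (4_000, 16_000, 64_000):
        data = (bytes(range(256)) * (size // 256 + 1))[:size]
        print(size,
              f"copying={best_time(sum_bytes_copying, data):.4f}s",
              f"offset={best_time(sum_bytes_offset, data):.4f}s")
```

The signal to look for is how each time changes as the size quadruples, not the absolute numbers.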


Once you've got the obvious performance kinks worked out, come by and 
ask again, and we can give you further advice from there.  Or heck, just 
come visit us on IRC (#raku at Libera.chat), and we'll be happy to 
help.  (Do stick around for a while though, because traffic varies 
strongly by time of day and day of week.)


Best Regards,


Geoff (japhb)


[1]  I'm a bit of a nut for it, really.  In the distant past, I wrapped 
C libraries to get the job done, but more recently I've done them as 
plain Raku code (and sometimes NQP, the language that Rakudo is written in).


I've written some of the binary format codecs for Raku:

 * https://github.com/japhb/CBOR-Simple
 * https://github.com/japhb/BSON-Simple
 * https://github.com/japhb/Terminal-ANSIParser
 * https://github.com/japhb/TinyFloats

Modified or tuned others:

 * https://github.com/samuraisam/p6-pb/commits?author=japhb
 * https://github.com/japhb/serializer-perf
 * (Lots of stuff spread across various Cro repositories)

Added a spec extension for an existing standardized format (CBOR):

 * https://github.com/japhb/cbor-specs/blob/main/capture.md

And I think I forgot a few things.  




Re: trouble building rakudo

2009-11-22 Thread Geoffrey Broadwell
On Sun, 2009-11-22 at 08:15 -0800, Paul Simon wrote:
 The system has only 256 MB with few resources being used. That's a big
 difference between 12+ hours and 3 minutes! Maybe I should invest in
 more RAM :-)

Yep, if you see swapping, more RAM is probably the single most effective
performance enhancer you can possibly throw at it.  The slowdown from
swapping to a spinning disk completely swamps all performance
differences between CPUs, for instance.

FWIW, the next thing to improve depends on your common tasks: a
solid-state disk and a better video card are usually the best bets.  The
SSD is great for compiles, program startups, and other disk-intensive
processes.  (Get a good one; the cheap ones are awful.)  The video card
is most useful if you do 3D, watch videos, or have a desktop that uses a
compositing engine.  If you just use twm to run xterms, it's not all
that valuable.  :-)

Still, both of those are WAY further down the list than getting enough
RAM to do the compile in main memory.


-'f




Re: RPN calculator in Perl 6

2009-06-08 Thread Geoffrey Broadwell
On Sat, 2009-06-06 at 13:18 -0300, Daniel Ruoso wrote: 
 http://sial.org/pbot/37077
 A slightly improved syntax, as per jnthn++ suggestion...

My list mail has been very delayed, so this may be out of sequence, but
in case no one mentioned it yet:

http://sial.org/pbot/37102

(That's ruoso++'s later 37100 paste with a couple small tweaks by me.)

I wanted to shrink that even further by replacing the given/when with a
direct call of the correct op variant, but I couldn't figure out how to
do that in current Rakudo.  (And just using eval inside a multi seemed
to be broken.)
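The general idea of replacing the given/when with a direct call of the right op variant can be sketched with a lookup table. This is a Python illustration built on my own assumptions (four binary operators, whitespace-separated tokens). It is not a reconstruction of the pasted Perl 6 code.

```python
import operator

# Hypothetical op table: each token maps straight to the function that
# implements it, so evaluation dispatches directly instead of via given/when.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def rpn(expr: str) -> float:
    """Evaluate a whitespace-separated RPN expression such as '3 4 + 2 *'."""
    stack: list[float] = []
    for token in expr.split():
        op = OPS.get(token)
        if op is None:
            stack.append(float(token))  # not an operator: must be a number
            continue
        if len(stack) < 2:
            raise ValueError(f"stack underflow at {token!r}")
        right = stack.pop()
        left = stack.pop()
        stack.append(op(left, right))
    if len(stack) != 1:
        raise ValueError(f"malformed expression, stack is {stack}")
    return stack[0]
```

Adding an operator is then a one-line change to the table rather than another branch. For example, `rpn("5 1 2 + 4 * + 3 -")` evaluates to `14.0`.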


-'f




Re: Converting a Perl 5 pseudo-continuation to Perl 6

2009-01-03 Thread Geoffrey Broadwell
On Fri, 2009-01-02 at 22:56 +0100, Aristotle Pagaltzis wrote:
  When I asked this question on #perl6, pmurias suggested using
  gather/take syntax, but that didn't feel right to me either --
  it's contrived in a similar way to using a one-off closure.
 
 Contrived how?

Meaning, the gather/take syntax doesn't make much sense, because we're
not gathering anything; the PID file handler has nothing to return.
We'd only be using it for the side effect of being able to pause the
callee's execution and resume it later.

 When you have an explicit entity representing the continuation,
 all of these questions resolve themselves at once: all calls
 to the original routine create a new continuation, and all calls
 via the state object are resumptions. There is no ambiguity or
 subtlety to think about.

I like this argument.  I'm not sure it's applicable in every case, but
it certainly applies to the class of situations containing my problem.

 So from the perspective of the caller, I consider the “one-off”
 closure ideal: the first call yields an object that can be used
 to resume the call.
 
 However, I agree that having to use an extra block inside the
 routine and return it explicitly is suboptimal. It would be nice
 if there was a `yield` keyword that not only threw a resumable
 exception, but also closed over the exception object in a
 function that, when called, resumes the original function.
 
 That way, you get this combination:
 
 sub pid_file_handler ( $filename ) {
     # ... top half ...
     yield;
     # ... bottom half ...
 }
 
 sub init_server {
     # ...
     my $write_pid = pid_file_handler( $options<pid_file> );
     become_daemon();
     $write_pid();
     # ...
 }

That's pretty nice.  Perhaps we can make it even cleaner with a few
small tweaks to init_server():

sub init_server(:$pid_file, ...) {
    # ...
    my &write_pid := pid_file_handler($pid_file);
    become_daemon();
    write_pid();
    # ...
}

So far, this variant is winning for me, I think.  It's slightly more
verbose on the caller's side than the yield variant I had proposed, but
it's also more explicit, and allows (as you said) a clean syntactic
separation between starting the PID file handler and continuing it.
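Python generators happen to provide this pause/resume behavior, so here is a Python sketch of the caller-side shape being discussed. `start_resumable` is a hypothetical helper standing in for the proposed `yield` semantics. It runs the handler's top half immediately, so errors surface before daemonizing, and it returns a function that runs the bottom half exactly once. The handler only appends to a log; no PID file is touched.

```python
from typing import Callable, Generator


def start_resumable(gen: Generator[None, None, None]) -> Callable[[], None]:
    """Run gen up to its first yield, then return a function that finishes it."""
    next(gen)  # top half runs now, so its errors surface immediately

    def resume() -> None:
        try:
            next(gen)
        except StopIteration:
            return  # bottom half finished normally
        raise RuntimeError("handler yielded more than once")

    return resume


log: list[str] = []


def pid_file_handler(filename: str) -> Generator[None, None, None]:
    log.append(f"top half: claim {filename}")  # e.g. check for a running copy
    yield
    log.append(f"bottom half: write PID to {filename}")


def become_daemon() -> None:
    log.append("daemonized")  # stand-in for the real fork/setsid dance


def init_server(pid_file: str) -> None:
    write_pid = start_resumable(pid_file_handler(pid_file))
    become_daemon()
    write_pid()
```

After `init_server("app.pid")`, the log reads top half, then `daemonized`, then bottom half. The caller sees only the same start-then-continue split as the Perl 6 version.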

It does bring up a question, though.  What if pid_file_handler() needed
to be broken into three or more pieces, thus containing multiple yield
statements?  Does only the first one return a continuation object, which
can be called repeatedly to continue after each yield like this?

sub init_server(:$pid_file, ...) {
    # ...
    my &more_pid_stuff := pid_file_handler($pid_file);
    become_daemon();

    more_pid_stuff();
    do_something();

    more_pid_stuff();
    do_something_else();

    more_pid_stuff();
    # ...
}

Or does each yield produce a fresh new continuation object like this?

sub init_server(:$pid_file, ...) {
    # ...
    my &write_pid   := pid_file_handler($pid_file);
    become_daemon();

    my &fold_pid    := write_pid();
    do_something();

    my &spindle_pid := fold_pid();
    do_something_else();

    spindle_pid();
    # ...
}

(Note that I assume you can simply ignore the returned object if you
don't plan to continue the operation any more, without raising a
warning.)

Certainly the first version has less visual clutter, so I tend to lean
that way by default.  But the second design would allow one to create a
tree of partial executions, by calling any earlier continuation object
again.  That's a very powerful concept that I don't want to give up on.
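Both designs can be sketched in Python, with one caveat. Python generators give the first (ratcheting) design directly, but they can't be copied, so this sketch emulates the second design with explicit, immutable state. `ratcheting`, `Continuation`, and `pid_steps` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

State = tuple[str, ...]
StepFn = Callable[[State, Any], State]


# Design 1 (ratcheting): one generator; each next() moves past one yield,
# and there is no way back to an earlier point.
def ratcheting(filename: str):
    yield f"claimed {filename}"
    yield f"wrote PID to {filename}"
    yield f"cleaned up {filename}"


# Design 2 (fresh continuation per yield): each call returns the new state
# plus a *new* continuation for the remaining steps.  Earlier continuations
# stay valid because state is immutable, so calling one again starts a branch.
@dataclass(frozen=True)
class Continuation:
    steps: tuple[StepFn, ...]  # remaining steps, one run per call
    state: State = ()

    def __call__(self, arg: Any = None) -> tuple[State, Optional["Continuation"]]:
        new_state = self.steps[0](self.state, arg)
        rest = self.steps[1:]
        return new_state, (Continuation(rest, new_state) if rest else None)


def pid_steps(filename: str) -> Continuation:
    return Continuation((
        lambda s, _: s + (f"claimed {filename}",),
        lambda s, how: s + (f"{how} PID in {filename}",),
        lambda s, _: s + ("done",),
    ))
```

With `k0 = pid_steps("app.pid")` and `s1, k1 = k0()`, calling `k1("wrote")` and then `k1("checked")` gives two independent branches from the same point. That is the tree of partial executions. The generator offers no equivalent: once `next()` has moved on, the earlier position is gone.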

Supporting both feels like it might be an adverb on the invocation
(possibly with a frosty sugar coating available).  It would be nice to
support invoking a continuation in ratcheting and forgetful modes.

Thoughts?


-'f




Re: Converting a Perl 5 pseudo-continuation to Perl 6

2009-01-01 Thread Geoffrey Broadwell
On Fri, 2009-01-02 at 00:30 +0200, Leon Timmermans wrote:
 I can't help wondering why pid_file_handler needs to be split up
 in the first place. Why wouldn't it be possible to simply call
 pid_file_handler after become_daemon?

Two answers:

1. If an error occurs that will not allow the PID file to be created
   (another copy of the daemon is already running, the user doesn't
   have required root permissions, or what have you), the program
   should die visibly at the command line, rather than *appearing* to
   launch but actually just spitting an error into the syslog and
   disappearing silently.  Checking for another running daemon and
   taking ownership of the pid file should be an atomic operation
   (or at the very least err on the side of failing noisily if
   something fishy happens), so I can't just check for an existing
   pid file before daemonizing, and then create the new pid file after.

   It's not visible in the code I posted, but the program should also
   do a number of other sanity checks before it daemonizes, for the
   very same reasons.  For example, it should load all modules it
   expects to use before becoming a daemon, and complain loudly if
   it can't.

2. The particular code I used is just a decent example to ask about
   the general question of a better syntax for interrupting and
   continuing a sub.  So even if I could do what you say, I'd still
   have the question.  :-)
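For point 1, a common way to make the check and the claim a single step is to create the PID file with `O_CREAT | O_EXCL`, which fails if the file already exists. The Python sketch below splits the work the same way as the thread: claim the file before daemonizing, then write the PID afterwards, because the PID changes when the daemon forks. The function names are hypothetical. The sketch also deliberately ignores stale PID files left behind by a crashed daemon, which a real implementation would need to handle.

```python
import os


def claim_pid_file(path: str) -> int:
    """Atomically create path; fail loudly if it already exists.

    O_CREAT | O_EXCL makes "is there already a PID file?" and "create it"
    one operation, so two copies of the daemon cannot both pass the check.
    Call this *before* daemonizing so the error reaches the terminal.
    Returns an open file descriptor for the (still empty) PID file.
    """
    try:
        return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        raise SystemExit(f"{path} exists; is another copy already running?") from None


def write_pid(fd: int) -> None:
    """Call *after* daemonizing, since forking changes the PID."""
    os.write(fd, f"{os.getpid()}\n".encode())
    os.close(fd)
```

A missing-permission error (for example, not running as root) surfaces as an ordinary `PermissionError` from `os.open`. It therefore also fails visibly at the command line rather than in the syslog.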


-'f