RE: [capnproto] Cap'n Proto for Elm

prasanth somasundar Thu, 30 May 2019 10:50:36 -0700

> One bit of food for thought: you can't exactly mmap() in elm, and even to get 
> from bytes to `Array Int` you have to do some non-trivial unmarshalling.  It 
> may make as much sense to just bite the bullet and parse the whole thing 
> (deeply) into an idiomatic data type up front, just like implementations of
> protobufs; by the time you take into account all of Elm's limitations, it's
> not clear to me how much keeping it in the wire format like the C++ 
> implementation does actually buys you.

> Doing an up front parse solves a lot of things, so if you go another route, be
> clear on why.

This is intended to be a prototype. `Array Int` is an awful data type for this
use case, but it works well enough for prototyping. Specifically, for the
context of others reading this thread, `Array` in Elm is a tree structure and
will not provide reasonable performance. That said, I’m hoping that I can use
`elm/bytes` (https://github.com/elm/bytes) in a better way than being forced to
decode it into a full Elm data type, though as I say this, it seems that I’d
need some buy-in that I can’t be guaranteed. I may end up with that solution in
the long run, but I want to start by implementing one with the nested array
(`Array (Array Int)`) and see if I can convince a few people.
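For readers who haven't used `elm/bytes`, here is a minimal sketch of the unmarshalling step Ian mentions: turning one segment's bytes into an `Array Int`. The module name `Segment` and the functions `decodeSegment`/`encodeSegment` are hypothetical, not part of any existing Cap'n Proto for Elm library. The sketch assumes each 64-bit word is stored as two unsigned 32-bit halves, because Elm's `Int` compiles to a JavaScript number and cannot hold every 64-bit value exactly.

```elm
module Segment exposing (decodeSegment, encodeSegment)

import Array exposing (Array)
import Bytes exposing (Bytes, Endianness(..))
import Bytes.Decode as Decode exposing (Decoder, Step(..))
import Bytes.Encode as Encode


{-| Decode one segment into unsigned 32-bit halves, low half of each
64-bit word first. Trailing bytes that do not fill a whole 32-bit half
are ignored (real Cap'n Proto segments are a multiple of 8 bytes).
-}
decodeSegment : Bytes -> Maybe (Array Int)
decodeSegment bytes =
    Decode.decode (Decode.loop ( Bytes.width bytes // 4, Array.empty ) step) bytes


{-| One loop iteration: read the next little-endian half and append it. -}
step : ( Int, Array Int ) -> Decoder (Step ( Int, Array Int ) (Array Int))
step ( remaining, acc ) =
    if remaining <= 0 then
        Decode.succeed (Done acc)

    else
        Decode.unsignedInt32 LE
            |> Decode.map (\half -> Loop ( remaining - 1, Array.push half acc ))


{-| Inverse of decodeSegment; handy for round-trip tests. -}
encodeSegment : Array Int -> Bytes
encodeSegment halves =
    halves
        |> Array.toList
        |> List.map (Encode.unsignedInt32 LE)
        |> Encode.sequence
        |> Encode.encode
```

Note that this still walks every byte up front; it avoids the deep parse into idiomatic types, not the copy out of `Bytes`.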

Still, it’s not clear to me why you’d use Cap’n Proto if you’re going to do a
full serialization/deserialization. Just use Protobufs at that point. You could
argue that having this exist for completeness is valuable, i.e. you can run
capnp on your backend and not be forced to translate into protobufs on your
frontend, but I’m not sure that’s a good enough reason to write a library like
this. Additionally, JavaScript does have lower-level capabilities, like
Uint8Array, that Elm could take advantage of.

That said, I’m more or less treating this as immutable data and providing ways
of reducing the cost of updates (such as batching updates). Haskell at least
has the ST monad for local mutation when performance matters; as far as I know,
there just isn’t a better way of doing this in Elm.
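One way batching could work, sketched under the assumption that a segment is a flat `Array Int` and that pending writes are collected as hypothetical `{ index, value }` pairs: instead of calling `Array.set` once per write (each call copies a path through the tree), apply the whole batch in a single `Array.indexedMap` pass. `Batch`, `Update`, and `applyBatch` are illustrative names only.

```elm
module Batch exposing (Update, applyBatch)

import Array exposing (Array)
import Dict exposing (Dict)


{-| A pending write: word index and new value (hypothetical helper type). -}
type alias Update =
    { index : Int, value : Int }


{-| Apply a batch of writes in one pass over the segment. If the same index
is written twice, the later update wins, because Dict.fromList keeps the
last binding for a key. Indices outside the array are silently dropped.
-}
applyBatch : List Update -> Array Int -> Array Int
applyBatch updates segment =
    let
        pending : Dict Int Int
        pending =
            updates
                |> List.map (\u -> ( u.index, u.value ))
                |> Dict.fromList
    in
    Array.indexedMap
        (\i old -> Dict.get i pending |> Maybe.withDefault old)
        segment
```

This trades k tree-path copies for one rebuild of the whole array, so it only pays off when a batch touches a sizeable fraction of the segment; for a handful of writes, repeated `Array.set` is likely cheaper.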

> It would make sense to publish this as a package by itself; it's a nice
> conceptual unit that would be useful as a library for other projects.

Sure, I was thinking the same thing. I just thought I’d focus on the
Cap’n Proto implementation before publishing. It’s fairly separate, though, so
I’m not worried about splitting it out once I’m ready.

> This was a somewhat awkward thing to cover with the Haskell
> implementation; what I ended up doing amounts to a glorified state
> monad:

So the `Struct` type is a glorified state monad. `fields` holds the record that 
acts as the struct’s definition. I’ve attached an example below that shows how 
I think this should work. Let me know if that makes sense and feels reasonably 
ergonomic.

Regarding namespacing in the parallel conversation: I think it’s kind of awful 
that Haskell records are accessed via functions instead of some scoped operator 
or the like. Not really useful as a comment, but I thought I’d add my 
displeasure.

Pointer field defaults: Field defaults in general are not features I feel super
great about. Not that I’ve thought about this in great depth, but they seem
very problematic if they are ever changed: binaries built against the old and
new schema will read the same bytes as two different values. I always assumed
that’s why they were removed from proto3. They also don’t seem *that* useful,
since you can handle this sufficiently well at the application layer. I’m
curious whether others think differently and feel strongly about their
inclusion.

getMainPhone : Struct AddressBook -> Struct PhoneNumber
getMainPhone addressBook =
    addressBook
        |> Capnp.get .people
        |> Capnp.List.get 0 AddressBook.person
        |> Capnp.get .mainPhone AddressBook.person_phoneNumber

-- assume d : Data exists. This is an `Array (Array Int)`
-- Inputs:
--  Struct
--    { data = d
--    , fields =
--      -- Field AddressBook (Capnp.List.List (StructField Person))
--      { people = ...
--      }
--    , viewOffset = (0, 0)
--    , currentTraversalDistance = 0
--    , traversalLimit = 67108864
--    }
-- Outputs:
--  Struct
--    { -- Data has not been updated. Hopefully, d is not actually copied,
--      -- and is simply a pointer, but I’m not sure how this works exactly.
--      -- If I have to, I can always separate d from the struct definition.
--      data = d
--    , fields =
--      -- Fields have been updated to a PhoneNumber
--      { number = ...
--      , type = ...
--      }
--    , -- View Offset represents the index into the data above.
--      -- Updated as necessary. We assume that the new offset is 40 here.
--      viewOffset = (0, 40)
--    , -- Data traversed so far. Assume that we've only traversed 30 bytes for
--      -- whatever reason.
--      currentTraversalDistance = 30
--    , traversalLimit = 67108864
--    }
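A hedged sketch of the traversal step the comments above describe, assuming the record layout shown there; `moveTo` and its `amountRead` argument are hypothetical names. It also bears on the question about `d`: Elm values are immutable, so building a new record with `data = s.data` copies only a reference, and both records share the same segments.

```elm
module Traversal exposing (Struct, moveTo)

import Array exposing (Array)


{-| Same shape as the record in the example comments. -}
type alias Struct fields =
    { data : Array (Array Int)
    , fields : fields
    , viewOffset : ( Int, Int )
    , currentTraversalDistance : Int
    , traversalLimit : Int
    }


{-| Re-point a view at new fields and a new offset, charging `amountRead`
against the traversal limit (same units as the limit). `data` is shared,
not copied. A fresh record is built because the `fields` type changes.
-}
moveTo : ( Int, Int ) -> Int -> newFields -> Struct oldFields -> Result String (Struct newFields)
moveTo offset amountRead newFields s =
    let
        distance =
            s.currentTraversalDistance + amountRead
    in
    if distance > s.traversalLimit then
        Err "traversal limit exceeded"

    else
        Ok
            { data = s.data
            , fields = newFields
            , viewOffset = offset
            , currentTraversalDistance = distance
            , traversalLimit = s.traversalLimit
            }
```

Returning a `Result` keeps the limit check visible in the types, so callers must handle an oversized or malicious message explicitly.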

From: David Renshaw <dwrens...@gmail.com>
Sent: Thursday, May 30, 2019 5:30 AM
To: Ian Denhardt <i...@zenhack.net>
Cc: prasanth somasundar <mezu...@live.com>; capnproto 
<capnproto@googlegroups.com>
Subject: Re: [capnproto] Cap'n Proto for Elm

Thanks! I wrote some comments inline below.

On Wed, May 29, 2019 at 11:38 PM Ian Denhardt <i...@zenhack.net> wrote:
Quoting David Renshaw (2019-05-29 21:33:03)

>    This has piqued my interest. Which parts of the schema language don't
>    map well to Haskell/Elm?

The biggest one is nested namespaces, per discussion. Neither language
has intra-module namespaces, so you either end up doing a bunch of
complex logic to split stuff across multiple modules and still break
dependency cycles (in Haskell; per my earlier message, in Elm you're
just SOL, since mutually recursive modules are just not supported, full
stop), or you deal with long_names_with_underscores (Haskell actually
uses the single quote as a namespace separator).  This is a problem
for the Go implementation as well; some of the stuff from sandstorm's
web-session.capnp spits out identifiers that are pushing 100 characters.
(I actually bumped into @glycerine at a meetup just the other day; we
talked about this among other things).

That's unfortunate.

The fact that union field names are scoped to the struct is a bit
awkward, since union tag names are scoped at the module level in
most ML-family languages. More makeshift namespacing.

Sounds like this is awkward mainly because of the previous problem, i.e.
Haskell lacks nested namespaces. With nested namespaces, you would define your
union datatype within the namespace of the enclosing struct, and the tag names
would have exactly the right namespace.

The lack of a clean separation between unions and structs introduces a
bit of an impedance mismatch as well; if you do things naively you end
up with an awkward situation where *every* sum type is wrapped in a
struct, which is a bit odd since they are used so liberally (and are
normally so lightweight) in these languages. The Haskell implementation
specifically looks for structs which are one big anonymous union so it
can omit the wrapper.

If you have an anonymous union you also need to invent a name for the
field, since you can't actually have "anonymous" fields in records.

`which` is the usual name for such a field, as in: 
https://github.com/capnproto/capnproto/blob/0f368d5781872ffc3e63db54b0ac4a138b0e0a05/c%2B%2B/src/capnp/encoding-test.c%2B%2B#L121
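To make the `which` convention concrete for Elm, here is a hypothetical mapping of a `Shape` struct modeled on the union example in the Cap'n Proto schema documentation (the schema is in the leading comment). The type and constructor names are illustrative, not what any existing generator emits; the struct-name prefix on each tag is the makeshift namespacing Ian describes, since Elm constructors share one module-level namespace.

```elm
module Shape exposing (Shape, Shape_Which(..), describe)

{- Hypothetical schema:

   struct Shape {
     area @0 :Float64;
     union {
       circle @1 :Float64;  # radius
       square @2 :Float64;  # width
     }
   }
-}


{-| Union tags, prefixed with the struct name to avoid module-level clashes. -}
type Shape_Which
    = Shape_Circle Float
    | Shape_Square Float


{-| The anonymous union has no name in the schema, so it is stored under
the invented field name `which`.
-}
type alias Shape =
    { area : Float
    , which : Shape_Which
    }


describe : Shape -> String
describe shape =
    case shape.which of
        Shape_Circle radius ->
            "circle of radius " ++ String.fromFloat radius

        Shape_Square width ->
            "square of width " ++ String.fromFloat width
```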

For Haskell, there's no way to talk about a record type without giving
it a name, so every group needs an auxiliary type defined. There's not
really anything clearly nicer to do than just name it <Type>'<field> or
such, which makes the long name problem worse. Along similar lines, in
Haskell you end up having to define auxiliary types for parameter and
return types, and without more of a hint they end up being things like
<Type>'<method>'params and <Type>'<method>'results -- a mouthful even
for short type names. I've taken to just always manually giving my
parameter and return arguments names to avoid this kind of compiler
output; the schema is much more verbose, but the call site is much
nicer. None of this section applies to Elm since you can just have
anonymous record types.

Again, sounds like this is awkward mainly as a consequence of Haskell's lack of 
nested namespaces.
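By contrast, a minimal Elm sketch of why the auxiliary-type problem does not arise there: a group can become a nested anonymous record, with no `Person'address`-style name. The schema and the names `Person` and `cityOf` are hypothetical.

```elm
module Groups exposing (Person, cityOf)

{- Hypothetical schema:

   struct Person {
     name @0 :Text;
     address :group {
       street @1 :Text;
       city @2 :Text;
     }
   }
-}


{-| The group is an inline anonymous record; no extra named type is needed. -}
type alias Person =
    { name : String
    , address : { street : String, city : String }
    }


cityOf : Person -> String
cityOf person =
    person.address.city
```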

I intentionally decided to just not support custom default values for
pointer fields; it gets really awkward because messages can be mutable
or immutable, and you end up needing different implementation strategies
for each type; for immutable messages you can't do what most
implementations do (copy the value in place on first access),

Copying into an immutable message would mean mutating it,
so I agree that's not a good way to go.

but you
could "follow" the pointer into some constant defined in the generated
code without a copy. But that gets weird because there are functions to
access the underlying message/segment, so you could run into situations
where you've jumped to a whole other message silently.

Are there reasons that client code needs to use these functions? If not,
is there a way for you to hide them or mark them as internal-use-only?

With mutable
messages you can do the normal thing, but writing code that's generic
over both of these gets really weird. At some point I ended up checking
the schemas that ship with capnproto and with sandstorm, and discovered
that, in >9000 lines of schema source, the feature was used exactly
twice, both to set the default value of a text parameter to the empty
string. So I just said "screw it, this is a waste of time." The plugin
just prints a warning to stderr and ignores the custom default.

For what it's worth, I actually had someone request this feature last month: 
https://github.com/capnproto/capnproto-rust/issues/127
I'm not sure what their use case is, though.
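For comparison, handling a pointer default at the application layer, which is what ignoring custom defaults effectively pushes onto callers, might look like this in Elm. `TextField`, `readText`, and `readTextOr` are hypothetical; the sketch assumes the decoder reports a null pointer explicitly instead of copying a default into the immutable message.

```elm
module Defaults exposing (TextField(..), readText, readTextOr)


{-| Hypothetical decoded form of a Text pointer field. -}
type TextField
    = NullPointer
    | Text String


{-| No custom default: a null Text pointer is treated here as the empty string. -}
readText : TextField -> String
readText =
    readTextOr ""


{-| Application-layer default: the caller decides what a null pointer means,
so nothing is ever written into the (immutable) message.
-}
readTextOr : String -> TextField -> String
readTextOr fallback field =
    case field of
        NullPointer ->
            fallback

        Text s ->
            s
```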

I actually have a much longer critique that I think would be worth
writing, including some things that aren't a problem for Haskell
specifically, but cause problems for other languages -- and I am being
bothered to go help with dinner, so I'll leave it at this for now.

I'd be eager to read the longer critique!

-Ian

--
You received this message because you are subscribed to the Google Groups 
"Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to capnproto+unsubscr...@googlegroups.com.
Visit this group at https://groups.google.com/group/capnproto.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/capnproto/155918727572.10312.15632533580192568031%40localhost.localdomain.
