Hi Luke,

The idea is definitely to subsume typed arrays as completely as possible.

> * Array types of fixed length
> The current design fixes the length of an ArrayType instance as part of the 
> ArrayType definition, instead of as a parameter to the resulting constructor. 
>  I'm not sure I understand the motivation for that.

The idea is that all Types have a known size, and all Data instances are 
allocated contiguously.

For example, if you could put unsized array types inside of struct types, it 
wouldn't be clear how to allocate an instance of the struct:

    var MyStruct = new StructType({
        a: Uint8Array,
        b: Uint8Array
    });
    var s = new MyStruct; // ???

But you're right that this is inconsistent with typed arrays. Maybe this can be 
remedied by allowing both sized and unsized array types, and simply requiring 
nested types to be sized.
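
For instance, a rough sketch of how that split might look (uint8 as a sample
primitive type; the unsized form and the rejection rule are just what I have
in mind, not what the wiki currently says):

    var Sized   = new ArrayType(uint8, 16);  // length fixed as part of the type
    var Unsized = new ArrayType(uint8);      // length supplied at construction

    var a = new Unsized(1000);               // fine at the top level, like a typed array
    var S = new StructType({
        header: Sized,                       // ok: size is known
        body:   Unsized                      // would be rejected: size unknown
    });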

> * Compatibility with Typed Arrays array objects
> There are a few divergences between Binary Data arrays and Typed Array 
> arrays, that look like they could be addressed:
> - The constructor difference mentioned above, including support for copy 
> constructors.

I don't know what you mean by copy constructors. Are you talking about being 
able to construct an instance of a type by wrapping it around an existing 
ArrayBuffer? That
doesn't copy, but I do think we should support it, as I said in my preso at the 
f2f in San Bruno. That's something I intended to add to the wiki page but 
hadn't gotten to yet.
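
Roughly (a sketch only; the constructor signature and the byteLength property
here are what I have in mind, not yet on the wiki):

    var Point = new StructType({ x: uint32, y: uint32 });
    var buf = new ArrayBuffer(Point.byteLength);
    var p = new Point(buf, 0);   // a view onto buf, not a copy
    p.x = 42;                    // the write is visible through buf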

> - Lack of buffer, byteLength, byteOffset, BYTES_PER_ELEMENT.   I see these 
> are noted in TODO.

Yep.

I do think there's a case to be made for not exposing the ArrayBuffer for Data 
objects that were not explicitly constructed on top of an ArrayBuffer. This 
would hide architecture-specific data that is currently leaked by the Typed 
Arrays API. It also accommodates the two classes of usage scenario involving 
binary data:

Scenario 1: I/O

    socket.readBuffer(1000, function(buf) {
        var s = new MyStruct(buf, 0); // also allow an optional endianness argument
        ... do some computation on s ...
    });

Scenario 2: Pure computation

    var s = new MyStruct({ x: 0, y: 0 });
    ... do some computation on s ...

Scenario 1 comes up when reading files, network sockets, etc.; here you *have* 
to let the programmer control the endianness and layout/padding. The simplest 
way to do the latter is to assume zero padding, as with DataView, and have the 
programmer insert padding bytes where necessary.
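
For example, a record that a C compiler would lay out with three bytes of
alignment padding would be written with that padding spelled out (hypothetical
field names; uint8/uint32 as sample primitive types):

    // matches C's { uint8_t tag; uint32_t value; } under natural alignment
    var WireRecord = new StructType({
        tag:   uint8,
        _pad:  new ArrayType(uint8, 3),  // explicit padding, since zero padding is assumed
        value: uint32
    });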

Scenario 2 comes up when building internal data structures. Here the system 
should use whatever padding and endianness is going to be the most efficient 
for the architecture, but that detail should ideally not be exposed to the 
programmer. So in that case, we could make the .buffer field censored, by 
having it be null or an accessor that throws.
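
Concretely, something like this (a sketch; byteLength and the exact censoring
behavior are just illustrations of the idea):

    var s = new MyStruct({ x: 0, y: 0 });  // Scenario 2: no buffer supplied
    s.buffer;                              // null, or an accessor that throws

    var buf = new ArrayBuffer(MyStruct.byteLength);
    var t = new MyStruct(buf, 0);          // Scenario 1: explicitly built on a buffer
    t.buffer === buf;                      // true; the buffer is exposed here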

> - array.set(otherArr, offset) support on the Binary Data arrays

Good catch; looks unproblematic.
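
I.e., something along these lines (a sketch, mirroring the typed array set
semantics; uint32 as a sample element type):

    var T = new ArrayType(uint32, 8);
    var dst = new T;
    dst.set([1, 2, 3], 4);  // copy the elements in starting at index 4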

> - Conversions, see below
> - Different prototype chains, additional members like elementType on binary 
> data arrays.  
> 
> The last item is one of the reasons why it would be nice to pull the Typed 
> Arrays objects into Binary Data, so that they could be augmented to be fully 
> consistent - for example, to expose the elementType.

If we can pull them into the prototype hierarchy, that's cool, but we still 
have to see. In particular, if we want to close off some of the leaks I 
describe above, then we may have to retain some distinction.

> * Conversions
> The rules for conversions of argument values into the primitive value types 
> seem to be different than typical ES conversions and those used by 
> TypedArrays via WebIDL.  Why not use ToInt32 and friends for conversion?  
> Current rules appear to be quite strict - throwing on most type mismatches, 
> and also more permissive for some unexpected cases like "0x"-prefixed strings.

Interesting question. I may have followed js-ctypes too blindly on this.
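
To make the contrast concrete (illustrative only; uint8 as a sample element
type, and the last comment just restates the strict behavior you describe):

    var u = new Uint8Array(1);
    u[0] = 3.7;    // WebIDL-style conversion: stores 3, never throws
    u[0] = "42";   // ToNumber on the string, stores 42

    var A = new ArrayType(uint8, 1);
    var a = new A;
    a[0] = 3.7;    // under the current strict rules, a mismatch like this throws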

> * DataView integration with structs
> DataView is an important piece of Typed Arrays for reading from heterogeneous 
> binary data sources like files and network protocols, and for controlling 
> endianness of data reads.  DataView would seem to benefit from structs, and 
> structs would benefit from DataView.  This is another reason to want to spec 
> DataView itself in ES.next.  I imagine an additional pair of functions on 
> DataView akin to the following would allow nice interop between DataView and 
> Binary Data "Types"/"Data":
> 
>    Data getData(Type type, unsigned long byteOffset, optional boolean 
> littleEndian);
>    void setData(Type type, unsigned long byteOffset, Data value, optional 
> boolean littleEndian);

I agree that this kind of use case is important, and I'm not opposed to 
DataViews, but I'm not sure the ArrayBuffer approach described above doesn't 
already handle this, e.g.:

    new T(ArrayBuffer buffer, unsigned long byteOffset, optional boolean littleEndian);
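
With that, the getData/setData use cases look something like this (a sketch;
the endianness argument is the addition in question and isn't specified yet):

    var buffer = new ArrayBuffer(256);
    var rec = new MyStruct(buffer, 16, true);  // "getData": a view at byte offset 16, little-endian
    var out = new MyStruct(buffer, 64, true);
    out.x = rec.x;                             // "setData" falls out of ordinary assignment
    out.y = rec.y;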

> * Explicit inclusion of Uint32Array  and similar objects
> The Uint32Array and similar objects defined in Typed Arrays are the ones that 
> are likely to be the most commonly used in many/most use cases, but these are 
> missing from the ES.next proposal.  Including them in the ES.next proposal 
> explicitly, as supersets of the Typed Arrays objects, would avoid users 
> having to manually create them, and help ensure full API consistency.

I'm open to this. I think there's no technical concern, just a question of 
what's the best "home" for Typed Arrays.

> * A lot of meta- objects
> The spec defines 14 objects, without yet defining any of the 10 typed arrays 
> objects.  Several of the objects only serve as scaffolding for the 
> meta-hierarchy, and don't appear to be objects which users are expected to 
> frequently (or ever) work with.  Are the named "Type" and "Data" objects 
> needed in the proposal?

This doesn't really bother me. As you say, users don't really need to work with 
them; they're mostly there to set up the inheritance of shared methods, and 
they make for a nicely symmetric meta-class hierarchy. From the user's 
standpoint, they'll mostly just care about the primitive types, StructType and 
ArrayType, and then the type and data objects they create.

> * Naming
> The term "Type" feels somewhat too generic for referring to struct shapes.  
> The previous "block" terminology actually sounded more natural, or at least 
> more scoped.

The reason I eliminated "block" was that it's such a highly-used term for many 
different things (e.g. block statements, block functions). The terms Type and 
Data are implicitly scoped to the @binary module, which is one of the benefits 
of modules: you don't have to explicitly scope every single definition's name 
to the subject matter at hand.

Dave
