Hello!
At the moment, the format spec defines DataType as an enumeration:
```
enum DataType {
  BOOL = 0;
  INT32 = 1;
  INT64 = 2;
  FLOAT = 3;
  DOUBLE = 4;
  STRING = 5;
  LIST = 6;
  DATE = 7;
  TIMESTAMP = 8;
  TIME = 9;
}
```
However, this leaves it unclear what the element type of a LIST can be.
In practice, LIST is written as `list<>` in the output YAML:
```
- properties:
    - name: feature
      data_type: list<float>
      is_primary: false
```
but, in my view, this does not match the format specification.
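To illustrate the problem, here is a minimal Python sketch of what every reader currently has to do on its own: parse the `list<...>` string ad hoc. The function name `parse_legacy_data_type` and the set of accepted names (assumed to be the lowercase spellings of the enum values) are hypothetical and not part of the spec or of any existing implementation.

```python
import re
from typing import Optional, Tuple

# Hypothetical primitive names: lowercase spellings of the DataType enum
# values, with LIST excluded.
PRIMITIVES = {"bool", "int32", "int64", "float", "double",
              "string", "date", "timestamp", "time"}

# Matches strings such as "list<float>"; this syntax is assumed from the
# C++ output shown above, the enum itself says nothing about it.
_LIST_RE = re.compile(r"^list<\s*([a-z0-9_]+)\s*>$")


def parse_legacy_data_type(text: str) -> Tuple[str, Optional[str]]:
    """Return (type_name, element_type) for a string-encoded data type."""
    text = text.strip()
    if text in PRIMITIVES:
        return text, None
    match = _LIST_RE.match(text)
    if match and match.group(1) in PRIMITIVES:
        return "list", match.group(1)
    # Anything else (including nested lists) is rejected by this sketch.
    raise ValueError(f"unrecognized data_type: {text!r}")
```

Another implementation could just as well choose a different spelling (for example `list[float]`), since the enum-based spec does not pin one down.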
I would like to propose updating the format definition by making each
possible DataType a message instead of an enum value. Something like:
```
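message BOOL {
  string name = 1;
}
message INT32 {
  string name = 1;
}
message INT64 {
  string name = 1;
}
// ... one message per remaining primitive type
message LIST {
  string name = 1;
  // oneof members need a type and a field name, and their field
  // numbers must not clash with `name = 1`.
  oneof element_type {
    BOOL bool_type = 2;
    INT32 int32_type = 3;
    INT64 int64_type = 4;
    // ... one field per remaining primitive type
  }
}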
```
Here we assume that nested collections will not be supported.
In an actual YAML file, it would look like:
```
- properties:
    - name: feature
      data_type:
        name: list
        element_type:
          name: float
      is_primary: false
```
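A minimal Python sketch of how a reader could validate the proposed structured form once the YAML has been loaded into plain dicts. The `DataType` dataclass and its `from_dict` / `to_legacy_string` methods are hypothetical names; the sketch assumes the lowercase type names shown above and enforces the "no nested collections" rule.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed lowercase names of the primitive types from the DataType enum.
PRIMITIVES = {"bool", "int32", "int64", "float", "double",
              "string", "date", "timestamp", "time"}


@dataclass(frozen=True)
class DataType:
    name: str
    element_type: Optional["DataType"] = None

    @staticmethod
    def from_dict(node: dict) -> "DataType":
        """Build a DataType from the structured YAML form."""
        name = node["name"]
        if name in PRIMITIVES:
            if "element_type" in node:
                raise ValueError(f"{name} must not have an element_type")
            return DataType(name)
        if name == "list":
            elem = node.get("element_type")
            # Only primitive elements are allowed: no list<list<...>>.
            if not isinstance(elem, dict) or elem.get("name") not in PRIMITIVES:
                raise ValueError("list requires a primitive element_type")
            return DataType("list", DataType.from_dict(elem))
        raise ValueError(f"unknown data type: {name!r}")

    def to_legacy_string(self) -> str:
        """Render the string form the C++ implementation currently writes."""
        if self.element_type is None:
            return self.name
        return f"{self.name}<{self.element_type.to_legacy_string()}>"


if __name__ == "__main__":
    dt = DataType.from_dict({"name": "list", "element_type": {"name": "float"}})
    print(dt.to_legacy_string())  # list<float>
```

Because the element type is an explicit field, a reader no longer needs any string parsing, and the legacy `list<float>` spelling can still be derived from it if needed.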
Motivation for the proposed change: the current approach leaves the
handling of nested types to each specific implementation (for example,
the C++ implementation writes it as `list<float>`). We should enforce a
single representation in the standard spec instead!
If there is no negative feedback, I will open a formal VOTE process.
Best regards,
Sem