jonkeane commented on a change in pull request #9560:
URL: https://github.com/apache/arrow/pull/9560#discussion_r582001099
##########
File path: python/pyarrow/types.pxi
##########
@@ -2386,25 +2426,102 @@ def struct(fields):
return pyarrow_wrap_data_type(struct_type)
+cdef _extract_union_params(children_fields, type_codes,
Review comment:
This is super nit-picky, but should the first arg be `child_fields`?
That would match what's in the body when iterating over them `child_fields`
yields a single `child_field` (as opposed to yielding a `children_field`).
##########
File path: cpp/src/arrow/type.h
##########
@@ -915,6 +926,16 @@ class ARROW_EXPORT Decimal128Type : public DecimalType {
};
/// \brief Concrete type class for 256-bit decimal data
+///
+/// Arrow decimals are fixed-point decimal numbers encoded as a scaled
+/// integer. The precision is the number of significant digits that the
+/// decimal type can represent; the scale is the number of digits after
+/// the decimal point (note the scale can be negative).
+///
+/// For most use cases, the maximum precision offered by Decimal128Type
+/// is sufficient, and it will result in a more compact and more efficient
+/// encoding. Decimal256Type is useful if you need a precision higher
+/// than 38 significant digits.
Review comment:
Would it be possible to mention the 38 sig fig limit in 128-bit decimal
documentation above (and if I'm calculating this correctly that the precision
limit for 256-bit decimals is 76)?
##########
File path: python/pyarrow/types.pxi
##########
@@ -2386,25 +2426,102 @@ def struct(fields):
return pyarrow_wrap_data_type(struct_type)
+cdef _extract_union_params(children_fields, type_codes,
+ vector[shared_ptr[CField]]* c_fields,
+ vector[int8_t]* c_type_codes):
+ cdef:
+ Field child_field
+
+ for child_field in children_fields:
+ c_fields[0].push_back(child_field.sp_field)
+
+ if type_codes is not None:
+ if len(type_codes) != <Py_ssize_t>(c_fields.size()):
+ raise ValueError("type_codes should have the same length "
+ "as fields")
+ for code in type_codes:
+ c_type_codes[0].push_back(code)
+ else:
+ c_type_codes[0] = range(c_fields.size())
+
+
+def sparse_union(children_fields, type_codes=None):
+ """
+ Create SparseUnionType from children fields.
+
+ A union is defined by an ordered sequence of child types; each slot in
+ the union can have a value chosen from these types.
+
+ Parameters
+ ----------
+ fields : sequence of Field values
Review comment:
This should be `children_fields : ...` to match the signature, yeah? (or
`child_fields` if we go that naming convention)
##########
File path: cpp/src/arrow/type.h
##########
@@ -999,6 +1031,20 @@ class ARROW_EXPORT SparseUnionType : public UnionType {
std::string name() const override { return "sparse_union"; }
};
+/// \brief Concrete type class for sparse union data
Review comment:
```suggestion
/// \brief Concrete type class for dense union data
```
?
##########
File path: python/pyarrow/types.pxi
##########
@@ -2110,11 +2110,28 @@ def float64():
cpdef DataType decimal128(int precision, int scale=0):
"""
- Create decimal type with precision and scale and 128bit width.
+ Create decimal type with precision and scale and 128-bit width.
+
+ Arrow decimals are fixed-point decimal numbers encoded as a scaled
+ integer. The precision is the number of significant digits that the
+ decimal type can represent; the scale is the number of digits after
+ the decimal point (note the scale can be negative).
+
+ As an example, ``decimal128(7, 3)`` can exactly represent the numbers
+ 1234.567 and -1234.567 (encoded internally as the 128-bit integers
+ 1234567 and -1234567, respectively), but neither 12345.67 nor 123.4567.
+
+ ``decimal128(5, -3)`` can exactly represent the number 12345000
+ (encoded internally as the 128-bit integer 12345), but neither
+ 123456000 nor 12345600.
Review comment:
It also would be unable to represent the numbers 123450000 and 1234500
correct? Would it be clearer to use those similar to the decimal example we use
above in type.h? Where the sig figs are exactly the same, but the decimal is
the only thing that is different?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]