Joshua Adelman, 06.02.2014 16:21:
> This discussion was initially started on the cython user google group, but I
> wanted to move the issue over to the dev list, as per the suggestion on
> cython_trac.
Original discussion is here:
http://thread.gmane.org/gmane.comp.python.cython.user/10646
> Given a numpy recarray containing two or more fixed-length string fields, if
> those string fields are adjacent to one another cython does not properly
> detect the boundary between the string fields. A concise test case
> demonstrating the problem is:
>
> ```cython
> cimport numpy as np
>
> cdef packed struct tstruct:
> np.float32_t a
> np.int16_t b
> char[6] c
> char[4] d
>
> def test_struct(tstruct[:] x):
> pass
> ```
>
> We then define some data on the python side:
>
> ```python
> import numpy as np
>
> a = np.recarray(3, dtype=[('a', np.float32), ('b', np.int16), ('c', '|S6'),
> ('d', '|S4')])
> a[0] = (1.1, 1, 'abcde', 'fgh')
> a[1] = (2.1, 2, 'ijklm', 'nop')
> a[2] = (3.1, 3, 'qrstu', 'vwx')
>
> test_struct(a)
> ```
>
> This results in the error:
>
> ---------------------------------------------------------------------------
> ValueError Traceback (most recent call last)
>
> <ipython-input-12-ac01118a36a7> in <module>()
> ----> 1 test_struct(a)
>
> ValueError: Expected a dimension of size 6, got 10
>
>
> If we swap the order of the fields in the recarray and `tstruct` to (a,c,b,d)
> so that there is a numerical field between the string fields, then the
> function can parse the memory view correctly.
>
> The relevant line of code that catches the incorrect value of `enc_count` is:
> https://github.com/cython/cython/blob/master/Cython/Utility/Buffer.c#L468
>
> ```
> if (ctx->enc_count != ctx->head->field->type->arraysize[0]) {
> PyErr_Format(PyExc_ValueError,
> "Expected a dimension of size %zu, got %zu",
> ctx->head->field->type->arraysize[0],
> ctx->enc_count);
> return -1;
> }
> ```
>
> My naive guess is that there is something going on in:
> https://github.com/cython/cython/blob/master/Cython/Utility/Buffer.c#L738
i.e. this code:
"""
case 'O': case 's': case 'p':
if (ctx->enc_type == *ts && got_Z == ctx->is_complex &&
ctx->enc_packmode == ctx->new_packmode) {
/* Continue pooling same type */
ctx->enc_count += ctx->new_count;
} else {
/* New type */
if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL;
ctx->enc_count = ctx->new_count;
ctx->enc_packmode = ctx->new_packmode;
ctx->enc_type = *ts;
ctx->is_complex = got_Z;
}
++ts;
ctx->new_count = 1;
got_Z = 0;
break;
"""
> since that appears to be the only place where `enc_count` is being
> incremented. That would seem like the place where a boundary between two
> string fields might not be properly handled (the comment in the line above
> "Continue pooling same type" is suggestive.
Yes. That code is basically unchanged since Dag committed it back in 2009
(CC-ing him). However, this parser is fairly complex (not surprisingly, the
buffer type format *is* complicated), so I'm not sure about the right fix.
My guess is that the first part of the conditional belongs more in the
"case 'Z'" fall-through section that precedes the code above. Hard to say.
Even when I disable it ("if (0 &&...)"), I don't get any test failures in
the format tests.
In any case, I agree with you that the "pooling same size" is the bit that
goes wrong here.
I've attached a test case for now.
Stefan
diff -r 5e269313b1a1 tests/buffers/buffmt.pyx
--- a/tests/buffers/buffmt.pyx Sat Feb 08 12:11:59 2014 +0100
+++ b/tests/buffers/buffmt.pyx Sat Feb 08 19:45:36 2014 +0100
@@ -377,6 +377,22 @@
fmt, sizeof(PartiallyPackedStruct2))
+cdef packed struct PackedStructWithCharArrays:
+ float a
+ int b
+ char[5] c
+ char[3] d
+
+
+@testcase
+def packed_struct_with_strings(fmt):
+ """
+ >>> packed_struct_with_strings("T{f:a:i:b:5s:c:3s:d:}")
+ """
+ cdef object[PackedStructWithCharArrays] buf = MockBuffer(
+ fmt, sizeof(PackedStructWithCharArrays))
+
+
# TODO: empty struct
# TODO: Incomplete structs
# TODO: mixed structs
_______________________________________________
cython-devel mailing list
[email protected]
https://mail.python.org/mailman/listinfo/cython-devel