I created one here: https://issues.apache.org/jira/browse/ARROW-2195
On Wed, Feb 21, 2018 at 8:11 AM, Wes McKinney <wesmck...@gmail.com> wrote:

> Can we create a JIRA to track this issue?
>
> On Wed, Feb 21, 2018 at 5:04 AM, ALBERTO Bocchinfuso
> <alberto_boc...@hotmail.it> wrote:
> > Hi,
> >
> > Have you had any news on this issue?
> > Do you plan to solve it for the next releases of Arrow, or is there any
> > way to avoid the problem?
> >
> > Thanks in advance,
> > Alberto
> >
> > From: Philipp Moritz<mailto:pcmor...@gmail.com>
> > Sent: Friday, February 9, 2018 00:30
> > To: dev@arrow.apache.org<mailto:dev@arrow.apache.org>
> > Subject: Re: [Python] Retrieving a RecordBatch from plasma inside a
> > function
> >
> > Thanks! I can indeed reproduce this problem. I'm a bit busy right now and
> > plan to look into it on the weekend.
> >
> > Here is the preliminary backtrace for everybody interested:
> >
> > CESS (code=1, address=0x111138158)
> >
> >     frame #0: 0x000000010e6457fc lib.so`__pyx_pw_7pyarrow_3lib_10Int32Value_1as_py(_object*, _object*) + 28
> >
> > lib.so`__pyx_pw_7pyarrow_3lib_10Int32Value_1as_py:
> >
> > ->  0x10e6457fc <+28>: movslq (%rdx,%rcx,4), %rdi
> >     0x10e645800 <+32>: callq  0x10e698170 ; symbol stub for: PyInt_FromLong
> >     0x10e645805 <+37>: testq  %rax, %rax
> >     0x10e645808 <+40>: je     0x10e64580c ; <+44>
> >
> > (lldb) bt
> >
> > * thread #1: tid = 0xf1378e, 0x000000010e6457fc lib.so`__pyx_pw_7pyarrow_3lib_10Int32Value_1as_py(_object*, _object*) + 28, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x111138158)
> >   * frame #0: 0x000000010e6457fc lib.so`__pyx_pw_7pyarrow_3lib_10Int32Value_1as_py(_object*, _object*) + 28
> >     frame #1: 0x000000010e5ccd35 lib.so`__Pyx_PyObject_CallNoArg(_object*) + 133
> >     frame #2: 0x000000010e613b25 lib.so`__pyx_pw_7pyarrow_3lib_10ArrayValue_3__repr__(_object*) + 933
> >     frame #3: 0x000000010c2f83bc libpython2.7.dylib`PyObject_Repr + 60
> >     frame #4: 0x000000010c35f651 libpython2.7.dylib`PyEval_EvalFrameEx + 22305
> >
> > On Tue, Feb 6, 2018 at 1:24 AM, ALBERTO Bocchinfuso
> > <alberto_boc...@hotmail.it> wrote:
> >
> >> Hi,
> >>
> >> I’m using Python 3.5.2 and pyarrow 0.8.0.
> >>
> >> As the key, I pass a string of 20 bytes, of course. I’m doing it
> >> differently from the canonical way because I’m no longer using Python 2.7
> >> but Python 3, and this seemed to me to be the right way to create a
> >> string of 20 bytes.
> >> The full code is:
> >>
> >> import pyarrow as pa
> >> import pyarrow.plasma as plasma
> >>
> >> def retrieve1():
> >>     client = plasma.connect('test', "", 0)
> >>
> >>     key = "keynumber1keynumber1"
> >>     pid = plasma.ObjectID(bytearray(key,'UTF-8'))
> >>
> >>     [buff] = client.get_buffers([pid])
> >>     batch = pa.RecordBatchStreamReader(buff).read_next_batch()
> >>
> >>     print(batch)
> >>     print(batch.schema)
> >>     print(batch[0])
> >>
> >>     return batch
> >>
> >> client = plasma.connect('test', "", 0)
> >>
> >> test1 = [1, 12, 23, 3, 21, 34]
> >> test1 = pa.array(test1, pa.int32())
> >>
> >> batch = pa.RecordBatch.from_arrays([test1], ['FIELD1'])
> >>
> >> key = "keynumber1keynumber1"
> >> pid = plasma.ObjectID(bytearray(key,'UTF-8'))
> >> sink = pa.MockOutputStream()
> >> stream_writer = pa.RecordBatchStreamWriter(sink, batch.schema)
> >> stream_writer.write_batch(batch)
> >> stream_writer.close()
> >>
> >> bff = client.create(pid, sink.size())
> >>
> >> stream = pa.FixedSizeBufferWriter(bff)
> >> writer = pa.RecordBatchStreamWriter(stream, batch.schema)
> >> writer.write_batch(batch)
> >> client.seal(pid)
> >>
> >> batch = retrieve1()
> >> print(batch)
> >> print(batch.schema)
> >> print(batch[0])
> >>
> >> I hope this helps,
> >> thank you
> >>
> >> From: Philipp Moritz<mailto:pcmor...@gmail.com>
> >> Sent: Tuesday, February 6, 2018 00:00
> >> To: dev@arrow.apache.org<mailto:dev@arrow.apache.org>
> >> Subject: Re: [Python] Retrieving a RecordBatch from plasma inside a
> >> function
> >>
> >> Hey Alberto,
> >>
> >> Thanks for your message! I'm trying to reproduce it.
> >>
> >> Can you attach the code you use to write the batch into the store?
> >>
> >> Also, can you say which version of Python and Arrow you are using? On my
> >> installation, I get
> >>
> >> ```
> >> In [5]: plasma.ObjectID(bytearray("keynumber1keynumber1", "UTF-8"))
> >> ---------------------------------------------------------------------------
> >> ValueError                                Traceback (most recent call last)
> >> <ipython-input-5-fbec5bb33c33> in <module>()
> >> ----> 1 plasma.ObjectID(bytearray("keynumber1keynumber1", "UTF-8"))
> >>
> >> plasma.pyx in pyarrow.plasma.ObjectID.__cinit__()
> >>
> >> ValueError: Object ID must by 20 bytes, is keynumber1keynumber1
> >> ```
> >>
> >> (the canonical way to do this would be
> >> plasma.ObjectID(b"keynumber1keynumber1"))
> >>
> >> Best,
> >> Philipp.
> >>
> >> On Mon, Feb 5, 2018 at 10:09 AM, ALBERTO Bocchinfuso
> >> <alberto_boc...@hotmail.it> wrote:
> >>
> >> > Good morning,
> >> >
> >> > I am experiencing problems with RecordBatches stored in plasma in a
> >> > particular situation.
> >> >
> >> > If I return a RecordBatch as the result of a Python function, I am able
> >> > to read just the metadata, while I get an error when reading the columns.
> >> >
> >> > For example, the following code
> >> >
> >> > def retrieve1():
> >> >     client = plasma.connect('test', "", 0)
> >> >
> >> >     key = "keynumber1keynumber1"
> >> >     pid = plasma.ObjectID(bytearray(key,'UTF-8'))
> >> >
> >> >     [buff] = client.get_buffers([pid])
> >> >     batch = pa.RecordBatchStreamReader(buff).read_next_batch()
> >> >     return batch
> >> >
> >> > batch = retrieve1()
> >> > print(batch)
> >> > print(batch.schema)
> >> > print(batch[0])
> >> >
> >> > is a simple Python program in which a function is in charge of
> >> > retrieving the RecordBatch from the plasma store and then returns it to
> >> > the caller.
> >> > Running the previous example I get:
> >> >
> >> > <pyarrow.lib.RecordBatch object at 0x7fd0ebfc0f48>
> >> > FIELD1: int32
> >> > metadata
> >> > --------
> >> > {}
> >> > <pyarrow.lib.Int32Array object at 0x7fd0ebfc0f98>
> >> > [
> >> >   1,
> >> >   12,
> >> >   23,
> >> >   3,
> >> >   21,
> >> >   34
> >> > ]
> >> > <pyarrow.lib.RecordBatch object at 0x7fd0ebfc0f48>
> >> > FIELD1: int32
> >> > metadata
> >> > --------
> >> > {}
> >> > Segmentation fault (core dumped)
> >> >
> >> > If I retrieve and use the data in the same part of the code (as I do
> >> > inside the function retrieve1(); it also works when I put everything in
> >> > the main program), everything runs without problems.
> >> >
> >> > Also, the problem seems to be related to the particular case in which I
> >> > retrieve the RecordBatch from the plasma store, since the following
> >> > (simpler) code:
> >> >
> >> > def create():
> >> >     test1 = [1, 12, 23, 3, 21, 34]
> >> >     test1 = pa.array(test1, pa.int32())
> >> >
> >> >     batch = pa.RecordBatch.from_arrays([test1], ['FIELD1'])
> >> >     print(batch)
> >> >     print(batch.schema)
> >> >     print(batch[0])
> >> >     return batch
> >> >
> >> > batch1 = create()
> >> > print(batch1)
> >> > print(batch1.schema)
> >> > print(batch1[0])
> >> >
> >> > Prints:
> >> >
> >> > <pyarrow.lib.RecordBatch object at 0x7f5f7b7a9598>
> >> > FIELD1: int32
> >> > <pyarrow.lib.Int32Array object at 0x7f5f691fd9a8>
> >> > [
> >> >   1,
> >> >   12,
> >> >   23,
> >> >   3,
> >> >   21,
> >> >   34
> >> > ]
> >> > <pyarrow.lib.RecordBatch object at 0x7f5f7b7a9598>
> >> > FIELD1: int32
> >> > <pyarrow.lib.Int32Array object at 0x7f5f7e29f318>
> >> > [
> >> >   1,
> >> >   12,
> >> >   23,
> >> >   3,
> >> >   21,
> >> >   34
> >> > ]
> >> >
> >> > Which is what I expect.
> >> >
> >> > Is this issue known, or am I doing something wrong when retrieving the
> >> > RecordBatch from plasma?
> >> >
> >> > Also, I would like to point out that this problem was as easy to find
> >> > as it is hard to re-create.
> >> > For this reason, there may be other situations in which the same
> >> > problem arises that I have not experienced, since I mostly deal with
> >> > plasma and I’ve been using only Python so far: the description I gave is
> >> > not intended to be complete.
> >> >
> >> > Thank you,
> >> > Alberto
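
For anyone following the discussion of the 20-byte key: the sketch below is plain Python with no pyarrow dependency. It only illustrates that, for this ASCII key, the bytearray(key, 'UTF-8') form from the report and the b"..." literal suggested as the canonical form hold the same 20 bytes under Python 3. ID_LENGTH and make_key are hypothetical names introduced here for illustration; they are not part of the plasma API.

```python
# Hypothetical helper for illustration only; not part of pyarrow or plasma.
ID_LENGTH = 20  # object IDs in this thread are 20 bytes long


def make_key(text):
    """Encode text as UTF-8 and check that it is exactly ID_LENGTH bytes."""
    raw = text.encode("utf-8")
    if len(raw) != ID_LENGTH:
        raise ValueError("key must be %d bytes, got %d" % (ID_LENGTH, len(raw)))
    return raw


key = make_key("keynumber1keynumber1")

# The bytes literal and the bytearray form hold the same 20 bytes.
same_as_literal = key == b"keynumber1keynumber1"
same_as_bytearray = key == bytes(bytearray("keynumber1keynumber1", "UTF-8"))
```

Checking the encoded length rather than the character count matters once a key contains non-ASCII characters, because those take more than one byte each in UTF-8.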