[ https://issues.apache.org/jira/browse/ARROW-13427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385693#comment-17385693 ]

Jim Gan edited comment on ARROW-13427 at 7/22/21, 6:08 PM:
-----------------------------------------------------------

I have switched to PyArrow 4.0.1 now and will see whether the error occurs again.

 

The crash in to_pydict() occurred after encountering a corrupted Arrow file 
(the read size differed from the actual size).

 

It would be great if such a crash could be prevented, because it killed the 
long-running PySpark job (pipeline). In my case it is OK to skip or ignore the 
corrupted Arrow files.

 

I added pa_table.validate() before to_pydict(), which seems to avoid the 
crash. My pipeline finished successfully with this change.

 
{code:python}
import logging

import pyarrow as pa

try:
    # arrow_stream, self.schema, and file come from the surrounding pipeline code.
    for batch_id, batch in enumerate(arrow_stream):
        pa_table = pa.Table.from_batches([batch]).select(self.schema)
        # Adding validate() here seems to avoid the crash.
        pa_table.validate()
        pv_dict = pa_table.to_pydict()
except Exception as e:
    logging.error("file: {} got exception: {}".format(file, str(e)))
{code}
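
By default validate() runs only the cheap checks. If corrupted batches still 
slip through, recent pyarrow versions also accept validate(full=True) to run 
the thorough checks; a minimal sketch of that variant, reusing pa_table from 
the snippet above:

{code:python}
# full=True enables the expensive (potentially O(n)) validation checks,
# which may catch corruption that the default cheap checks miss.
pa_table.validate(full=True)
{code}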
 

The error messages in the log:

(1) got exception: Expected to be able to read 11431400 bytes for message body, 
got 10389614

(2) /arrow/cpp/src/arrow/array/data.cc:94: Check failed: (off) <= (length) 
Slice offset greater than array length
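
Since skipping corrupted files is acceptable here, another option is to catch 
the error at the IPC reader level and skip the whole file. A minimal sketch, 
assuming the inputs are Arrow IPC stream files readable via pa.OSFile 
(read_batches_safely is a hypothetical helper, not part of the pipeline above):

{code:python}
import logging

import pyarrow as pa

def read_batches_safely(path):
    """Yield record batches from an Arrow IPC stream file,
    skipping the whole file if it fails IPC-level checks."""
    try:
        with pa.OSFile(path, "rb") as source:
            for batch in pa.ipc.open_stream(source):
                yield batch
    except pa.lib.ArrowInvalid as e:
        # Truncated files surface here with errors like
        # "Expected to be able to read N bytes for message body, got M".
        logging.error("skipping corrupted arrow file {}: {}".format(path, e))
{code}

Note that error (2) appears to be a fatal check that aborts the whole process 
(hence Spark's "Python worker exited unexpectedly"), so it cannot be caught 
with try/except alone; validating beforehand avoids reaching it.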


> pa_table.to_pydict() crashed, Check failed: (off) <= (length) Slice offset 
> greater than array length
> -----------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-13427
>                 URL: https://issues.apache.org/jira/browse/ARROW-13427
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C, C++
>         Environment: Python Arrow running in a Spark environment; 
> SparkContext: Running Spark version 3.0.4.4
> pyarrow version 2.0.0
>            Reporter: Jim Gan
>            Priority: Major
>
> I'm not sure if this issue is related to 
> https://issues.apache.org/jira/browse/ARROW-10054
> {code}
> [2021-07-22 02:43:17,457 INFO get_bucket_trig_feat_raw2.py:100] read batch 0
> /arrow/cpp/src/arrow/array/data.cc:94: Check failed: (off) <= (length) Slice 
> offset greater than array length
> /usr/local/lib64/python3.6/site-packages/pyarrow/libarrow.so.200(+0x554d58)[0x7f491040bd58]
> /usr/local/lib64/python3.6/site-packages/pyarrow/libarrow.so.200(_ZN5arrow4util8ArrowLogD1Ev+0xdd)[0x7f491040c5ad]
> /usr/local/lib64/python3.6/site-packages/pyarrow/libarrow.so.200(_ZNK5arrow9ArrayData5SliceEll+0x3c5)[0x7f491054c1f5]
> /usr/local/lib64/python3.6/site-packages/pyarrow/libarrow.so.200(_ZNK5arrow5Array5SliceEll+0x18)[0x7f491055e708]
> /usr/local/lib64/python3.6/site-packages/pyarrow/libarrow.so.200(_ZN5arrow8internal23ScalarFromArraySlotImpl5VisitINS_8ListTypeEEENS_6StatusERKNS_13BaseListArrayIT_EE+0x45)[0x7f4910582ab5]
> /usr/local/lib64/python3.6/site-packages/pyarrow/libarrow.so.200(_ZN5arrow16VisitArrayInlineINS_8internal23ScalarFromArraySlotImplEEENS_6StatusERKNS_5ArrayEPT_+0xce)[0x7f49105904fe]
> /usr/local/lib64/python3.6/site-packages/pyarrow/libarrow.so.200(_ZNO5arrow8internal23ScalarFromArraySlotImpl6FinishEv+0x11b)[0x7f49105917fb]
> /usr/local/lib64/python3.6/site-packages/pyarrow/libarrow.so.200(_ZNK5arrow5Array9GetScalarEl+0x35)[0x7f4910569ce5]
> /usr/local/lib64/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so(+0x189e28)[0x7f4914042e28]
> /usr/local/lib64/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so(+0xbab9c)[0x7f4913f73b9c]
> /usr/local/lib64/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so(+0xb5368)[0x7f4913f6e368]
> /usr/local/lib64/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so(+0xc709f)[0x7f4913f8009f]
> /usr/local/lib64/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so(+0x11785e)[0x7f4913fd085e]
> /usr/local/lib64/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so(+0x12256b)[0x7f4913fdb56b]
> /usr/local/lib64/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so(+0x13398c)[0x7f4913fec98c]
> /usr/local/lib64/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so(+0x12256b)[0x7f4913fdb56b]
> /usr/local/lib64/python3.6/site-packages/pyarrow/lib.cpython-36m-x86_64-linux-gnu.so(+0x146823)[0x7f4913fff823]
> /lib64/libpython3.6m.so.1.0(_PyCFunction_FastCallDict+0x31a)[0x7f492888f6ba]
> /lib64/libpython3.6m.so.1.0(+0x167a50)[0x7f492888fa50]
> /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x474)[0x7f49288b84c4]
> /lib64/libpython3.6m.so.1.0(+0x16818f)[0x7f492889018f]
> /lib64/libpython3.6m.so.1.0(+0x139f22)[0x7f4928861f22]
> /lib64/libpython3.6m.so.1.0(+0x13741a)[0x7f492885f41a]
> /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x7b6)[0x7f49288b8806]
> /lib64/libpython3.6m.so.1.0(+0x142a3a)[0x7f492886aa3a]
> /lib64/libpython3.6m.so.1.0(+0x167b36)[0x7f492888fb36]
> /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x474)[0x7f49288b84c4]
> /lib64/libpython3.6m.so.1.0(+0x114497)[0x7f492883c497]
> /lib64/libpython3.6m.so.1.0(+0x142bf0)[0x7f492886abf0]
> /lib64/libpython3.6m.so.1.0(+0x167b36)[0x7f492888fb36]
> /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x474)[0x7f49288b84c4]
> /lib64/libpython3.6m.so.1.0(+0x114497)[0x7f492883c497]
> /lib64/libpython3.6m.so.1.0(+0x142bf0)[0x7f492886abf0]
> /lib64/libpython3.6m.so.1.0(+0x167b36)[0x7f492888fb36]
> /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x474)[0x7f49288b84c4]
> /lib64/libpython3.6m.so.1.0(+0x114497)[0x7f492883c497]
> /lib64/libpython3.6m.so.1.0(+0x142bf0)[0x7f492886abf0]
> /lib64/libpython3.6m.so.1.0(+0x167b36)[0x7f492888fb36]
> /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x474)[0x7f49288b84c4]
> /lib64/libpython3.6m.so.1.0(+0x114497)[0x7f492883c497]
> /lib64/libpython3.6m.so.1.0(+0x142bf0)[0x7f492886abf0]
> /lib64/libpython3.6m.so.1.0(+0x167b36)[0x7f492888fb36]
> /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x474)[0x7f49288b84c4]
> /lib64/libpython3.6m.so.1.0(+0x114497)[0x7f492883c497]
> /lib64/libpython3.6m.so.1.0(+0x142bf0)[0x7f492886abf0]
> /lib64/libpython3.6m.so.1.0(+0x167b36)[0x7f492888fb36]
> /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x474)[0x7f49288b84c4]
> /lib64/libpython3.6m.so.1.0(+0x114497)[0x7f492883c497]
> /lib64/libpython3.6m.so.1.0(+0x142bf0)[0x7f492886abf0]
> /lib64/libpython3.6m.so.1.0(+0x167b36)[0x7f492888fb36]
> /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x474)[0x7f49288b84c4]
> /lib64/libpython3.6m.so.1.0(+0x142a3a)[0x7f492886aa3a]
> /lib64/libpython3.6m.so.1.0(+0x167b36)[0x7f492888fb36]
> /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x474)[0x7f49288b84c4]
> /lib64/libpython3.6m.so.1.0(+0x114497)[0x7f492883c497]
> /lib64/libpython3.6m.so.1.0(+0x142bf0)[0x7f492886abf0]
> /lib64/libpython3.6m.so.1.0(+0x167b36)[0x7f492888fb36]
> /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x474)[0x7f49288b84c4]
> /lib64/libpython3.6m.so.1.0(PyEval_EvalCodeEx+0x337)[0x7f492889a397]
> /lib64/libpython3.6m.so.1.0(PyEval_EvalCode+0x1b)[0x7f492889b0eb]
> /lib64/libpython3.6m.so.1.0(+0x212d00)[0x7f492893ad00]
> /lib64/libpython3.6m.so.1.0(_PyCFunction_FastCallDict+0x92)[0x7f492888f432]
> /lib64/libpython3.6m.so.1.0(+0x167a50)[0x7f492888fa50]
> /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x474)[0x7f49288b84c4]
> /lib64/libpython3.6m.so.1.0(+0x114497)[0x7f492883c497]
> /lib64/libpython3.6m.so.1.0(+0x142bf0)[0x7f492886abf0]
> /lib64/libpython3.6m.so.1.0(+0x167b36)[0x7f492888fb36]
> /lib64/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x474)[0x7f49288b84c4]
> /lib64/libpython3.6m.so.1.0(PyEval_EvalCodeEx+0x337)[0x7f492889a397]
> /lib64/libpython3.6m.so.1.0(+0x173153)[0x7f492889b153]
> /lib64/libpython3.6m.so.1.0(PyObject_Call+0x47)[0x7f492883dfb7]
> /lib64/libpython3.6m.so.1.0(+0x213f31)[0x7f492893bf31]
> /lib64/libpython3.6m.so.1.0(Py_Main+0x2f0)[0x7f492893c360]
> /opt/python/bin/python3.6(main+0x116)[0x55c723ec4b96]
> /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f49279df6a3]
> /opt/python/bin/python3.6(_start+0x2e)[0x55c723ec4d1e]
> 21/07/22 02:43:17,631 ERROR Executor: Exception in task 298.0 in stage 5.0 
> (TID 303)
> org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
>  at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:536)
>  at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:525)
>  at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
>  at 
> org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:643)
>  at 
> org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:621)
>  at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:456)
>  at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>  at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1209)
>  at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1215)
>  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
>  at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:177)
>  at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>  at org.apache.spark.scheduler.Task.run(Task.scala:127)
>  at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:472)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:475)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.EOFException
>  at java.io.DataInputStream.readInt(DataInputStream.java:392)
>  at 
> org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:628)
>  ... 17 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
