Re: PySpark with OpenCV causes python worker to crash

2015-06-05 Thread Davies Liu
Please file a bug here: https://issues.apache.org/jira/browse/SPARK/

Could you also provide a way to reproduce this bug (including some datasets)?

On Thu, Jun 4, 2015 at 11:30 PM, Sam Stoelinga sammiest...@gmail.com wrote:
 I've changed the SIFT feature extraction to SURF feature extraction and it
 works...

 Following line was changed:
 sift = cv2.xfeatures2d.SIFT_create()

 to

 sift = cv2.xfeatures2d.SURF_create()

 Where should I file this as a bug? When not running on Spark it works fine
 so I'm saying it's a spark bug.

 On Fri, Jun 5, 2015 at 2:17 PM, Sam Stoelinga sammiest...@gmail.com wrote:

 Yea should have emphasized that. I'm running the same code on the same VM.
 It's a VM with spark in standalone mode and I run the unit test directly on
 that same VM. So OpenCV is working correctly on that same machine but when
 moving the exact same OpenCV code to spark it just crashes.

 On Tue, Jun 2, 2015 at 5:06 AM, Davies Liu dav...@databricks.com wrote:

 Could you run the single-threaded version on the worker machine to make sure
 that OpenCV is installed and configured correctly?

 On Sat, May 30, 2015 at 6:29 AM, Sam Stoelinga sammiest...@gmail.com
 wrote:
  I've verified that the issue lies in Spark running the OpenCV code and not
  in the sequence file BytesWritable formatting.

  The code below reproduces the failure without using the sequencefile as
  input at all: the same function with the same input runs fine outside
  Spark but fails on Spark:

  import cv2
  import numpy as np

  def extract_sift_features_opencv(imgfile_imgbytes):
      imgfilename, discardsequencefile = imgfile_imgbytes
      # Ignore the sequencefile bytes and read a fixed image from disk (Python 2).
      imgbytes = bytearray(open("/tmp/img.jpg", "rb").read())
      nparr = np.fromstring(buffer(imgbytes), np.uint8)
      img = cv2.imdecode(nparr, 1)
      gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
      sift = cv2.xfeatures2d.SIFT_create()
      kp, descriptors = sift.detectAndCompute(gray, None)
      return (imgfilename, "test")
 
  And corresponding tests.py:
  https://gist.github.com/samos123/d383c26f6d47d34d32d6
 
 
  On Sat, May 30, 2015 at 8:04 PM, Sam Stoelinga sammiest...@gmail.com
  wrote:
 
  Thanks for the advice! The following line causes spark to crash:
 
  kp, descriptors = sift.detectAndCompute(gray, None)
 
  But I do need this line to be executed and the code does not crash when
  running outside of Spark but passing the same parameters. You're saying
  maybe the bytes from the sequencefile got somehow transformed and don't
  represent an image anymore causing OpenCV to crash the whole python
  executor.
 
  On Fri, May 29, 2015 at 2:06 AM, Davies Liu dav...@databricks.com
  wrote:
 
  Could you try commenting out some lines in
  `extract_sift_features_opencv` to find which line causes the crash?

  If the bytes that came from sequenceFile() are broken, it's easy to crash a
  C library in Python (OpenCV).
 
  On Thu, May 28, 2015 at 8:33 AM, Sam Stoelinga
  sammiest...@gmail.com
  wrote:
   Hi sparkers,

   I am working on a PySpark application which uses the OpenCV library. It
   runs fine when I run the code locally, but when I try to run it on Spark
   on the same machine it crashes the worker.

   The code can be found here:
   https://gist.github.com/samos123/885f9fe87c8fa5abf78f

   This is the error message taken from STDERR of the worker log:
   https://gist.github.com/samos123/3300191684aee7fc8013

   I would like pointers or tips on how to debug this further. It would be
   nice to know why the worker crashed.

   Thanks,
   Sam Stoelinga
  
  
   org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
       at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:172)
       at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
       at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
       at org.apache.spark.scheduler.Task.run(Task.scala:64)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:745)
   Caused by: java.io.EOFException
       at java.io.DataInputStream.readInt(DataInputStream.java:392)
       at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:108)

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: PySpark with OpenCV causes python worker to crash

2015-06-05 Thread Sam Stoelinga
I've changed the SIFT feature extraction to SURF feature extraction and it
works...

Following line was changed:
sift = cv2.xfeatures2d.SIFT_create()

to

sift = cv2.xfeatures2d.SURF_create()

Where should I file this as a bug? When not running on Spark it works fine
so I'm saying it's a spark bug.
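
Before filing, it may help to capture where the Python worker dies. The sketch below is only an illustration: it uses the standard-library faulthandler module (Python 3.3+, so newer than the Python 2 code in this thread), the names with_fault_dump and extract are hypothetical, and it assumes the worker's stderr ends up in the Spark worker log.

```python
import faulthandler
import functools
import sys


def with_fault_dump(func):
    """Wrap func so faulthandler is switched on before it runs.

    If native code (for example an OpenCV call) then segfaults, Python
    writes the current traceback to stderr before the process dies.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not faulthandler.is_enabled():
            try:
                faulthandler.enable(file=sys.stderr, all_threads=True)
            except (AttributeError, OSError, RuntimeError, ValueError):
                # stderr has no usable file descriptor (e.g. it is captured);
                # carry on without the crash dump rather than fail the task.
                pass
        return func(*args, **kwargs)
    return wrapper


@with_fault_dump
def extract(record):
    # Hypothetical stand-in for the real feature extraction function.
    name, payload = record
    return (name, len(payload))


if __name__ == "__main__":
    print(extract(("img.jpg", b"\xff\xd8\xff\xd9")))
```

The decorator could be applied to whatever function is passed to map(), so the dump is enabled inside the worker process rather than on the driver.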

On Fri, Jun 5, 2015 at 2:17 PM, Sam Stoelinga sammiest...@gmail.com wrote:

 Yea should have emphasized that. I'm running the same code on the same VM.
 It's a VM with spark in standalone mode and I run the unit test directly on
 that same VM. So OpenCV is working correctly on that same machine but when
 moving the exact same OpenCV code to spark it just crashes.


Re: PySpark with OpenCV causes python worker to crash

2015-06-05 Thread Sam Stoelinga
Yea should have emphasized that. I'm running the same code on the same VM.
It's a VM with spark in standalone mode and I run the unit test directly on
that same VM. So OpenCV is working correctly on that same machine but when
moving the exact same OpenCV code to spark it just crashes.
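
When the same code behaves differently on the same machine, one thing worth comparing is the environment each process actually sees. This is a rough sketch, not something from the thread: the function names and the chosen environment variables are assumptions, and the worker-side collection is only hinted at in a comment.

```python
import os
import sys


def environment_fingerprint(env_keys=("PATH", "LD_LIBRARY_PATH", "PYTHONPATH")):
    """Collect interpreter and environment details that affect native libraries."""
    info = {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "cwd": os.getcwd(),
    }
    for key in env_keys:
        info["env:" + key] = os.environ.get(key, "")
    return info


def diff_fingerprints(local, remote):
    """Return {key: (local_value, remote_value)} for every key that differs."""
    keys = sorted(set(local) | set(remote))
    return {
        k: (local.get(k), remote.get(k))
        for k in keys
        if local.get(k) != remote.get(k)
    }


if __name__ == "__main__":
    local = environment_fingerprint()
    # With a SparkContext `sc`, the worker side could be collected with
    # something like:
    #   sc.parallelize([0], 1).map(lambda _: environment_fingerprint()).first()
    # Here a made-up value stands in for the worker's fingerprint.
    remote = dict(local, **{"env:LD_LIBRARY_PATH": "/opt/example/lib"})
    for key, (a, b) in diff_fingerprints(local, remote).items():
        print(key, "local=%r" % a, "worker=%r" % b)
```

A difference in the interpreter path or library search path between the unit test and the worker would be a plausible lead, though nothing in this thread confirms that as the cause.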

On Tue, Jun 2, 2015 at 5:06 AM, Davies Liu dav...@databricks.com wrote:

 Could you run the single-threaded version on the worker machine to make sure
 that OpenCV is installed and configured correctly?

Re: PySpark with OpenCV causes python worker to crash

2015-06-05 Thread Sam Stoelinga
Thanks Davies. I will file a bug later with code and single image as
dataset. Next to that I can give anybody access to my vagrant VM that
already has spark with OpenCV and the dataset available.

Or you can set up the same vagrant machine at your place. All is automated ^^
git clone https://github.com/samos123/computer-vision-cloud-platform
cd computer-vision-cloud-platform
./scripts/setup.sh
vagrant ssh

(Expect failures, I haven't cleaned up and tested it for other people) btw
I study at Tsinghua also currently.

On Fri, Jun 5, 2015 at 2:43 PM, Davies Liu dav...@databricks.com wrote:

 Please file a bug here: https://issues.apache.org/jira/browse/SPARK/

 Could you also provide a way to reproduce this bug (including some
 datasets)?


Re: PySpark with OpenCV causes python worker to crash

2015-06-05 Thread Sam Stoelinga
Please ignore this whole thread. It's working out of nowhere. I'm not sure
what was the root cause. After I restarted the VM the previous SIFT code
also started working.

On Fri, Jun 5, 2015 at 10:40 PM, Sam Stoelinga sammiest...@gmail.com
wrote:

 Thanks Davies. I will file a bug later with code and single image as
 dataset. Next to that I can give anybody access to my vagrant VM that
 already has spark with OpenCV and the dataset available.

 Or you can set up the same vagrant machine at your place. All is automated
 ^^
 git clone https://github.com/samos123/computer-vision-cloud-platform
 cd computer-vision-cloud-platform
 ./scripts/setup.sh
 vagrant ssh

 (Expect failures, I haven't cleaned up and tested it for other people) btw
 I study at Tsinghua also currently.


Re: PySpark with OpenCV causes python worker to crash

2015-06-05 Thread Davies Liu
Thanks for letting us know.

On Fri, Jun 5, 2015 at 8:34 AM, Sam Stoelinga sammiest...@gmail.com wrote:
 Please ignore this whole thread. It's working out of nowhere. I'm not sure
 what was the root cause. After I restarted the VM the previous SIFT code
 also started working.


Re: PySpark with OpenCV causes python worker to crash

2015-06-01 Thread Davies Liu
Could you run the single-threaded version on the worker machine to make sure
that OpenCV is installed and configured correctly?
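
A possible way to run such a check outside Spark is to execute the code in a fresh interpreter and look at how that process ends. The sketch below assumes Python 3 and a POSIX system, where a negative return code means the child was killed by a signal; describe_exit and run_isolated are hypothetical helper names.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by that signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exited with status %d" % returncode


def run_isolated(python_source):
    """Run python_source in a fresh interpreter and describe how it ended."""
    result = subprocess.run([sys.executable, "-c", python_source])
    return describe_exit(result.returncode)


if __name__ == "__main__":
    print(run_isolated("print('ok')"))
    print(run_isolated("import sys; sys.exit(3)"))
```

If the OpenCV call were placed in python_source and the result read "killed by SIGSEGV", that would point at a native crash rather than a Python exception.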

On Sat, May 30, 2015 at 6:29 AM, Sam Stoelinga sammiest...@gmail.com wrote:
 I've verified that the issue lies in Spark running the OpenCV code and not
 in the sequence file BytesWritable formatting.

 The code below reproduces the failure without using the sequencefile as
 input at all: the same function with the same input runs fine outside
 Spark but fails on Spark:

 import cv2
 import numpy as np

 def extract_sift_features_opencv(imgfile_imgbytes):
     imgfilename, discardsequencefile = imgfile_imgbytes
     # Ignore the sequencefile bytes and read a fixed image from disk (Python 2).
     imgbytes = bytearray(open("/tmp/img.jpg", "rb").read())
     nparr = np.fromstring(buffer(imgbytes), np.uint8)
     img = cv2.imdecode(nparr, 1)
     gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
     sift = cv2.xfeatures2d.SIFT_create()
     kp, descriptors = sift.detectAndCompute(gray, None)
     return (imgfilename, "test")

 And corresponding tests.py:
 https://gist.github.com/samos123/d383c26f6d47d34d32d6






Re: PySpark with OpenCV causes python worker to crash

2015-05-30 Thread Sam Stoelinga
Thanks for the advice! The following line causes Spark to crash:

kp, descriptors = sift.detectAndCompute(gray, None)

But I do need this line to be executed, and the code does not crash when
run outside of Spark with the same parameters. You're suggesting that the
bytes from the sequencefile may somehow have been transformed so that they
no longer represent an image, causing OpenCV to crash the whole Python
executor.
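
One way to test the "bytes got transformed" idea is to fingerprint the payload
on both sides and compare. The sketch below is only an illustration: the
helper names `fingerprint` and `same_payload` are hypothetical and do not come
from the gist, and it uses only the standard library.

```python
import hashlib


def fingerprint(data):
    """Return (length, SHA-256 hex digest) for a bytes-like object."""
    raw = bytes(data)  # accepts bytes and bytearray alike
    return len(raw), hashlib.sha256(raw).hexdigest()


def same_payload(local_bytes, worker_bytes):
    """True when both payloads have identical length and content."""
    return fingerprint(local_bytes) == fingerprint(worker_bytes)
```

The fingerprint of the file read locally can then be compared with
fingerprints computed inside the Spark job and collected back to the driver.
A mismatch in length or digest would point at the input path rather than at
OpenCV.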

On Fri, May 29, 2015 at 2:06 AM, Davies Liu dav...@databricks.com wrote:

 Could you try commenting out some lines in
 `extract_sift_features_opencv` to find which line causes the crash?

 If the bytes that come from sequenceFile() are broken, it's easy to crash a
 C library in Python (OpenCV).




Re: PySpark with OpenCV causes python worker to crash

2015-05-30 Thread Sam Stoelinga
I've verified that the issue lies in Spark running the OpenCV code and not
in the sequence file BytesWritable formatting.

The following code reproduces the failure without using the sequencefile as
input at all: it runs the same function on the same input, and it still
fails on Spark:

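import cv2
import numpy as np

def extract_sift_features_opencv(imgfile_imgbytes):
    # The sequence file payload is deliberately ignored; a local file is read instead.
    imgfilename, discardsequencefile = imgfile_imgbytes
    imgbytes = bytearray(open("/tmp/img.jpg", "rb").read())
    nparr = np.fromstring(buffer(imgbytes), np.uint8)  # buffer() is Python 2 only
    img = cv2.imdecode(nparr, 1)  # 1 = decode as a 3-channel colour image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    sift = cv2.xfeatures2d.SIFT_create()
    # This is the call that crashes the Python worker under Spark.
    kp, descriptors = sift.detectAndCompute(gray, None)
    return (imgfilename, "test")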

And corresponding tests.py:
https://gist.github.com/samos123/d383c26f6d47d34d32d6
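
To see how a crashing Python process actually died, the suspect code can be
run in a child process and its exit status inspected. This is a rough sketch
under several assumptions: it needs a Python 3.7+ interpreter (the code above
is Python 2), a POSIX system for the signal handling, and the helper names
`run_isolated` and `describe` are made up for illustration. The `-X
faulthandler` option asks the interpreter to print a Python traceback to
stderr on fatal signals such as SIGSEGV.

```python
import signal
import subprocess
import sys


def run_isolated(snippet):
    """Run a Python snippet in a child process; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Turn a subprocess return code into a short human-readable summary."""
    if returncode < 0:
        # On POSIX a negative return code means the child was killed by a signal.
        return "killed by signal " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode
```

Passing a snippet that imports and calls the failing function would show
whether the child exits normally or is killed by a signal, and any traceback
from faulthandler ends up in the returned stderr text.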







Re: PySpark with OpenCV causes python worker to crash

2015-05-28 Thread Davies Liu
Could you try commenting out some lines in
`extract_sift_features_opencv` to find which line causes the crash?

If the bytes that come from sequenceFile() are broken, it's easy to crash a
C library in Python (OpenCV).
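
A cheap guard against broken bytes is to sanity-check each payload before it
reaches OpenCV. The sketch below is only an illustration: `looks_like_jpeg`
is a hypothetical helper, it assumes the images are JPEG files, and passing
the check does not prove that the image will decode.

```python
JPEG_SOI = b"\xff\xd8"  # start-of-image marker at the beginning of a JPEG
JPEG_EOI = b"\xff\xd9"  # end-of-image marker that closes the image data


def looks_like_jpeg(data):
    """Return True if the payload starts with SOI and contains an EOI marker."""
    raw = bytes(data)
    return raw.startswith(JPEG_SOI) and JPEG_EOI in raw[2:]
```

Records that fail the check can be logged and skipped instead of being
decoded. It is also worth checking the result of cv2.imdecode, which returns
None when it cannot decode the buffer.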

On Thu, May 28, 2015 at 8:33 AM, Sam Stoelinga sammiest...@gmail.com wrote:
 Hi sparkers,

 I am working on a PySpark application which uses the OpenCV library. It runs
 fine when running the code locally but when I try to run it on Spark on the
 same Machine it crashes the worker.

 The code can be found here:
 https://gist.github.com/samos123/885f9fe87c8fa5abf78f

 This is the error message taken from STDERR of the worker log:
 https://gist.github.com/samos123/3300191684aee7fc8013

 Would like pointers or tips on how to debug further? Would be nice to know
 the reason why the worker crashed.

 Thanks,
 Sam Stoelinga


 org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
 at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:172)
 at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:176)
 at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:94)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
 at org.apache.spark.scheduler.Task.run(Task.scala:64)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:392)
 at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:108)



