It won't be very efficient, but you could write a Python UDF using PythonMagick: https://wiki.python.org/moin/ImageMagick
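As a rough sketch of the row-at-a-time UDF idea, something like the following might work. Note that it swaps in Pillow for PythonMagick (Pillow normally ships with Anaconda, but that is an assumption about your cluster), and it assumes the images are held as encoded file bytes (PNG, JPEG, ...) in a binary column rather than in the decoded struct produced by the image data source. The names `resize_image_bytes` and `make_resize_udf` are made up for illustration.

```python
import io

from PIL import Image  # Pillow, used here in place of PythonMagick (assumed installed)


def resize_image_bytes(data, width, height):
    """Decode encoded image bytes (e.g. PNG or JPEG), resize, and re-encode as PNG."""
    with Image.open(io.BytesIO(data)) as img:
        # convert() loads the pixel data, so the result outlives the file handle
        resized = img.convert("RGB").resize((width, height))
    out = io.BytesIO()
    resized.save(out, format="PNG")
    return out.getvalue()


def make_resize_udf(width, height):
    """Wrap resize_image_bytes in a plain (row-at-a-time) Spark UDF."""
    # Imported lazily so the helper above can be tried out without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BinaryType

    return udf(lambda data: resize_image_bytes(data, width, height), BinaryType())
```

With a DataFrame that has a binary column of encoded images (called `content` here purely as a placeholder), this could be applied as `df.withColumn("resized", make_resize_udf(224, 224)(df["content"]))`.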
If you have PyArrow > 0.10 then you might be able to get a boost by saving images in a column as BinaryType and writing a PandasUDF.

On Wed, Jul 31, 2019 at 6:22 AM Nick Dawes <nickdawe...@gmail.com> wrote:

> Any other way of resizing the image before creating the DataFrame in
> Spark? I know opencv does it. But I don't have opencv on my cluster. I have
> Anaconda python packages installed on my cluster.
>
> Any ideas will be appreciated. Thank you!
>
> On Tue, Jul 30, 2019, 4:17 PM Nick Dawes <nickdawe...@gmail.com> wrote:
>
>> Hi
>>
>> I'm new to spark image data source.
>>
>> After creating a dataframe using Spark's image data source, I would like
>> to resize the images in PySpark.
>>
>> df = spark.read.format("image").load(imageDir)
>>
>> Can you please help me with this?
>>
>> Nick
>>
>

--
*Patrick McCarthy*
Senior Data Scientist, Machine Learning Engineering
Dstillery
470 Park Ave South, 17th Floor, NYC 10016