HyukjinKwon commented on a change in pull request #25245: 
[SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0
URL: https://github.com/apache/spark/pull/25245#discussion_r307091759
 
 

 ##########
 File path: python/pyspark/ml/image.py
 ##########
 @@ -203,52 +205,16 @@ def toImage(self, array, origin=""):
         return _create_row(self.imageFields,
                            [origin, height, width, nChannels, mode, data])
 
-    def readImages(self, path, recursive=False, numPartitions=-1,
-                   dropImageFailures=False, sampleRatio=1.0, seed=0):
-        """
-        Reads the directory of images from the local or remote source.
-
-        .. note:: If multiple jobs are run in parallel with different sampleRatio or recursive flag,
-            there may be a race condition where one job overwrites the hadoop configs of another.
-
-        .. note:: If sample ratio is less than 1, sampling uses a PathFilter that is efficient but
-            potentially non-deterministic.
-
-        .. note:: Deprecated in 2.4.0. Use `spark.read.format("image").load(path)` instead and
-            this `readImages` will be removed in 3.0.0.
-
-        :param str path: Path to the image directory.
-        :param bool recursive: Recursive search flag.
-        :param int numPartitions: Number of DataFrame partitions.
-        :param bool dropImageFailures: Drop the files that are not valid images.
-        :param float sampleRatio: Fraction of the images loaded.
-        :param int seed: Random number seed.
-        :return: a :class:`DataFrame` with a single column of "images",
-               see ImageSchema for details.
-
-        >>> df = ImageSchema.readImages('data/mllib/images/origin/kittens', recursive=True)
-        >>> df.count()
-        5
-
-        .. versionadded:: 2.3.0
-        """
-        warnings.warn("`ImageSchema.readImage` is deprecated. " +
-                      "Use `spark.read.format(\"image\").load(path)` instead.", DeprecationWarning)
-        spark = SparkSession.builder.getOrCreate()
-        image_schema = spark._jvm.org.apache.spark.ml.image.ImageSchema
-        jsession = spark._jsparkSession
-        jresult = image_schema.readImages(path, jsession, recursive, numPartitions,
-                                          dropImageFailures, float(sampleRatio), seed)
-        return DataFrame(jresult, spark._wrapped)
 
-
-ImageSchema = _ImageSchema()
+ImageUtils = _ImageUtils()
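For reference, the removed docstring's deprecation note points at the built-in image data source. A minimal sketch of that migration path, assuming the sample kittens directory from the old doctest is available (the data source's `dropInvalid` option plays the role of the removed `dropImageFailures` flag):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Counterpart of ImageSchema.readImages(path, dropImageFailures=True)
df = (spark.read.format("image")
      .option("dropInvalid", True)   # drop files that are not valid images
      .load("data/mllib/images/origin/kittens"))

# The image source produces one struct column named "image" whose fields
# match the old schema: origin, height, width, nChannels, mode, data.
df.select("image.origin", "image.width", "image.height").show(truncate=False)
```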
 
 Review comment:
  Are we going to keep exposing these utils as PySpark-specific APIs, or not? If we keep them, we should retain the original name `ImageSchema` for backward compatibility.
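If these utils do stay public, one hypothetical way to honor the backward-compatibility point is a module-level alias; the names below follow this diff, not any committed code:

```python
# Hypothetical compatibility shim: bind the old public name to the renamed
# singleton so `from pyspark.ml.image import ImageSchema` keeps working.
ImageUtils = _ImageUtils()
ImageSchema = ImageUtils  # deprecated alias kept for backward compatibility
```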
