GitHub user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22675#discussion_r226164264
  
    --- Diff: docs/ml-datasource.md ---
    @@ -0,0 +1,90 @@
    +---
    +layout: global
    +title: Data sources
    +displayTitle: Data sources
    +---
    +
    +In this section, we introduce how to use data source in ML to load data.
    +Beside some general data sources like Parquet, CSV, JSON and JDBC, we also provide some specific data source for ML.
    +
    +**Table of Contents**
    +
    +* This will become a table of contents (this text will be scraped).
    +{:toc}
    +
    +## Image data source
    +
    +This image data source is used to load image files from a directory, it can load compressed image (jpeg, png, etc.) into raw image representation via ImageIO in Java library.
    +The loaded DataFrame has one StructType column: "image". containing image data stored as image schema.
    +The schema of the `image` column is:
    + - origin: String (represents the file path of the image)
    --- End diff --
    
    I would use SQL types consistently in this schema list, for instance StringType and IntegerType, rather than plain names such as String.
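    
    As a rough illustration, assuming only the type name changes and the description stays as it is, the `origin` entry quoted above might read something like this:

    ```markdown
     - origin: StringType (represents the file path of the image)
    ```

    Any other fields in the list would presumably follow the same pattern with their own SQL types.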

