[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22675


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-22 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r227214199
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,113 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources such as Parquet, CSV, JSON and JDBC, we 
also provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
--- End diff --

ah, ok, great


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-22 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r227193764
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,113 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources such as Parquet, CSV, JSON and JDBC, we 
also provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
--- End diff --

Yes. This keep the same with other ML algo page.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-22 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r227062660
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,113 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources such as Parquet, CSV, JSON and JDBC, we 
also provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory, it 
can load compressed image (jpeg, png, etc.) into raw image representation via 
`ImageIO` in Java library.
+The loaded DataFrame has one `StructType` column: "image", containing 
image data stored as image schema.
+The schema of the `image` column is:
+ - origin: `StringType` (represents the file path of the image)
+ - height: `IntegerType` (height of the image)
+ - width: `IntegerType` (width of the image)
+ - nChannels: `IntegerType` (number of the image channels)
--- End diff --

minor: number of image channels (no "the")


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-22 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r227062385
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,113 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources such as Parquet, CSV, JSON and JDBC, we 
also provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
--- End diff --

is this convention, to have this text here in the table of contents? "* 
This will become a table of contents (this text will be scraped)."


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-22 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r227062204
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,113 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources such as Parquet, CSV, JSON and JDBC, we 
also provide some specific data source for ML.
--- End diff --

"some specific data sources for ML"


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-18 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226379993
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,90 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
--- End diff --

Data sources.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226165068
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,90 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON and JDBC, we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory, it 
can load compressed image (jpeg, png, etc.) into raw image representation via 
ImageIO in Java library.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
--- End diff --

Shall we consistently make some codes such as `StructType` as codes like `` 
`StructType` ``


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226164813
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,90 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
--- End diff --

Should it be `Datasource` or `Data sources`? I am saying this because there 
looks a mismatch with the menu above.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226164867
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,90 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON and JDBC, we also 
provide some specific data source for ML.
--- End diff --

really personal preference tho .. `like` -> `such as`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226164476
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,90 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON and JDBC, we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory, it 
can load compressed image (jpeg, png, etc.) into raw image representation via 
ImageIO in Java library.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
--- End diff --

`.` -> `,`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226164264
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,90 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON and JDBC, we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory, it 
can load compressed image (jpeg, png, etc.) into raw image representation via 
ImageIO in Java library.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+The schema of the `image` column is:
+ - origin: String (represents the file path of the image)
--- End diff --

I would use SQL types consistently, for instance, StringType, IntegerType


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226163959
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
--- End diff --

Where's describing each field?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226163638
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements a Spark SQL data source API for loading image data as a 
DataFrame.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
+implements Spark SQL data source API for loading image data as DataFrame.
+
+{% highlight java %}
+Dataset imagesDF = 
spark.read().format("image").load("data/mllib/images/origin");
+{% endhighlight %}
+
+
+
--- End diff --

Shall we add an example for R as well then? It wouldn't be too difficult to 
add the equivalent examples. Also, I don't think we will add the equivalent 
examples in different languages at different pages.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226161636
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
--- End diff --

added.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226161623
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
--- End diff --

added.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r226161557
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements a Spark SQL data source API for loading image data as a 
DataFrame.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
+implements Spark SQL data source API for loading image data as DataFrame.
+
+{% highlight java %}
+Dataset imagesDF = 
spark.read().format("image").load("data/mllib/images/origin");
+{% endhighlight %}
+
+
+
--- End diff --

This looks like SQL features and fit all datasources. Put it in spark SQL 
doc will be better.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r224322470
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements a Spark SQL data source API for loading image data as a 
DataFrame.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
+implements Spark SQL data source API for loading image data as DataFrame.
+
+{% highlight java %}
+Dataset imagesDF = 
spark.read().format("image").load("data/mllib/images/origin");
--- End diff --

Can we do a simple transformation so that how the image datasource can be 
utilized?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r224322298
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements a Spark SQL data source API for loading image data as a 
DataFrame.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
+implements Spark SQL data source API for loading image data as DataFrame.
+
+{% highlight java %}
+Dataset imagesDF = 
spark.read().format("image").load("data/mllib/images/origin");
+{% endhighlight %}
+
+
+
--- End diff --

how about SQL syntax? I think we can use `CREATE TABLE tableA USING 
LOCATION 'data/image.png'`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r224321873
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
--- End diff --

Shall we describe which image we can load? For instance, I think this 
delegates to ImageIO in Java which allows to read compressed format like PNG or 
JPG to raw image representation like BMP so that OpenCS can handles them.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r224321949
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load image files from a directory.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
--- End diff --

I would also describe the schema structure and what each field means.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r224321446
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,49 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources like Parquet, CSV, JSON, JDBC, we also 
provide some specific data source for ML.
--- End diff --

`JSON, JDBC` -> `JSON and JDBC`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r223728031
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,51 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources "parquat", "csv", "json", "jdbc", we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load libsvm data files from directory.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements Spark SQL data source API for loading image data as DataFrame.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
--- End diff --

I sympathize with the comment, but I think it makes some sense tucked into 
ML rather than a standalone module.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r223727369
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,51 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources "parquat", "csv", "json", "jdbc", we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load libsvm data files from directory.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements Spark SQL data source API for loading image data as DataFrame.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
--- End diff --

cc @cloud-fan and @gatorsmile, am I missing something?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r223725196
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,51 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources "parquat", "csv", "json", "jdbc", we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load libsvm data files from directory.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements Spark SQL data source API for loading image data as DataFrame.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
--- End diff --

I meant (external) Avro was merged into `external/...` in Apache Spark as a 
separate module due to the reason above. Image data source is merged into 
Spark's main code rather then a separate module. I don't object to bring an 
external into Apache Spark and I don't doubt you guys's judgement; however, was 
wondering why this exists in Spark's main code whereas the ideal approach is to 
put them `external/...`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r223720371
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,51 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources "parquat", "csv", "json", "jdbc", we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load libsvm data files from directory.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements Spark SQL data source API for loading image data as DataFrame.
--- End diff --

implements Spark -> implements a Spark
as DataFrame -> as a DataFrame
This sentence is repeated three times. Can you move the shared text out of 
the language-specific code blocks?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r223720808
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,51 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources "parquat", "csv", "json", "jdbc", we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load libsvm data files from directory.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements Spark SQL data source API for loading image data as DataFrame.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
+implements Spark SQL data source API for loading image data as DataFrame.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+{% highlight java %}
+Dataset imagesDF = 
spark.read().format("image").load("data/mllib/images/origin");
+{% endhighlight %}
+
+
+
+In scala we implement Spark SQL data source API for loading image data as 
DataFrame.
--- End diff --

scala -> Scala, but this is about Python.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r223720105
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,51 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources "parquat", "csv", "json", "jdbc", we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load libsvm data files from directory.
--- End diff --

from directory -> from a directory
Maybe say "This data source loads images in libsvm format from a directory"?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r223719759
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,51 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources "parquat", "csv", "json", "jdbc", we also 
provide some specific data source for ML.
--- End diff --

- data sources -> data sources like
- Do you want to just say "Parquet, CSV, JSON, and JDBC"? they aren't code 
identifiers here
- parquat -> parquet
- data source -> data sources


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-08 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r223566032
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,51 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources "parquat", "csv", "json", "jdbc", we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load libsvm data files from directory.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements Spark SQL data source API for loading image data as DataFrame.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
--- End diff --

Usually it depends on how important the use case is. For example, CSV was 
created as an external data source and later merged into Spark. See 
https://issues.apache.org/jira/browse/SPARK-21866?focusedCommentId=16148268=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16148268.
 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r223552437
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,51 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources "parquat", "csv", "json", "jdbc", we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load libsvm data files from directory.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements Spark SQL data source API for loading image data as DataFrame.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
--- End diff --

cc @mengxr as well


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22675#discussion_r223552386
  
--- Diff: docs/ml-datasource.md ---
@@ -0,0 +1,51 @@
+---
+layout: global
+title: Data sources
+displayTitle: Data sources
+---
+
+In this section, we introduce how to use data source in ML to load data.
+Beside some general data sources "parquat", "csv", "json", "jdbc", we also 
provide some specific data source for ML.
+
+**Table of Contents**
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+## Image data source
+
+This image data source is used to load libsvm data files from directory.
+
+
+

+[`ImageDataSource`](api/scala/index.html#org.apache.spark.ml.source.image.ImageDataSource)
+implements Spark SQL data source API for loading image data as DataFrame.
+The loaded DataFrame has one StructType column: "image". containing image 
data stored as image schema.
+
+{% highlight scala %}
+scala> spark.read.format("image").load("data/mllib/images/origin")
+res1: org.apache.spark.sql.DataFrame = [image: struct]
+{% endhighlight %}
+
+
+

+[`ImageDataSource`](api/java/org/apache/spark/ml/source/image/ImageDataSource.html)
--- End diff --

Out of curiosity, why did we put the image source inside of Spark, rather 
then a separate module? (see also 
https://github.com/apache/spark/pull/21742#discussion_r201552008). Avro was put 
as a separate module.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22675: [SPARK-25347][ML][DOC] Spark datasource for image...

2018-10-08 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request:

https://github.com/apache/spark/pull/22675

[SPARK-25347][ML][DOC] Spark datasource for image/libsvm user guide

## What changes were proposed in this pull request?

Spark datasource for image/libsvm user guide

## How was this patch tested?
Scala:

![8ba0b2c3-7a44-4154-8ecf-d001f94d1ca0](https://user-images.githubusercontent.com/19235986/46644290-42cabc00-cbb2-11e8-978e-996f12ef9405.png)
Java:

![4c6ffe71-1268-4890-886f-263f61f25519](https://user-images.githubusercontent.com/19235986/46644298-49f1ca00-cbb2-11e8-9af9-17409a8d29e2.png)
Python:

![2ec95d0f-478b-4bd6-8815-16640bbea5ab](https://user-images.githubusercontent.com/19235986/46644301-4fe7ab00-cbb2-11e8-989f-1280582f5358.png)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WeichenXu123/spark add_image_source_doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22675.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22675


commit 887f5282fba8a8a0bcbb9242eb87b27bf94d0210
Author: WeichenXu 
Date:   2018-10-09T02:54:09Z

init




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org