RE: read image or binary files / spark 2.3

2019-09-05 Thread Ilya Matiach
Hi Peter,
You can use the ImageSchema.readImages API in Spark 2.3 to read images:

https://databricks.com/blog/2018/12/10/introducing-built-in-image-data-source-in-apache-spark-2-4.html
https://blogs.technet.microsoft.com/machinelearning/2018/03/05/image-data-support-in-apache-spark/

https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.ml.image.ImageSchema$
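
For illustration, here is a minimal PySpark sketch of that call, plus SparkContext.binaryFiles for arbitrary binary files. Both are batch reads, so this does not cover the SparkSession.readStream part of the question. The helper names and the hdfs:/// paths are hypothetical, and the lazy import is only there so the helpers can be loaded without PySpark installed. On Spark 2.4 and later, the built-in image data source described in the first link above is the alternative.

```python
def file_size(path_and_content):
    """Turn a (file_path, content_bytes) pair into (file_path, size_in_bytes)."""
    file_path, content = path_and_content
    return file_path, len(content)


def read_images(path, recursive=False):
    """Batch-load images under `path` into a DataFrame with one `image` struct column."""
    # Imported lazily so the helpers can be loaded without PySpark installed.
    from pyspark.ml.image import ImageSchema  # available in Spark 2.3

    return ImageSchema.readImages(path, recursive=recursive)


def read_binary_files(sc, path):
    """Batch-load arbitrary files and return (file_path, size_in_bytes) pairs."""
    # SparkContext.binaryFiles yields (file_path, content) pairs; in PySpark
    # the content arrives as a bytes object.
    return sc.binaryFiles(path).map(file_size)


def main():
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-images-sketch").getOrCreate()

    # Hypothetical input directories; replace with real local or HDFS paths.
    images = read_images("hdfs:///data/images")
    images.select("image.origin", "image.height", "image.width").show(truncate=False)

    sizes = read_binary_files(spark.sparkContext, "hdfs:///data/blobs")
    print(sizes.take(5))

    spark.stop()
```

Calling main() from a script launched with spark-submit should print the origin and dimensions of each image, followed by the sizes of up to five of the binary files.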

There’s also a Spark package for Spark versions older than 2.3:
https://github.com/Microsoft/spark-images

Thank you, Ilya




From: Peter Liu 
Sent: Thursday, September 5, 2019 2:13 PM
To: dev ; User 
Subject: Re: read image or binary files / spark 2.3

Hello experts,

I have a quick question: which API allows me to read image files or binary files 
(for SparkSession.readStream) from a local/Hadoop file system in Spark 2.3?

I have been browsing the following documentation and googling for it, but 
didn't find a good example or reference:

https://spark.apache.org/docs/2.3.0/streaming-programming-guide.html
https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.package

any hint/help would be very much appreciated!

thanks!

Peter


Re: read image or binary files / spark 2.3

2019-09-05 Thread Peter Liu
Hello experts,

I have a quick question: which API allows me to read image files or binary
files (for SparkSession.readStream) from a local/Hadoop file system in
Spark 2.3?

I have been browsing the following documentation and googling for it, but
didn't find a good example or reference:

https://spark.apache.org/docs/2.3.0/streaming-programming-guide.html
https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.package

any hint/help would be very much appreciated!

thanks!

Peter