[jira] [Commented] (SPARK-22666) Spark datasource for image format

Jayesh lalwani (JIRA) Fri, 22 Jun 2018 11:25:06 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-22666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520657#comment-16520657
 ]


Jayesh lalwani commented on SPARK-22666:
----------------------------------------

I'll try to take this on

> Spark datasource for image format
> ---------------------------------
>
>                 Key: SPARK-22666
>                 URL: https://issues.apache.org/jira/browse/SPARK-22666
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Timothy Hunter
>            Priority: Major
>
> The current API for the new image format is implemented as a standalone 
> feature, in order to make it reside within the mllib package. As discussed in 
> SPARK-21866, users should be able to load images through the more common 
> spark source reader interface.
> This ticket is concerned with adding image reading support in the spark 
> source API, through either of the following interfaces:
>  - {{spark.read.format("image")...}}
>  - {{spark.read.image....}}
> The output is a dataframe that contains images (and the file names for 
> example), following the semantics discussed already in SPARK-21866.
> A few technical notes:
> * since the functionality is implemented in {{mllib}}, calling this function 
> may fail at runtime if users have not imported the {{spark-mllib}} dependency
> * How to deal with very flat directories? It is common to have millions of 
> files in a single "directory" (like in S3), which seems to have caused some 
> issues to some users. If this issue is too complex to handle in this ticket, 
> it can be dealt with separately.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-22666) Spark datasource for image format

Reply via email to