[jira] [Commented] (SPARK-30242) Support reading Parquet files from Stream Buffer

Hyukjin Kwon (Jira) Mon, 16 Dec 2019 17:50:05 -0800


    [ https://issues.apache.org/jira/browse/SPARK-30242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997779#comment-16997779 ]


Hyukjin Kwon commented on SPARK-30242:
--------------------------------------

No, I don't think this will be supported, because it would require changing 
too many APIs (e.g., ORC, CSV, JSON, Text). It can be easily worked around, 
though, by writing the data out to a local directory and reading it back.
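
A minimal sketch of that workaround, in case it helps. The helper names 
write_bytes_to_temp_parquet and read_parquet_from_bytes are hypothetical (they 
are not part of PySpark), `spark` is assumed to be an existing SparkSession, 
and the directory is assumed to be readable by the executors (for example 
local mode or a shared filesystem).

```python
import os
import tempfile
from pathlib import Path


def write_bytes_to_temp_parquet(parquet_bytes: bytes, directory: str) -> str:
    """Write raw Parquet bytes to a new file under `directory`; return its absolute path."""
    fd, path = tempfile.mkstemp(suffix=".parquet", dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(parquet_bytes)
    return os.path.abspath(path)


def read_parquet_from_bytes(spark, parquet_bytes: bytes, directory: str):
    """Hypothetical helper: dump the buffer to disk, then load it by path.

    `spark` is assumed to be a SparkSession. DataFrameReader.load accepts a
    path string, not a file-like object, so the bytes go through a file first.
    """
    path = write_bytes_to_temp_parquet(parquet_bytes, directory)
    # Use an explicit file: URI so the path is not resolved against a
    # non-local default filesystem such as HDFS.
    return spark.read.format("parquet").load(Path(path).as_uri())
```

Spark evaluates lazily, so the file should stay in place until the DataFrame 
has actually been materialized (e.g. after an action or a cache).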

> Support reading Parquet files from Stream Buffer
> ------------------------------------------------
>
>                 Key: SPARK-30242
>                 URL: https://issues.apache.org/jira/browse/SPARK-30242
>             Project: Spark
>          Issue Type: Wish
>          Components: PySpark
>    Affects Versions: 3.0.0
>            Reporter: Jelther Oliveira Gonçalves
>            Priority: Trivial
>
> Reading a Parquet file from a Python BytesIO buffer is not possible using PySpark.
> Using:
>  
> {code:python}
> from io import BytesIO
> parquetbytes: bytes = b'PAR...'
> df = spark.read.format("parquet").load(BytesIO(parquetbytes))
> {code}
> This raises:
> {code:java}
> java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
> {code}
>  
> Is there any chance this will be available in the future?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
