These are quite different creatures. You have a distributed set of
Strings, but want a local stream of bytes, which involves three
conversions:

- collect data to driver
- concatenate strings in some way
- encode strings as bytes according to an encoding

Your approach is OK, but if you have enough memory it might be faster
to avoid disk:

- collect() to an Array[String] locally
- use Guava utilities to turn a bunch of Strings into a Reader
- use the Apache Commons IO ReaderInputStream to read it as encoded bytes
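
As a rough illustration of the same idea using only JDK classes (so this
is not the Guava/Commons IO route itself, just the same three steps), here
is a minimal sketch. It assumes the lines have already been collected to
the driver, that a newline separator is what you want, and that the
encoding is UTF-8. The class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.Iterator;

final class LinesInputStream {

    // Hypothetical helper: exposes already-collected lines as one InputStream.
    // Lines are encoded one at a time as the stream advances, so the full
    // byte array is never built up front.
    static InputStream toInputStream(Iterable<String> lines, Charset charset) {
        Iterator<String> it = lines.iterator();
        Enumeration<InputStream> parts = new Enumeration<InputStream>() {
            @Override
            public boolean hasMoreElements() {
                return it.hasNext();
            }

            @Override
            public InputStream nextElement() {
                // Assumption: each element is terminated by a newline.
                byte[] encoded = (it.next() + "\n").getBytes(charset);
                return new ByteArrayInputStream(encoded);
            }
        };
        return new SequenceInputStream(parts);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the array you would get back from rdd.collect().
        String[] collected = {"first line", "second line"};
        try (InputStream in =
                 toInputStream(Arrays.asList(collected), StandardCharsets.UTF_8)) {
            System.out.write(in.readAllBytes());
            System.out.flush();
        }
    }
}
```

Encoding each element separately is fine for UTF-8, but a charset that
writes a byte-order mark (for example "UTF-16") would emit one per
element, which is where a Reader-based approach is the safer choice.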

I do wonder whether that's really what you want to do, though.


On Fri, Mar 13, 2015 at 9:54 AM, Ayoub <benali.ayoub.i...@gmail.com> wrote:
> Hello,
>
> I need to convert an RDD[String] to a java.io.InputStream but I didn't find
> an easy way to do it.
> Currently I am saving the RDD as a temporary file and then opening an
> InputStream on the file, but that is not really optimal.
>
> Does anybody know a better way to do that?
>
> Thanks,
> Ayoub.
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/RDD-to-InputStream-tp22031.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
