Do you only want to use Scala? Otherwise, I think you should be able to
accomplish what you want with PySpark, using pandas' read_table to load
the symbol data.
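
For what it's worth, a minimal PySpark sketch of that idea is below.
fetch_tick is a hypothetical placeholder for whatever Yahoo API client
you end up using (requests, pandas, etc.), not a real library call:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("yahoo-ticks").getOrCreate()

    # Symbols would come from your real source; two rows for illustration.
    symbols = spark.createDataFrame(
        [("AAPL", "Technology"), ("XOM", "Energy")],
        ["symbol", "sector"],
    )

    def fetch_tick(symbol, sector):
        # Hypothetical placeholder: in reality this would call the Yahoo
        # API (e.g. with requests or pandas) and parse open/close prices.
        return (symbol, sector, 0.0, 0.0)

    # Each executor fetches its own slice of symbols in parallel.
    ticks = symbols.rdd \
        .map(lambda row: fetch_tick(row.symbol, row.sector)) \
        .toDF(["symbol", "sector", "open", "close"])
    ticks.show()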

Thank you,

Irving Duran

On 02/16/2018 06:10 PM, Lian Jiang wrote:
> Hi,
>
> I have a use case:
>
> I want to download S&P 500 stock data from the Yahoo API in parallel
> using Spark. I have all the stock symbols in a Dataset. Then I used the
> code below to call the Yahoo API for each symbol:
>
>
> case class Symbol(symbol: String, sector: String)
>
> case class Tick(symbol: String, sector: String, open: Double, close: Double)
>
> // symbolDs is a Dataset[Symbol]; pullSymbolFromYahoo returns a Dataset[Tick]
> symbolDs.map { k =>
>   pullSymbolFromYahoo(k.symbol, k.sector)
> }
>
>
> This statement does not compile; the error is:
>
>
> Unable to find encoder for type stored in a Dataset.  Primitive types
> (Int, String, etc) and Product types (case classes) are supported by
> importing spark.implicits._  Support for serializing other types will
> be added in future releases.
>
>
>
> My questions are:
>
>
> 1. As you can see, this is not traditional Dataset handling such as
> counts or SQL queries. Instead, it is more like a UDF that applies an
> arbitrary operation to each record. Is Spark a good fit for this
> scenario?
>
>
> 2. Is there a fix for the compilation error? I did not find a
> satisfactory solution online.
>
>
> Thanks for the help!
>
