Hi,

I have a use case:

I want to download S&P 500 stock data from the Yahoo API in parallel using
Spark. I already have all the stock symbols in a Dataset, and I used the
code below to call the Yahoo API for each symbol:



case class Symbol(symbol: String, sector: String)

case class Tick(symbol: String, sector: String, open: Double, close: Double)

// symbolDs is a Dataset[Symbol]; pullSymbolFromYahoo returns a Dataset[Tick]
symbolDs.map { k =>
  pullSymbolFromYahoo(k.symbol, k.sector)
}


This statement does not compile (I assume because map would need an implicit
Encoder[Dataset[Tick]], and a Dataset is not itself an encodable type). The
error is:


Unable to find encoder for type stored in a Dataset.  Primitive types (Int,
String, etc) and Product types (case classes) are supported by importing
spark.implicits._  Support for serializing other types will be added in
future releases.


My questions are:


1. As you can see, this scenario is not traditional Dataset handling such
as a count or a SQL query. Instead, it is more like a UDF that applies an
arbitrary operation to each record. Is Spark a good fit for this kind of
scenario?
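
For what it's worth, a per-record operation that returns a plain case class
does compile for me, which makes me think the problem is the Dataset[Tick]
return type rather than the map itself. latestTick below is a hypothetical
helper that returns one Tick per symbol:

import spark.implicits._  // provides the Encoder for the Tick case class

// Hypothetical helper: fetch the single latest tick for one symbol.
def latestTick(symbol: String, sector: String): Tick = ???

// A UDF-style map over each record; this compiles because the result
// element type is a case class, for which an encoder can be derived.
val latest = symbolDs.map(k => latestTick(k.symbol, k.sector))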


2. Regarding the compilation error, is there a fix? I could not find a
satisfactory solution online.
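
The one workaround I have sketched (assuming pullSymbolFromYahoo can be
rewritten to return a plain Seq[Tick] instead of a Dataset[Tick];
pullTicksFromYahoo below is that hypothetical variant) is to flatten with
flatMap, so the overall result is a single Dataset[Tick]:

import org.apache.spark.sql.Dataset
import spark.implicits._  // provides the Encoder for the Tick case class

// Hypothetical variant of the pull function: it runs on the executors,
// so it must return a plain collection rather than another Dataset.
def pullTicksFromYahoo(symbol: String, sector: String): Seq[Tick] = ???

// flatMap flattens the per-symbol sequences into one Dataset[Tick].
val ticks: Dataset[Tick] = symbolDs.flatMap { k =>
  pullTicksFromYahoo(k.symbol, k.sector)
}

Is that the idiomatic way to do it, or is there a better approach?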


Thanks for the help!
