Hi, I have a use case:
I want to download S&P 500 stock data from the Yahoo API in parallel using Spark. I already have all the stock symbols as a Dataset, and I used the code below to call the Yahoo API for each symbol:

```scala
case class Symbol(symbol: String, sector: String)
case class Tick(symbol: String, sector: String, open: Double, close: Double)

// symbolDs is Dataset[Symbol]; pullSymbolFromYahoo returns Dataset[Tick]
symbolDs.map { k => pullSymbolFromYahoo(k.symbol, k.sector) }
```

This statement does not compile:

```
Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc)
and Product types (case classes) are supported by importing spark.implicits._
Support for serializing other types will be added in future releases.
```

My questions are:

1. As you can see, this scenario is not traditional dataset handling such as counts or SQL queries. Instead, it is more like a UDF that applies an arbitrary operation to each record. Is Spark a good fit for this kind of workload?
2. Regarding the compilation error, is there a fix? I did not find a satisfactory solution online.

Thanks for your help!
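For completeness, here is a minimal, self-contained sketch of my setup. The body of `pullSymbolFromYahoo` is just a placeholder (in my real code it calls the Yahoo API and builds the `Dataset[Tick]` from the response), and the symbols are a small sample rather than the full S&P 500 list. Uncommenting the last `map` line reproduces the encoder error quoted above:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object YahooRepro {

  // Same case classes as in the snippet above.
  case class Symbol(symbol: String, sector: String)
  case class Tick(symbol: String, sector: String, open: Double, close: Double)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("yahoo-repro")
      .getOrCreate()
    import spark.implicits._

    // Placeholder: the real implementation performs an HTTP call to Yahoo for one
    // symbol and returns the parsed ticks as a Dataset[Tick].
    def pullSymbolFromYahoo(symbol: String, sector: String): Dataset[Tick] =
      Seq(Tick(symbol, sector, 0.0, 0.0)).toDS()

    // A couple of sample symbols instead of the full S&P 500 list.
    val symbolDs: Dataset[Symbol] = Seq(Symbol("AAPL", "Tech"), Symbol("XOM", "Energy")).toDS()

    // Uncommenting the next line triggers the "Unable to find encoder" compile error:
    // symbolDs.map { k => pullSymbolFromYahoo(k.symbol, k.sector) }

    spark.stop()
  }
}
```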