I believe that if you do this inside an operation that is already parallelized, such as a map, the work will be distributed to the executors, which will make the calls in parallel. I could be wrong about this, though, as I never investigated this specific use case.
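To make that concrete, here is a minimal sketch of the per-row pattern. The endpoint layout (`/employees/<id>`), base URL, and object names are hypothetical, not anything from the thread. The HTTP helper uses only the JDK so the closure ships to executors without extra jars; the Spark wiring is shown in comments because it needs a live SparkSession:

```scala
import java.net.{HttpURLConnection, URL}
import scala.io.Source

object EmployeeApi {
  // Hypothetical endpoint layout; substitute your real service.
  def employeeUrl(baseUrl: String, employeeId: String): String =
    s"$baseUrl/employees/$employeeId"

  // Plain-JDK GET so no extra client jars need shipping to executors.
  def get(url: String, timeoutMs: Int = 5000): String = {
    val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
    conn.setConnectTimeout(timeoutMs)
    conn.setReadTimeout(timeoutMs)
    conn.setRequestMethod("GET")
    try {
      val src = Source.fromInputStream(conn.getInputStream, "UTF-8")
      try src.mkString finally src.close()
    } finally conn.disconnect()
  }
}

// Spark wiring (sketch only; requires a running SparkSession `spark`
// and a DataFrame `employeesDf` with an employee_id column):
//
// import spark.implicits._
// val result = employeesDf
//   .select("employee_id")
//   .as[String]
//   .mapPartitions { ids =>
//     // This closure runs on the executors, so the HTTP calls happen
//     // in parallel, one partition per task, not on the driver.
//     ids.map(id => (id, EmployeeApi.get(EmployeeApi.employeeUrl(base, id))))
//   }
//   .toDF("employee_id", "response")
```

Using `mapPartitions` rather than `map` also gives you a natural place to set up and tear down a single HTTP client per partition instead of per row.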
On Thu, May 14, 2020 at 5:24 PM Chetan Khatri <chetan.opensou...@gmail.com> wrote:

> Thanks for the quick response.
>
> I am curious to know whether it would pull data in parallel for 100+
> HTTP requests, or only on the driver node? The POST body would be part
> of the DataFrame. Think of it as a DataFrame of employee_id,
> employee_name; the HTTP GET call has to be made for each employee_id,
> and the DataFrame is dynamic for each Spark job run.
>
> Does that make sense?
>
> Thanks
>
> On Thu, May 14, 2020 at 5:12 PM Jerry Vinokurov <grapesmo...@gmail.com> wrote:
>
>> Hi Chetan,
>>
>> You can pretty much use any HTTP client to do this. When I was using
>> Spark at a previous job, we used OkHttp, but I'm sure there are plenty
>> of others. In our case, we had a startup phase in which we gathered
>> metadata via a REST API and then broadcast it to the workers. I think
>> if you need all the workers to have access to whatever you're getting
>> from the API, that's the way to do it.
>>
>> Jerry
>>
>> On Thu, May 14, 2020 at 5:03 PM Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>
>>> Hi Spark Users,
>>>
>>> How can I invoke a REST API call from Spark code so that it is not
>>> only running on the Spark driver but is distributed / parallel?
>>>
>>> Spark with Scala is my tech stack.
>>>
>>> Thanks
>>
>> --
>> http://www.google.com/profiles/grapesmoker

-- 
http://www.google.com/profiles/grapesmoker
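The broadcast approach Jerry mentions earlier in the thread (fetch once on the driver, share read-only with all workers) can be sketched as follows. All names here are hypothetical, and the Spark calls are commented out because they require a live SparkContext:

```scala
// Fetch reference data once on the driver, then broadcast it so every
// executor gets a read-only copy instead of re-calling the API per row.
object MetadataLookup {
  // Hypothetical shape of the metadata returned by the REST API:
  // a list of (key, value) pairs turned into a lookup map.
  def buildLookup(entries: Seq[(String, String)]): Map[String, String] =
    entries.toMap

  // Driver-side sketch (requires a SparkSession `spark` and a
  // hypothetical fetchAllMetadataFromRestApi() helper):
  //
  // val metadata = buildLookup(fetchAllMetadataFromRestApi()) // one API call
  // val bc = spark.sparkContext.broadcast(metadata)
  // val enriched = employeesDs.map { e =>
  //   // bc.value is available on every executor without further HTTP calls
  //   (e.employeeId, bc.value.getOrElse(e.employeeId, "unknown"))
  // }
}
```

This pattern fits when the API data is shared reference data; when each row needs its own request, the per-row `mapPartitions` approach is the better fit.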