I believe that if you do this within the context of an operation that is
already parallelized, such as a map, the work will be distributed to the
executors and they will perform it in parallel. I could be wrong about this,
though, as I have never investigated this specific use case.
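A minimal sketch of that pattern, assuming a DataFrame with an employee_id
column and a hypothetical endpoint URL (substitute your real service). The
HTTP call sits inside mapPartitions, so it runs on the executors rather than
the driver; one task per partition makes the calls concurrently:

```scala
import java.net.{HttpURLConnection, URL}
import scala.io.Source
import org.apache.spark.sql.SparkSession

object RestCallsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rest-calls").getOrCreate()
    import spark.implicits._

    val employees = Seq((1L, "alice"), (2L, "bob"))
      .toDF("employee_id", "employee_name")

    // mapPartitions executes on the executors, so the GET requests are
    // issued in parallel across partitions, not serially on the driver.
    val responses = employees.mapPartitions { rows =>
      rows.map { row =>
        val id = row.getAs[Long]("employee_id")
        // Hypothetical endpoint; replace with your actual API.
        val conn = new URL(s"http://example.com/employees/$id")
          .openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("GET")
        try {
          val body = Source.fromInputStream(conn.getInputStream).mkString
          (id, body)
        } finally conn.disconnect()
      }
    }.toDF("employee_id", "response")

    responses.show()
    spark.stop()
  }
}
```

The degree of parallelism is bounded by the number of partitions (tasks), so
you can call .repartition(n) beforehand to control how many requests are in
flight at once, and keep an eye on rate limits at the API end.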

On Thu, May 14, 2020 at 5:24 PM Chetan Khatri <chetan.opensou...@gmail.com>
wrote:

> Thanks for the quick response.
>
> I am curious to know whether the data would be pulled in parallel for the
> 100+ HTTP requests, or whether it would only run on the driver node. The
> post body would be part of the DataFrame. Think of it as: I have a DataFrame
> of employee_id, employee_name; now an HTTP GET call has to be made for each
> employee_id, and the DataFrame is dynamic for each Spark job run.
>
> Does it make sense?
>
> Thanks
>
>
> On Thu, May 14, 2020 at 5:12 PM Jerry Vinokurov <grapesmo...@gmail.com>
> wrote:
>
>> Hi Chetan,
>>
>> You can pretty much use any client to do this. When I was using Spark at
>> a previous job, we used OkHttp, but I'm sure there are plenty of others. In
>> our case, we had a startup phase in which we gathered metadata via a REST
>> API and then broadcast it to the workers. I think if you need all the
>> workers to have access to whatever you're getting from the API, that's the
>> way to do it.
>>
>> Jerry
>>
>> On Thu, May 14, 2020 at 5:03 PM Chetan Khatri <
>> chetan.opensou...@gmail.com> wrote:
>>
>>> Hi Spark Users,
>>>
>>> How can I invoke a REST API call from Spark code so that it is not only
>>> running on the Spark driver but distributed / parallel?
>>>
>>> Spark with Scala is my tech stack.
>>>
>>> Thanks
>>>
>>>
>>>
>>
>> --
>> http://www.google.com/profiles/grapesmoker
>>
>

-- 
http://www.google.com/profiles/grapesmoker
