Re: user threads in executors
Yes, look at KafkaUtils.createRDD.

On Wed, Jul 22, 2015 at 11:17 AM, Shushant Arora wrote:
> Thanks! I am using Spark Streaming 1.3, and if a post fails for any
> reason, I will store the offset of that message in another Kafka topic. I
> want to read those offsets in another Spark job and then fetch the
> original topic's messages based on them. So is it possible in a Spark job
> to get Kafka messages at arbitrary offsets? Or is there a better
> alternative for handling failure of a post request?
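The replay idea can be sketched in plain Python, mocking the Kafka log as an in-memory dict keyed by offset. (Against a real broker, the Spark 1.3 Scala/Java API `KafkaUtils.createRDD` takes an array of `OffsetRange` values to do this; everything below, including `replay_failed` and the sample data, is illustrative only.)

```python
# Sketch: re-fetch only the messages whose posts failed, by stored offset.
# `log` stands in for one Kafka topic partition; its keys are offsets.
def replay_failed(log, failed_offsets):
    """Return the messages at the given offsets, skipping unknown ones."""
    return [log[o] for o in failed_offsets if o in log]

log = {0: "a", 1: "b", 2: "c", 3: "d"}
failed = [1, 3]  # offsets recovered from the "failures" topic
print(replay_failed(log, failed))  # ['b', 'd']
```

The point is only that offsets are stable addresses: once stored, they let a later job re-read exactly the failed records instead of the whole topic.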
Re: user threads in executors
Thanks! I am using Spark Streaming 1.3, and if a post fails for any reason, I will store the offset of that message in another Kafka topic. I want to read those offsets in another Spark job and then fetch the original topic's messages based on them. So is it possible in a Spark job to get Kafka messages at arbitrary offsets? Or is there a better alternative for handling failure of a post request?

On Wed, Jul 22, 2015 at 11:31 AM, Tathagata Das wrote:
> Yes, you could unroll from the iterator in batches of 100-200 and then
> post them in multiple rounds. If you are using the Kafka receiver-based
> approach (not Direct), the raw Kafka data is stored in executor memory.
> If you are using Direct Kafka, it is read from Kafka directly at the time
> of filtering.
Re: user threads in executors
Yes, you could unroll from the iterator in batches of 100-200 and then post them in multiple rounds.

If you are using the Kafka receiver-based approach (not Direct), the raw Kafka data is stored in executor memory. If you are using Direct Kafka, it is read from Kafka directly at the time of filtering.

TD

On Tue, Jul 21, 2015 at 9:34 PM, Shushant Arora wrote:
> I can post multiple items at a time. Data is being read from Kafka and
> filtered; after that it is posted. Does foreachPartition load the complete
> partition into memory, or does it use an iterator of batches under the
> hood? If the complete partition is not loaded, will posting in custom
> batches of 100-200 help instead of posting the whole partition?
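Unrolling the partition iterator in rounds of 100-200 can be sketched in plain Python; `chunked` and `post_batch` below are illustrative stand-ins (in the real job, `post_batch` would be your HTTP call and the loop would live inside foreachPartition):

```python
from itertools import islice

def chunked(it, size):
    """Yield lists of up to `size` items without materializing the iterator."""
    it = iter(it)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

posted = []  # records the size of each round, for illustration

def post_batch(batch):
    posted.append(len(batch))  # stand-in for one HTTP POST of `batch`

# 450 records posted in rounds of 200 -> three requests: 200, 200, 50.
for batch in chunked(range(450), 200):
    post_batch(batch)

print(posted)  # [200, 200, 50]
```

Because `islice` pulls from the iterator lazily, at most one batch is held in memory at a time, which is exactly why bounded rounds are safer than collecting the whole partition.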
Re: user threads in executors
I can post multiple items at a time.

Data is being read from Kafka and filtered; after that it is posted. Does foreachPartition load the complete partition into memory, or does it use an iterator of batches under the hood? If the complete partition is not loaded, will posting in custom batches of 100-200 help instead of posting the whole partition at once?

On Wed, Jul 22, 2015 at 12:18 AM, Tathagata Das wrote:
> If you can post multiple items at a time, then use foreachPartition to
> post the whole partition in a single request.
Re: user threads in executors
If you can post multiple items at a time, then use foreachPartition to post the whole partition in a single request.

On Tue, Jul 21, 2015 at 9:35 AM, Richard Marscher wrote:
> You can certainly create threads in a map transformation. We do this to
> do concurrent DB lookups during one stage, for example. I would recommend,
> however, that you switch from map to mapPartitions, as this lets you
> create a fixed-size thread pool shared across the items of a partition,
> as opposed to spawning a future per record in the RDD.
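In plain-Python terms, the foreachPartition pattern amounts to receiving each partition as an iterator and issuing one request for all of it; `send_all` below is a hypothetical stand-in for the HTTP client, and the two-partition loop simulates what Spark would run once per task:

```python
requests_made = []  # records each request's payload, for illustration

def send_all(items):
    """Hypothetical HTTP client call: posts all items in one request."""
    requests_made.append(list(items))

def post_partition(partition_iter):
    # foreachPartition hands each task an iterator over its partition;
    # drain it once and post everything in a single round trip.
    send_all(partition_iter)

# Simulate an RDD with two partitions.
for partition in ([1, 2, 3], [4, 5]):
    post_partition(iter(partition))

print(len(requests_made))  # 2
```

One request per partition amortizes the ~100 ms per-request cost across many records, instead of paying it once per record as a per-element map would.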
Re: user threads in executors
You can certainly create threads in a map transformation. We do this to do concurrent DB lookups during one stage, for example. I would recommend, however, that you switch from map to mapPartitions, as this lets you create a fixed-size thread pool shared across the items of a partition, as opposed to spawning a future per record in the RDD.

On Tue, Jul 21, 2015 at 4:11 AM, Shushant Arora wrote:
> Hi,
>
> Can I create user threads in executors? I have a streaming app where,
> after processing, I have a requirement to push events to an external
> system. Each post request costs ~90-100 ms.
>
> To make the posts parallel, I cannot rely on a single thread per task,
> since that is limited by the number of cores available in the system. Can
> I use user threads in a Spark app? I tried creating 2 threads in a map
> task and it worked.
>
> Is there any upper limit on the number of user threads in a Spark
> executor? Is it a good idea to create user threads in a Spark map task?
>
> Thanks

--
*Richard Marscher*
Software Engineer
Localytics
Localytics.com <http://localytics.com/> | Our Blog <http://localytics.com/blog> | Twitter <http://twitter.com/localytics> | Facebook <http://facebook.com/localytics> | LinkedIn <http://www.linkedin.com/company/1148792?trk=tyah>
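The fixed-pool-per-partition pattern can be sketched in plain Python with the standard-library `concurrent.futures` module. In a real job the body of `post_partition` would be the function passed to mapPartitions; `slow_post` is an illustrative stand-in for the ~100 ms HTTP call:

```python
from concurrent.futures import ThreadPoolExecutor

def slow_post(event):
    """Stand-in for an HTTP POST; returns a status per event."""
    return ("ok", event)

def post_partition(partition_iter, pool_size=8):
    # One bounded pool per partition: posts run in parallel, but the
    # thread count stays fixed instead of growing with the record count.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(slow_post, partition_iter))

results = post_partition(iter(range(5)))
print(results)  # [('ok', 0), ('ok', 1), ('ok', 2), ('ok', 3), ('ok', 4)]
```

The bound matters: a future per record can create thousands of threads on a large partition, while a fixed pool caps memory and connection pressure on the external system regardless of partition size.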
user threads in executors
Hi,

Can I create user threads in executors?

I have a streaming app where, after processing, I have a requirement to push events to an external system. Each post request costs ~90-100 ms.

To make the posts parallel, I cannot rely on a single thread per task, since task parallelism is limited by the number of cores available in the system. Can I use user threads in a Spark app? I tried creating 2 threads in a map task and it worked.

Is there any upper limit on the number of user threads in a Spark executor? Is it a good idea to create user threads in a Spark map task?

Thanks