Re: foreachPartition's operation is taking long to finish
On Sun, Dec 18, 2016 at 2:26 AM, vaquar khan wrote:
> select * from indexInfo;

Hi Vaquar,

I do not see a column family (CF) with the name indexInfo in any of the Cassandra databases.

Thanks
Deepak

--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
Re: foreachPartition's operation is taking long to finish
There are 8 worker nodes in the cluster.

Thanks
Deepak

On Dec 18, 2016 2:15 AM, "Holden Karau" wrote:
> How many workers are in the cluster?
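For context on why the worker count matters, here is a back-of-the-envelope estimate using the numbers reported in this thread (100 tasks, about 1.5 minutes each, 8 workers). It is only a sketch: the function name `estimated_minutes` is hypothetical, and it assumes every task takes the same time and that each worker runs one task at a time, which the thread does not confirm.

```python
import math


def estimated_minutes(tasks: int, minutes_per_task: float, concurrent_tasks: int) -> float:
    """Rough wall-clock estimate: tasks run in equal-length 'waves'."""
    # Each wave runs up to `concurrent_tasks` tasks side by side.
    waves = math.ceil(tasks / concurrent_tasks)
    return waves * minutes_per_task


# Numbers from the thread; one task per worker at a time is an assumption.
serial = estimated_minutes(tasks=100, minutes_per_task=1.5, concurrent_tasks=1)
parallel = estimated_minutes(tasks=100, minutes_per_task=1.5, concurrent_tasks=8)

print(f"one task at a time: {serial} min")
print(f"eight tasks at a time: {parallel} min")
```

Under these assumptions, the reported run time of around 2 hours is much closer to the one-task-at-a-time figure than to the eight-at-a-time figure, so it may be worth checking in the Spark UI how many of the 100 tasks actually run concurrently.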
Re: foreachPartition's operation is taking long to finish
Hi Deepak,

Could you share the index information in your database?

select * from indexInfo;

Regards,
Vaquar khan

On Sat, Dec 17, 2016 at 2:45 PM, Holden Karau wrote:
> How many workers are in the cluster?

--
Regards,
Vaquar Khan
IT Architect / Lead Consultant
Greater Chicago
Re: foreachPartition's operation is taking long to finish
How many workers are in the cluster?

On Sat, Dec 17, 2016 at 12:23 PM Deepak Sharma wrote:
> Hi All,
> I am iterating over a DataFrame's partitions using df.foreachPartition.
> On each iteration over a row, I initialize a DAO to insert the row into
> Cassandra.
> Each of these iterations takes almost one and a half minutes to finish.
> In my workflow, this is part of an action, and 100 partitions are
> created for the df: I can see 100 tasks being created, in which the
> insert DAO operation is performed.
> Since each of these 100 tasks takes around one and a half minutes to
> complete, this small insert operation takes around 2 hours.
> Is anyone facing the same scenario, and is there a more time-efficient
> way to handle this?
> This latency is not good for our use case.
> Any pointers to improve/minimise the latency will be really appreciated.
>
> --
> Thanks
> Deepak
foreachPartition's operation is taking long to finish
Hi All,

I am iterating over a DataFrame's partitions using df.foreachPartition. On each iteration over a row, I initialize a DAO to insert the row into Cassandra. Each of these iterations takes almost one and a half minutes to finish.

In my workflow, this is part of an action, and 100 partitions are created for the df: I can see 100 tasks being created, in which the insert DAO operation is performed. Since each of these 100 tasks takes around one and a half minutes to complete, this small insert operation takes around 2 hours.

Is anyone facing the same scenario, and is there a more time-efficient way to handle this? This latency is not good for our use case. Any pointers to improve/minimise the latency will be really appreciated.

--
Thanks
Deepak
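The thread does not establish the cause of the slowness, but the post above describes a DAO being initialized for each row, which is a pattern that often costs time when setup involves opening a connection or session. The sketch below shows the usual alternative in plain Python: create the DAO once per partition and insert rows in batches. Everything here is an assumption for illustration: `FakeCassandraDao`, `insert_batch`, `batched`, `write_partition`, and the batch size of 500 are hypothetical names and values, not the poster's code or a real Cassandra client API.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, int]


class FakeCassandraDao:
    """Hypothetical stand-in for the DAO in the post; counts how often it is created."""

    opened = 0

    def __init__(self) -> None:
        # Stands in for the costly part: opening a connection or session.
        FakeCassandraDao.opened += 1
        self.inserted: List[Row] = []

    def insert_batch(self, rows: List[Row]) -> None:
        # A real DAO would send these rows to Cassandra in one round trip.
        self.inserted.extend(rows)

    def close(self) -> None:
        pass


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of up to `size` rows without materializing the whole partition."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def write_partition(rows: Iterable[Row], batch_size: int = 500) -> int:
    """Insert all rows of one partition using a single DAO instance."""
    dao = FakeCassandraDao()  # created once per partition, not once per row
    written = 0
    try:
        for batch in batched(rows, batch_size):
            dao.insert_batch(batch)
            written += len(batch)
    finally:
        dao.close()
    return written
```

With PySpark, a function of this shape could be passed as `df.foreachPartition(write_partition)`, since Spark calls it once per partition with an iterator over that partition's rows. Whether this helps depends on where the 1.5 minutes per task is actually spent, which would need to be measured.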