Re: foreachPartition's operation is taking long to finish

2016-12-17 Thread Deepak Sharma
On Sun, Dec 18, 2016 at 2:26 AM, vaquar khan  wrote:

> select * from indexInfo;
>

Hi Vaquar,
I do not see a column family (CF) named indexInfo in any of the Cassandra databases.

Thanks
Deepak


-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net


Re: foreachPartition's operation is taking long to finish

2016-12-17 Thread Deepak Sharma
There are 8 worker nodes in the cluster.

Thanks
Deepak
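
A back-of-the-envelope check of the numbers in this thread (100 tasks, about 1.5 minutes each, 8 worker nodes) can be sketched as follows. The number of concurrent task slots per worker is not stated anywhere in the thread, so `slots_per_worker = 1` is purely an assumption for illustration.

```python
import math

tasks = 100              # partitions/tasks reported in the thread
minutes_per_task = 1.5   # "around 1 and half minute" per task
workers = 8              # worker nodes reported in the thread
slots_per_worker = 1     # ASSUMPTION: not stated in the thread

# Total time if the tasks ran strictly one after another.
sequential_minutes = tasks * minutes_per_task

# Total time if tasks ran in waves that fill every available slot.
waves = math.ceil(tasks / (workers * slots_per_worker))
parallel_minutes = waves * minutes_per_task

print(f"sequential: {sequential_minutes} min, "
      f"{waves} waves in parallel: {parallel_minutes} min")
```

Under that assumption, the reported total of around 2 hours is much closer to the sequential estimate than to the parallel one, so it may be worth checking how many of the 100 tasks actually run at the same time.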

On Dec 18, 2016 2:15 AM, "Holden Karau"  wrote:

> How many workers are in the cluster?
>
> On Sat, Dec 17, 2016 at 12:23 PM Deepak Sharma 
> wrote:
>
>> Hi All,
>> I am iterating over a data frame's partitions using df.foreachPartition.
>> For each row in the iteration, I initialize a DAO to insert the row into
>> Cassandra.
>> Each of these iterations takes almost one and a half minutes to finish.
>> In my workflow, this is part of an action, and 100 partitions are created
>> for the df: I can see 100 tasks being created, in which the DAO insert
>> operation is performed.
>> Since each of these 100 tasks takes around one and a half minutes to
>> complete, it takes around 2 hours for this small insert operation.
>> Is anyone facing the same scenario, and is there a time-efficient way to
>> handle this?
>> This latency is not acceptable in our use case.
>> Any pointers to improve/minimise the latency will be really appreciated.
>>
>>
>> --
>> Thanks
>> Deepak
>>
>>
>>


Re: foreachPartition's operation is taking long to finish

2016-12-17 Thread vaquar khan
Hi Deepak,

Could you share the index information from your database?

select * from indexInfo;


Regards,
Vaquar khan

On Sat, Dec 17, 2016 at 2:45 PM, Holden Karau  wrote:

> How many workers are in the cluster?
>
> On Sat, Dec 17, 2016 at 12:23 PM Deepak Sharma 
> wrote:
>
>> Hi All,
>> I am iterating over a data frame's partitions using df.foreachPartition.
>> For each row in the iteration, I initialize a DAO to insert the row into
>> Cassandra.
>> Each of these iterations takes almost one and a half minutes to finish.
>> In my workflow, this is part of an action, and 100 partitions are created
>> for the df: I can see 100 tasks being created, in which the DAO insert
>> operation is performed.
>> Since each of these 100 tasks takes around one and a half minutes to
>> complete, it takes around 2 hours for this small insert operation.
>> Is anyone facing the same scenario, and is there a time-efficient way to
>> handle this?
>> This latency is not acceptable in our use case.
>> Any pointers to improve/minimise the latency will be really appreciated.
>>
>>
>> --
>> Thanks
>> Deepak
>>
>>
>>


-- 
Regards,
Vaquar Khan


Re: foreachPartition's operation is taking long to finish

2016-12-17 Thread Holden Karau
How many workers are in the cluster?

On Sat, Dec 17, 2016 at 12:23 PM Deepak Sharma 
wrote:

> Hi All,
> I am iterating over a data frame's partitions using df.foreachPartition.
> For each row in the iteration, I initialize a DAO to insert the row into
> Cassandra.
> Each of these iterations takes almost one and a half minutes to finish.
> In my workflow, this is part of an action, and 100 partitions are created
> for the df: I can see 100 tasks being created, in which the DAO insert
> operation is performed.
> Since each of these 100 tasks takes around one and a half minutes to
> complete, it takes around 2 hours for this small insert operation.
> Is anyone facing the same scenario, and is there a time-efficient way to
> handle this?
> This latency is not acceptable in our use case.
> Any pointers to improve/minimise the latency will be really appreciated.
>
>
> --
> Thanks
> Deepak
>
>
>


foreachPartition's operation is taking long to finish

2016-12-17 Thread Deepak Sharma
Hi All,
I am iterating over a data frame's partitions using df.foreachPartition.
For each row in the iteration, I initialize a DAO to insert the row into
Cassandra.
Each of these iterations takes almost one and a half minutes to finish.
In my workflow, this is part of an action, and 100 partitions are created
for the df: I can see 100 tasks being created, in which the DAO insert
operation is performed.
Since each of these 100 tasks takes around one and a half minutes to
complete, it takes around 2 hours for this small insert operation.
Is anyone facing the same scenario, and is there a time-efficient way to
handle this?
This latency is not acceptable in our use case.
Any pointers to improve/minimise the latency will be really appreciated.


-- 
Thanks
Deepak
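
The message above describes initializing a DAO for every row inside foreachPartition. A commonly suggested alternative is to create the DAO once per partition and reuse it for all rows in that partition. The following is only a minimal sketch of that difference, not the poster's actual code: `FakeDao` is a hypothetical stand-in that counts how often setup happens, the partitions are plain Python lists rather than a Spark DataFrame, and the 10 rows per partition are made up.

```python
from typing import Callable, Iterator, List


class FakeDao:
    """Hypothetical stand-in for a Cassandra DAO; not a real driver API."""

    instances_created = 0  # how many times the setup cost was paid

    def __init__(self) -> None:
        FakeDao.instances_created += 1
        self.inserted: List[dict] = []

    def insert(self, row: dict) -> None:
        self.inserted.append(row)

    def close(self) -> None:
        pass  # a real DAO would release its session/connection here


def write_dao_per_row(rows: Iterator[dict]) -> None:
    """Pattern described in the thread: a new DAO for every row."""
    for row in rows:
        dao = FakeDao()
        try:
            dao.insert(row)
        finally:
            dao.close()


def write_dao_per_partition(rows: Iterator[dict]) -> None:
    """Alternative: one DAO per partition, reused for every row."""
    dao = FakeDao()
    try:
        for row in rows:
            dao.insert(row)
    finally:
        dao.close()


def count_dao_setups(write_partition: Callable[[Iterator[dict]], None],
                     partitions: List[List[dict]]) -> int:
    """Run the writer over every partition and count DAO creations."""
    FakeDao.instances_created = 0
    for partition in partitions:
        write_partition(iter(partition))
    return FakeDao.instances_created


# 100 partitions as in the thread; 10 rows each is an ASSUMPTION.
partitions = [[{"id": p * 10 + i} for i in range(10)] for p in range(100)]
per_row = count_dao_setups(write_dao_per_row, partitions)
per_partition = count_dao_setups(write_dao_per_partition, partitions)
print(per_row, per_partition)
```

In PySpark, a function shaped like `write_dao_per_partition` (taking an iterator of rows) could be passed to `df.foreachPartition`. If DAO setup dominates the per-task time, paying that cost once per partition instead of once per row may reduce it considerably, though the thread does not confirm where the time is actually spent.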