Re: dataframe slow down with tungsten turn on

2015-11-04 Thread gen tang
Yes, the same code, the same result.
In fact, the code has been running for more than one month. Before 1.5.0, the
performance was consistently about the same, so I suspect that it is caused by tungsten.

Gen

On Wed, Nov 4, 2015 at 4:05 PM, Rick Moritz wrote:

> Something to check (just in case):
> Are you getting identical results each time?
>
> On Wed, Nov 4, 2015 at 8:54 AM, gen tang wrote:
>
>> Hi sparkers,
>>
>> I am using DataFrames to run some large ETL jobs.
>> More precisely, I create a DataFrame from a Hive table, apply some
>> operations, and then save it as JSON.
>>
>> With spark-1.4.1, the whole process was quite fast, about 1 minute.
>> However, when I use the same code with spark-1.5.1 (with tungsten turned on),
>> it takes about 2 hours to finish the same job.
>>
>> I checked the task details; almost all of the time is consumed by
>> computation.
>>
>> Any idea about why this happens?
>>
>> Thanks a lot in advance for your help.
>>
>> Cheers
>> Gen
>>
>>
>


Re: dataframe slow down with tungsten turn on

2015-11-04 Thread Rick Moritz
Something to check (just in case):
Are you getting identical results each time?

On Wed, Nov 4, 2015 at 8:54 AM, gen tang wrote:

> Hi sparkers,
>
> I am using DataFrames to run some large ETL jobs.
> More precisely, I create a DataFrame from a Hive table, apply some operations,
> and then save it as JSON.
>
> With spark-1.4.1, the whole process was quite fast, about 1 minute.
> However, when I use the same code with spark-1.5.1 (with tungsten turned on),
> it takes about 2 hours to finish the same job.
>
> I checked the task details; almost all of the time is consumed by
> computation.
>
> Any idea about why this happens?
>
> Thanks a lot in advance for your help.
>
> Cheers
> Gen
>
>


dataframe slow down with tungsten turn on

2015-11-03 Thread gen tang
Hi sparkers,

I am using DataFrames to run some large ETL jobs.
More precisely, I create a DataFrame from a Hive table, apply some operations,
and then save it as JSON.

With spark-1.4.1, the whole process was quite fast, about 1 minute.
However, when I use the same code with spark-1.5.1 (with tungsten turned on),
it takes about 2 hours to finish the same job.

I checked the task details; almost all of the time is consumed by
computation.

Any idea about why this happens?

Thanks a lot in advance for your help.

Cheers
Gen
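
One way to check whether tungsten is really responsible would be to run the same job twice on 1.5.1, once with the flag on and once with it off, and compare wall-clock time. The following is only a rough PySpark sketch of that idea, not the code from this thread: the helper name run_etl, the table name, the output path, and the identity transform are all hypothetical placeholders, and it assumes the Spark 1.5.x option spark.sql.tungsten.enabled can be toggled through setConf on the SQL context.

```python
import time

# Spark 1.5.x SQL option that switches the Tungsten execution path on or off.
TUNGSTEN_KEY = "spark.sql.tungsten.enabled"


def run_etl(sql_context, tungsten_enabled, table, out_path,
            transform=lambda df: df):
    """Run the Hive -> operations -> JSON job once; return elapsed seconds.

    `transform` is a hypothetical stand-in for the real operations,
    which are not shown in the thread.
    """
    sql_context.setConf(TUNGSTEN_KEY, "true" if tungsten_enabled else "false")
    start = time.time()
    df = sql_context.table(table)      # DataFrame backed by the Hive table
    result = transform(df)             # placeholder for the actual operations
    # Writing the output is what triggers the computation.
    result.write.mode("overwrite").json(out_path)
    return time.time() - start


# Usage (needs a Spark 1.5.x installation with Hive support; the table
# name and paths below are made up for illustration):
#   from pyspark import SparkContext
#   from pyspark.sql import HiveContext
#   sc = SparkContext(appName="tungsten-check")
#   ctx = HiveContext(sc)
#   for flag in (True, False):
#       secs = run_etl(ctx, flag, "my_db.my_table", "/tmp/etl_out_%s" % flag)
#       print("tungsten=%s took %.1f s" % (flag, secs))
```

If the run with the flag set to false comes back close to the 1.4.1 timing, that would point at the tungsten code path; if both runs are slow, the cause is more likely elsewhere in 1.5.1.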