Re: hive on spark job not start enough executors

2016-09-09 Thread 明浩 冯
All the parameters except spark.executor.instances are specified in the 
spark-default.conf located in Hive's conf folder, so I think the answer is yes.

I also checked Spark's web UI while a Hive on Spark job was running. The 
parameters shown there are exactly what I specified in the config file, 
including spark.shuffle.service.enabled and spark.dynamicAllocation.enabled.


Should I specify a fixed spark.executor.instances in the file? That would not 
work well for me.
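
For reference, here is a minimal sketch of how the settings discussed in this thread could be sanity-checked. It assumes the Spark 1.x behavior documented for YARN, where a fixed spark.executor.instances turns dynamic allocation off. The function names are hypothetical, and the parser only handles whitespace-separated "key value" lines, not every form a properties file allows.

```python
def parse_spark_conf(text):
    """Parse 'key<whitespace>value' lines into a dict, skipping blanks and # comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def dynamic_allocation_problems(conf):
    """Return a list of reasons why dynamic allocation may not take effect."""
    problems = []
    if conf.get("spark.dynamicAllocation.enabled", "false").lower() != "true":
        problems.append("spark.dynamicAllocation.enabled is not true")
    if conf.get("spark.shuffle.service.enabled", "false").lower() != "true":
        problems.append("spark.shuffle.service.enabled is not true")
    # Assumption: Spark 1.x semantics, where a fixed executor count wins.
    if "spark.executor.instances" in conf:
        problems.append("spark.executor.instances is set to a fixed value")
    return problems
```

An empty list only means the three settings look consistent. It says nothing about whether Hive on Spark actually passes them through to the job.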


By the way, the data source of my query is Parquet files. On the Hive side I 
just created an external table over the Parquet files.



Thanks,

Minghao Feng


From: Mich Talebzadeh 
Sent: Friday, September 9, 2016 4:49:55 PM
To: user
Subject: Re: hive on spark job not start enough executors

When you start Hive on Spark, do you set any parameters for the submitted job 
(or read them from an init file)?

set spark.master=yarn;
set spark.deploy.mode=client;
set spark.executor.memory=3g;
set spark.driver.memory=3g;
set spark.executor.instances=2;
set spark.ui.port=;


Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.



On 9 September 2016 at 09:30, 明浩 冯 <qiuff...@hotmail.com> wrote:

Hi there,


I ran into a problem that gives Hive on Spark very poor performance.

I'm using Spark 1.6.2 and Hive 2.1.0. I specified


spark.shuffle.service.enabled    true
spark.dynamicAllocation.enabled  true

in my spark-default.conf file (the file is in both the Spark and Hive conf 
folders) so that Spark jobs acquire executors dynamically.
The configuration works correctly when I run Spark jobs, but when I use Hive on 
Spark, it starts only a few executors even though there are more than enough 
cores and memory to start more.
For example, for the same SQL query, Spark SQL can start more than 20 
executors, but Hive on Spark starts only 3.

How can I improve the performance of Hive on Spark? Any suggestions, please.

Thanks,
Minghao Feng




Re: hive on spark job not start enough executors

2016-09-09 Thread Mich Talebzadeh
When you start Hive on Spark, do you set any parameters for the submitted
job (or read them from an init file)?

set spark.master=yarn;
set spark.deploy.mode=client;
set spark.executor.memory=3g;
set spark.driver.memory=3g;
set spark.executor.instances=2;
set spark.ui.port=;

Dr Mich Talebzadeh






On 9 September 2016 at 09:30, 明浩 冯 wrote:

> Hi there,
>
>
> I ran into a problem that gives Hive on Spark very poor performance.
>
> I'm using Spark 1.6.2 and Hive 2.1.0. I specified
>
>
> spark.shuffle.service.enabled    true
> spark.dynamicAllocation.enabled  true
>
> in my spark-default.conf file (the file is in both the Spark and Hive conf
> folders) so that Spark jobs acquire executors dynamically.
> The configuration works correctly when I run Spark jobs, but when I use
> Hive on Spark, it starts only a few executors even though there are more
> than enough cores and memory to start more.
> For example, for the same SQL query, Spark SQL can start more than 20
> executors, but Hive on Spark starts only 3.
>
> How can I improve the performance of Hive on Spark? Any suggestions, please.
>
> Thanks,
> Minghao Feng
>
>


hive on spark job not start enough executors

2016-09-09 Thread 明浩 冯
Hi there,


I ran into a problem that gives Hive on Spark very poor performance.

I'm using Spark 1.6.2 and Hive 2.1.0. I specified


spark.shuffle.service.enabled    true
spark.dynamicAllocation.enabled  true

in my spark-default.conf file (the file is in both the Spark and Hive conf 
folders) so that Spark jobs acquire executors dynamically.
The configuration works correctly when I run Spark jobs, but when I use Hive on 
Spark, it starts only a few executors even though there are more than enough 
cores and memory to start more.
For example, for the same SQL query, Spark SQL can start more than 20 
executors, but Hive on Spark starts only 3.

How can I improve the performance of Hive on Spark? Any suggestions, please.

Thanks,
Minghao Feng



Re: Quota for rogue ad-hoc queries

2016-09-09 Thread ravi teja
Hi,

I am trying to add this feature to Hive (HIVE-11735).
But I hit a roadblock while setting the quota during session folder creation,
because in HDFS a quota can only be set by a superuser.
Any thoughts on how to avoid this issue?
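
One possible direction, sketched here only as an illustration, is to enforce the limit outside HDFS by comparing a job's bytes-written counter with a per-session limit. HDFS_BYTES_WRITTEN is the counter name Gopal mentions in the quoted reply below. The function name, the dict shape, and the limit are hypothetical, and fetching the counters or killing the job is deliberately left out.

```python
def over_output_quota(counters, limit_bytes):
    """Return True when the reported bytes written exceed the limit.

    `counters` is assumed to be a plain dict of counter name -> value;
    a missing counter is treated as zero bytes written.
    """
    written = counters.get("HDFS_BYTES_WRITTEN", 0)
    return written > limit_bytes
```

Because counters are reported with some delay, a check like this could only stop a query after it has already overshot the limit by some amount.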

Thanks,
Ravi

On Fri, Sep 2, 2016 at 2:35 PM, ravi teja  wrote:

> Hi Gopal,
>
> We are using MR, not Tez.
> Since the output size of an ad-hoc query is something we can determine,
> unlike the time the job takes, I was thinking more of a quota on output
> size or number of rows.
>
> Thanks,
> Ravi
>
> On Fri, Sep 2, 2016 at 2:57 AM, Gopal Vijayaraghavan wrote:
>
>>
>> > Are there any other ways?
>>
>> Are you running Tez?
>>
>> Tez heartbeats counters back to the AppMaster every few seconds, so the
>> AppMaster has an accurate (but delayed) count of HDFS_BYTES_WRITTEN.
>>
>> Cheers,
>> Gopal


Re: load data Failed with exception java.lang.IndexOutOfBoundsException

2016-09-09 Thread Prasanth Jayachandran
You are hitting https://issues.apache.org/jira/browse/HIVE-13185, which is 
fixed in the latest Hive release (2.1.0).

Thanks
Prasanth

> On Sep 9, 2016, at 2:21 AM, Gopal Vijayaraghavan  wrote:
> 
> 
>> It works if the file has more than two characters, which is a little 
>> interesting. I cannot understand why the result of the function 
>> checkInputFormat is OrcInputFormat; maybe that is just right.
> 
> My guess is that it is trying to read the 3-letter string "ORC" from that 
> file and failing.
> 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L471
> 
> Cheers,
> Gopal



Re: load data Failed with exception java.lang.IndexOutOfBoundsException

2016-09-09 Thread Gopal Vijayaraghavan

> It works if the file has more than two characters, which is a little 
> interesting. I cannot understand why the result of the function 
> checkInputFormat is OrcInputFormat; maybe that is just right.

My guess is that it is trying to read the 3-letter string "ORC" from that file 
and failing.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L471
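
To illustrate the guess above, here is a minimal sketch of a bounds-safe check for the 3-byte "ORC" marker at the start of a stream. This is not Hive's code; the function name is hypothetical, and the real check may read a different part of the file. It only shows why a file shorter than three bytes cannot contain the marker and has to be handled explicitly.

```python
import io

ORC_MAGIC = b"ORC"  # the 3-letter marker mentioned above


def starts_with_orc_magic(stream):
    """Return True if the binary stream begins with the ORC marker.

    read() returns fewer bytes when the stream is too short, so a one- or
    two-byte file yields False instead of an out-of-bounds error.
    """
    header = stream.read(len(ORC_MAGIC))
    return header == ORC_MAGIC


print(starts_with_orc_magic(io.BytesIO(b"ORC\x00rest")))  # True
print(starts_with_orc_magic(io.BytesIO(b"ab")))           # False
```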

Cheers,
Gopal