Re: Write access to the Hive wiki

2016-09-01 Thread Lefty Leverenz
Done. Welcome to the Hive wiki team, Marta! -- Lefty On Thu, Sep 1, 2016 at 8:13 AM, Marta Kuczora wrote: > Sorry, forgot to write my loginId: kuczoram > > Regards, > Marta > > On Thu, Sep 1, 2016 at 2:12 PM, Marta Kuczora > wrote: > >> Hi, >>

Re: Quota for rogue ad-hoc queries

2016-09-01 Thread Gopal Vijayaraghavan
> Are there any other ways? Are you running Tez? Tez heartbeats counters back to the AppMaster every few seconds, so the AppMaster has an accurate (but delayed) count of HDFS_BYTES_WRITTEN. Cheers, Gopal
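To illustrate Gopal's point, here is a minimal Python sketch of a quota check driven by the counter values the AppMaster receives. It assumes each sample is a successive cumulative reading of HDFS_BYTES_WRITTEN, one per heartbeat. The thread does not say how those values are fetched or how a runaway query is stopped, so both are left to a hypothetical `on_breach` callback.

```python
from typing import Callable, Iterable, Optional


def watch_bytes_written(
    samples: Iterable[int],
    quota_bytes: int,
    on_breach: Callable[[int], None],
) -> Optional[int]:
    """Scan cumulative HDFS_BYTES_WRITTEN samples (one per heartbeat) and
    call on_breach once the quota is exceeded.

    Returns the index of the breaching sample, or None if the quota held.
    """
    for i, written in enumerate(samples):
        # Counters are cumulative, so each sample is already the running total.
        if written > quota_bytes:
            on_breach(written)  # hypothetical hook, e.g. kill the query
            return i
    return None


# Example with made-up heartbeat readings and a 100-byte quota.
killed = []
print(watch_bytes_written([0, 40, 90, 130, 180], 100, killed.append))  # 3
print(killed)  # [130]
```

Because the counts are delayed by the heartbeat interval, a query can overshoot the quota by whatever it writes between two heartbeats before the check fires.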

Re: Hive update operation

2016-09-01 Thread Db-Blog
Hi Mich, Nice explanation! Does the update operation in Hive work row by row, or is it performed in batches? We also observed multiple temp files getting generated in HDFS while performing the update operation. It would be really helpful if you could share details of what Hive does in the

What's the best way to detect and remove outliers in a table?

2016-09-01 Thread Mobius ReX
Given a table with hundreds of columns mixing categorical and numerical attributes, where the distribution of values is unknown, what's the best way to detect outliers? For example, given a table Category Price A 1 A 1.3 A 100 C
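One common distribution-free approach, not taken from the thread, is a per-category modified z-score based on the median absolute deviation (MAD), with 3.5 as a frequently used cutoff. The Python sketch below handles a single numeric column grouped by a single categorical column, using only the "A" rows from the truncated example; applying it across hundreds of columns would mean repeating it per numeric column, and categorical columns would need a different test (for example, flagging rare levels).

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def mad_outliers(
    rows: Iterable[Tuple[str, float]], threshold: float = 3.5
) -> List[Tuple[str, float]]:
    """Flag (category, value) pairs whose modified z-score exceeds threshold.

    The modified z-score is 0.6745 * (x - median) / MAD, computed separately
    within each category, so no normality assumption is needed.
    """
    by_cat: Dict[str, List[float]] = defaultdict(list)
    for cat, value in rows:
        by_cat[cat].append(value)

    flagged: List[Tuple[str, float]] = []
    for cat, values in by_cat.items():
        med = median(values)
        mad = median(abs(v - med) for v in values)
        if mad == 0:
            # More than half the values are identical; the score is undefined.
            continue
        for v in values:
            if abs(0.6745 * (v - med) / mad) > threshold:
                flagged.append((cat, v))
    return flagged


rows = [("A", 1.0), ("A", 1.3), ("A", 100.0)]
print(mad_outliers(rows))  # [('A', 100.0)]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value like 100 would otherwise inflate the spread enough to hide itself in a group this small.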

Re: Beeline history problem

2016-09-01 Thread Vihang Karajgaonkar
**removing dev list so as to avoid duplication of thread. This issue was reported and fixed in https://issues.apache.org/jira/browse/HIVE-14153 Which version are you using? Can you check if you have this fix in your build? > On Aug 31, 2016,

Re: Quota for rogue ad-hoc queries

2016-09-01 Thread Edward Capriolo
I have written nagios scripts that watch the job tracker UI and report when things take too long. On Thu, Sep 1, 2016 at 11:08 AM, Loïc Chanel wrote: > On the topic of timeout, if I may say, they are a dangerous way to deal > with requests as a "good" request may
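Edward's scripts are not shown in the thread, but the core of such a check can be sketched in Python. Nagios plugins report by printing one status line and exiting with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch assumes the job ids and elapsed times have already been scraped from the job tracker UI (that step is not shown), and the thresholds and job ids in the example are hypothetical.

```python
from typing import Iterable, List, Tuple

# Nagios plugin exit codes (3 = UNKNOWN is not used here).
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def check_job_durations(
    jobs: Iterable[Tuple[str, float]],
    warn_secs: float,
    crit_secs: float,
) -> Tuple[int, str]:
    """Return (exit_code, status_line) for (job_id, elapsed_seconds) pairs."""
    worst = OK
    slow: List[str] = []
    for job_id, elapsed in jobs:
        if elapsed >= crit_secs:
            worst = CRITICAL
            slow.append(job_id)
        elif elapsed >= warn_secs:
            worst = max(worst, WARNING)
            slow.append(job_id)
    status = f"{LABELS[worst]} - {len(slow)} long-running job(s)"
    if slow:
        status += ": " + ", ".join(slow)
    return worst, status


# Hypothetical job ids and runtimes; a real plugin would print the status
# line and then call sys.exit(code).
code, status = check_job_durations([("job_a", 120), ("job_b", 5400)], 1800, 3600)
print(code, status)  # 2 CRITICAL - 1 long-running job(s): job_b
```

Only reporting, rather than killing, sidesteps Loïc's concern below: a human decides whether a long-running job is "good" or "evil".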

Re: Quota for rogue ad-hoc queries

2016-09-01 Thread Loïc Chanel
On the topic of timeouts, if I may say, they are a dangerous way to deal with requests, as a "good" request may last longer than an "evil" one. Be sure timeouts won't kill any important job before putting them into place. You can set these in the components' (Tez, MapReduce ...) parameters,

Re: Beeline throws OOM on large input query

2016-09-01 Thread Stephen Sprague
lemme guess. your query contains an 'in' clause with 1 million static values? :) * brute force solution is to set: HADOOP_CLIENT_OPTS=-Xmx8G (or whatever) before you run beeline to force a larger memory size (i'm pretty sure beeline uses that env var though i didn't actually check the
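If you drive beeline from a script, the same brute-force fix can be applied per invocation rather than exported in your shell. This Python sketch copies the current environment, sets HADOOP_CLIENT_OPTS to the requested heap, and runs a query file with beeline's `-u` (JDBC URL) and `-f` (script file) options. It inherits Stephen's unverified assumption that beeline honors HADOOP_CLIENT_OPTS; the URL and file name in the usage line are placeholders.

```python
import os
import subprocess
from typing import Dict


def beeline_env(heap: str = "8G") -> Dict[str, str]:
    """Copy the current environment and set the client JVM heap size."""
    env = dict(os.environ)
    # Replaces any existing HADOOP_CLIENT_OPTS, mirroring the suggestion above.
    # Assumption: beeline's launcher passes this variable to the client JVM.
    env["HADOOP_CLIENT_OPTS"] = f"-Xmx{heap}"
    return env


def run_beeline(jdbc_url: str, script_path: str, heap: str = "8G") -> int:
    """Run a query file through beeline with a larger heap; return the exit status."""
    result = subprocess.run(
        ["beeline", "-u", jdbc_url, "-f", script_path],
        env=beeline_env(heap),
    )
    return result.returncode
```

Usage, with a hypothetical host and file: `run_beeline("jdbc:hive2://<host>:10000/default", "big_query.sql", heap="8G")`. Overwriting rather than appending keeps the sketch faithful to the one-line suggestion, at the cost of dropping any other client options already set.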

Re: Quota for rogue ad-hoc queries

2016-09-01 Thread Stephen Sprague
> rogue queries so this really isn't limited to just hive is it? any dbms system perhaps has to contend with this. even malicious rogue queries as a matter of fact. timeouts are a cheap way systems handle this - assuming time is related to resource. i'm sure beeline or whatever client you use

Beeline throws OOM on large input query

2016-09-01 Thread Adam
Hive Version: 2.1.0 I have a very large, multi-line input query (8,668,519 chars) and I have gone up to 16g heap and still get the same OOM. Error: Error running query: java.lang.OutOfMemoryError: Java heap space (state=,code=0) org.apache.hive.service.cli.HiveSQLException: Error running query:

Re: Write access to the Hive wiki

2016-09-01 Thread Marta Kuczora
Sorry, forgot to write my loginId: kuczoram Regards, Marta On Thu, Sep 1, 2016 at 2:12 PM, Marta Kuczora wrote: > Hi, > > could somebody please give me the rights to modify the Hive wiki? > I would be interested in taking the HIVE-14632 Jira. I worked on the > output format

Write access to the Hive wiki

2016-09-01 Thread Marta Kuczora
Hi, could somebody please give me the rights to modify the Hive wiki? I would be interested in taking the HIVE-14632 Jira. I worked on the output format part recently, so I could improve its documentation. Thanks and regards, Marta