Re: Query Failures
https://community.cloudera.com/t5/Support-Questions/Map-and-Reduce-Error-Java-heap-space/td-p/45874

On Fri, Feb 14, 2020, 6:58 PM David Mollitor wrote:

> Hive has many optimizations. One is that it will load the data directly
> from storage (HDFS) if it's a trivial query. For example:
>
> Select * from table limit 10;
>
> In natural language it says "give me any ten rows (if available) from the
> table." You don't need the overhead of launching a full mapreduce job for
> this. Just read the rows from the file directly.
>
> Adding additional predicates on the query requires a mapreduce job to do
> the heavy lifting. The error message you're getting is probably the result
> of a failed mapreduce job. Nine times out of ten, the problem is that the
> mappers/reducers are not granted enough memory for their YARN containers.
>
> On Tue, Feb 11, 2020, 10:41 AM Pau Tallada wrote:
>
>> Hi,
>>
>> Do you have more complete tracebacks?
>>
>> Message from Charles Givre on Tue, Feb 11, 2020 at 2:54:
>>
>>> Hello Everyone!
>>> I recently joined a project that has a Hive/Impala installation and we
>>> are experiencing a significant number of query failures. We are using an
>>> older version of Hive, and unfortunately there's nothing I can do about
>>> that, but I'm wondering how I can make Hive do better with queries to
>>> give our users a better experience.
>>>
>>> For example, I can execute a basic SELECT * query or SELECT
>>> query without issues.
>>>
>>> However, if I attempt to:
>>> 1. Add filters
>>> 2. Do a SELECT DISTINCT
>>> 3. Perform basic aggregation
>>>
>>> I get errors like this: Execution Error, return code 1 from
>>> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.
>>>
>>> Could someone point me to some good guides for querying Hive and/or
>>> assisting my engineers in preventing these errors?
>>> Thanks,
>>
>> --
>> Pau Tallada Crespí
>> Dep. d'Astrofísica i Cosmologia
>> Port d'Informació Científica (PIC)
>> Tel: +34 93 170 2729
>> --
Re: Query Failures
Hive has many optimizations. One is that it will load the data directly from storage (HDFS) if it's a trivial query. For example:

Select * from table limit 10;

In natural language it says "give me any ten rows (if available) from the table." You don't need the overhead of launching a full mapreduce job for this. Just read the rows from the file directly.

Adding additional predicates on the query requires a mapreduce job to do the heavy lifting. The error message you're getting is probably the result of a failed mapreduce job. Nine times out of ten, the problem is that the mappers/reducers are not granted enough memory for their YARN containers.

On Tue, Feb 11, 2020, 10:41 AM Pau Tallada wrote:

> Hi,
>
> Do you have more complete tracebacks?
>
> Message from Charles Givre on Tue, Feb 11, 2020 at 2:54:
>
>> Hello Everyone!
>> I recently joined a project that has a Hive/Impala installation and we
>> are experiencing a significant number of query failures. We are using an
>> older version of Hive, and unfortunately there's nothing I can do about
>> that, but I'm wondering how I can make Hive do better with queries to
>> give our users a better experience.
>>
>> For example, I can execute a basic SELECT * query or SELECT
>> query without issues.
>>
>> However, if I attempt to:
>> 1. Add filters
>> 2. Do a SELECT DISTINCT
>> 3. Perform basic aggregation
>>
>> I get errors like this: Execution Error, return code 1 from
>> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.
>>
>> Could someone point me to some good guides for querying Hive and/or
>> assisting my engineers in preventing these errors?
>> Thanks,
>
> --
> Pau Tallada Crespí
> Dep. d'Astrofísica i Cosmologia
> Port d'Informació Científica (PIC)
> Tel: +34 93 170 2729
> --
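As a rough sketch of the memory advice above, a Hive session could check how a query will run and then raise the per-task memory before retrying. The table and column names (my_table, some_col) are hypothetical and the memory values are placeholders, not recommendations. Which queries skip MapReduce depends on hive.fetch.task.conversion, whose default differs between Hive versions.

```sql
-- Hypothetical table/column names; memory values are placeholders only.

-- Show the plan to see whether a query is a direct fetch or a MapReduce job.
EXPLAIN SELECT * FROM my_table LIMIT 10;
EXPLAIN SELECT DISTINCT some_col FROM my_table;

-- Print the current setting that controls which simple queries
-- bypass MapReduce (possible values: none, minimal, more).
SET hive.fetch.task.conversion;

-- Request larger YARN containers for mappers and reducers (in MB).
SET mapreduce.map.memory.mb=4096;
SET mapreduce.reduce.memory.mb=8192;

-- Keep the JVM heap below the container size; about 80% is a common
-- rule of thumb, assumed here rather than taken from the thread.
SET mapreduce.map.java.opts=-Xmx3276m;
SET mapreduce.reduce.java.opts=-Xmx6553m;

-- Retry the query that previously failed.
SELECT DISTINCT some_col FROM my_table;
```

These SET statements only affect the current session. The cluster may also cap container sizes (for example through yarn.scheduler.maximum-allocation-mb), so larger requests may need an administrator's help. The task logs of the failed job should confirm whether memory was actually the cause.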
Running multiple HMS pointing to the same MySQL
Hi,

We're thinking about running multiple instances of Hive Metastore Server (HMS), all pointing to the same MySQL store, to tackle the HMS load issue we're experiencing. The idea is to have read-only use cases contact one HMS and the mixed/heavier use cases contact the other.

In the past, we only used the second HMS as a backup to the primary HMS. Are there any problems or gotchas with having multiple HMS instances serving as primary servers at the same time?

Thank you