Hi list,

I'm trying to find a binary package of Impala (or, failing that, to build
it myself) to run in an AWS EMR 5.27 environment for HBase and Hive
queries. AWS used to provide a binary build of Impala for EMR, many moons
ago, but no longer does.

The AWS EMR environment (5.27) component versions are:
- Hadoop: 2.8.5
- HBase: 1.4.10
- Hive/HCatalog: 2.3.5

I've tried installing several versions of the Cloudera binary packages from
their repos, but I always ended up with some sort of version problem when
Impala interacted with Hadoop/Hive.

I've tried:
- 2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22.el6 @cloudera-cdh5
-- Catalogd fails to start due to NoClassDefFoundError:
org/datanucleus/PersistenceNucleusContext (which I think is related to Hive
being 2.x in AWS EMR)

- 3.2.0+cdh6.3.0-1279813.el6 @cloudera-cdh6
-- Catalogd fails to start due to NoClassDefFoundError:
org/apache/commons/configuration/Configuration (which I think is a
commons-configuration version mismatch, since Hadoop 3.x moved from
commons-configuration 1.x to 2.x)

I guess Cloudera 5.x has Hadoop 2.x and Hive 1.x, and Cloudera 6.x has
Hadoop 3.x and Hive 2.x, whereas EMR sits in the middle with Hadoop 2.x and
Hive 2.x. That will probably keep me from running either of these builds.
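
The suspected mismatch can be written down as a small Python sketch. The
CDH rows are only the guesses stated above (major versions, not verified
release numbers), and the names STACKS, EMR_5_27 and mismatches are
hypothetical, chosen just for illustration.

```python
# Major versions per stack. The CDH entries are guesses from this mail,
# not verified release numbers; the EMR entries are the EMR 5.27 versions.
STACKS = {
    "cdh5": {"hadoop": 2, "hive": 1},
    "cdh6": {"hadoop": 3, "hive": 2},
}
EMR_5_27 = {"hadoop": "2.8.5", "hbase": "1.4.10", "hive": "2.3.5"}


def major(version):
    """Return the major component of a dotted version string as an int."""
    return int(version.split(".")[0])


def mismatches(stack, target):
    """Return the components whose major version in `stack` differs from `target`."""
    return sorted(
        name for name, want in stack.items() if major(target[name]) != want
    )


if __name__ == "__main__":
    for name, stack in STACKS.items():
        print(name, "differs from EMR 5.27 on:", mismatches(stack, EMR_5_27))
```

If the guesses hold, each Cloudera line differs from EMR on exactly one
major component, which would match the two failures listed above.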

So now I'm trying to build Impala myself, hoping to end up with a build
that works in AWS EMR 5.27.

I've successfully built Impala from source with the default dependency
versions, following the instructions at
https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala.

I'm now in the process of figuring out which versions I should configure in
impala-config.sh so that the build will match the AWS EMR environment.
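
As a starting point, a rough Python sketch like the one below could list
the version-related assignments in impala-config.sh so they can be compared
against the EMR component versions. The default path bin/impala-config.sh
and the assumption that the relevant settings are shell assignments with
VERSION in the variable name are guesses, not something confirmed against
the Impala source; version_settings is a hypothetical helper.

```python
import re
import sys

# Matches shell assignments such as `export FOO_VERSION=1.2.3`. The pattern
# is an assumption about how version settings look, not a description of
# the real file layout.
VERSION_LINE = re.compile(
    r"^\s*(?:export\s+)?([A-Z0-9_]*VERSION[A-Z0-9_]*)=(.*)$"
)


def version_settings(text):
    """Return (name, value) pairs for every version-like assignment in `text`."""
    found = []
    for line in text.splitlines():
        match = VERSION_LINE.match(line)
        if match:
            found.append((match.group(1), match.group(2).strip()))
    return found


if __name__ == "__main__":
    # Path is an assumption; pass the real location as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "bin/impala-config.sh"
    with open(path) as handle:
        for name, value in version_settings(handle.read()):
            print(f"{name}={value}")
```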

Can anyone give me some pointers on this?

PS: If you know of a binary Impala package that could work with this AWS
EMR Hadoop environment, let me know so I can test it.

Thanks in advance
Alex