Hi list,

I'm trying to find a binary package of Impala (or, failing that, to build Impala myself) to run in an AWS EMR 5.27 environment for HBase and Hive queries. Many moons ago AWS offered a binary build of Impala for EMR, but they no longer do.

The AWS EMR environment (5.27) component versions are:

- Hadoop: 2.8.5
- HBase: 1.4.10
- Hive/HCatalog: 2.3.5

I've tried installing several versions of the Cloudera binary packages from their repos, but I always ended up with some sort of version problem when interacting with Hadoop/Hive. I've tried:

- 2.12.0+cdh5.16.2+0-1.cdh5.16.2.p0.22.el6 @cloudera-cdh5 -- Catalogd fails to start due to NoClassDefFoundError: org/datanucleus/PersistenceNucleusContext (which I think is related to Hive being 2.x in AWS EMR)
- 3.2.0+cdh6.3.0-1279813.el6 @cloudera-cdh6 -- Catalogd fails to start due to NoClassDefFoundError: org/apache/commons/configuration/Configuration (which I think is a commons-configuration version mismatch, since Hadoop 3.x moved from commons-configuration 1.x to 2.x)

I guess Cloudera 5.x has Hadoop 2.x and Hive 1.x, and Cloudera 6.x has Hadoop 3.x and Hive 2.x, whereas EMR sits in the middle with Hadoop 2.x and Hive 2.x. That will probably limit my chances of running either of these builds.

So now I'm trying to build Impala myself to get a build that could work in AWS EMR 5.27. I've successfully built Impala from source with the default dependency versions, following the instructions at https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala. I'm now figuring out which versions I should configure in impala-config.sh so that the build matches the AWS EMR environment. Can anyone give me some pointers on this?

PS: If you know of a binary packaged Impala version that could work with this AWS EMR Hadoop environment, let me know so I can test it.

Thanks in advance,
Alex