Re: [How to :]Apache Impala and S3n File System
I think the Impala user mailing list may have a better answer; forwarding your question there.
[How to :]Apache Impala and S3n File System
Hi, Has anybody tried using Apache Impala with the S3n file system? Could you share the pros and cons? Appreciate the help. Thanks, Divya
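For anyone landing here later, a minimal sketch of what pointing Impala at S3 data can look like. This assumes an Impala release that supports S3-backed tables at all (support arrived in later versions and is generally documented against s3a:// rather than s3n://), and that the S3 credentials (fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey) are already set in core-site.xml. The bucket, path, and column names are hypothetical.

    -- external table whose data lives in S3 rather than HDFS
    CREATE EXTERNAL TABLE clicks_s3 (
      user_id  STRING,
      event_ts TIMESTAMP,
      url      STRING
    )
    STORED AS PARQUET
    LOCATION 's3n://my-example-bucket/warehouse/clicks/';

    -- query it like any other table; reads go over the network,
    -- so expect higher latency than HDFS-local data
    SELECT COUNT(*) FROM clicks_s3;

The main trade-off is exactly that locality: Impala's short-circuit, data-local reads do not apply to S3, so scans are slower, but storage is decoupled from the cluster.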
Re: Impala
Thanks. I will do it! Juri

-----Original Message-----
From: Sean Busbey
To: Nagalingam, Karthikeyan
Cc: Kumar Jayapal; user; cdh-user
Sent: Wed, Mar 9, 2016 12:53 pm
Subject: Re: Impala

You should join the mailing list for Apache Impala (incubating) and ask your question over there: http://mail-archives.apache.org/mod_mbox/incubator-impala-dev/

On Wed, Mar 9, 2016 at 8:12 AM, Nagalingam, Karthikeyan wrote:
Hello, I am new to Impala; my goal is to test joins and aggregations against 2 million and 10 million records. Can you please provide some documentation or a website to get started?

Regards,
Karthikeyan Nagalingam,
Technical Marketing Engineer (Big Data Analytics)
Mobile: 919-376-6422

-- busbey
Re: Impala
You should join the mailing list for Apache Impala (incubating) and ask your question over there: http://mail-archives.apache.org/mod_mbox/incubator-impala-dev/

On Wed, Mar 9, 2016 at 8:12 AM, Nagalingam, Karthikeyan <karthikeyan.nagalin...@netapp.com> wrote:
> Hello,
> I am new to Impala; my goal is to test joins and aggregations against 2 million and 10 million records. Can you please provide some documentation or a website to get started?
> Regards,
> Karthikeyan Nagalingam,
> Technical Marketing Engineer (Big Data Analytics)
> Mobile: 919-376-6422

-- busbey
Impala
Hello, I am new to Impala; my goal is to test joins and aggregations against 2 million and 10 million records. Can you please provide some documentation or a website to get started?

Regards,
Karthikeyan Nagalingam,
Technical Marketing Engineer (Big Data Analytics)
Mobile: 919-376-6422
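Not documentation, but as a rough starting point, the kind of join/aggregation test described above can be sketched as below. The table and column names are made up; COMPUTE STATS is worth running before timing joins so the planner has row counts to work with.

    -- hypothetical tables: orders (~10M rows) and customers (~2M rows)
    COMPUTE STATS orders;
    COMPUTE STATS customers;

    -- join plus aggregation, to exercise the planner and exchange operators
    SELECT c.region,
           COUNT(*)      AS order_cnt,
           SUM(o.amount) AS total_amount
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY c.region
    ORDER BY total_amount DESC;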
Unable to connect to Impala shell after updating the cluster to 5.5.1
Hi, Has anyone had this issue with Impala? I am unable to connect to the Impala shell after updating the cluster to 5.5.1. My cluster uses Kerberos and LDAP for authentication. When I try to connect with the Impala shell it displays the message "LDAP credentials may not be sent over insecure connections. Enable SSL or set --auth_creds_ok_in_clear":

    impala-shell -i impala.jayapal.com -l -u jayapal
    LDAP credentials may not be sent over insecure connections. Enable SSL or set --auth_creds_ok_in_clear

If I use

    impala-shell -i impala.jayapal.com -l -u jayapal --auth_creds_ok_in_clear

it allows me to connect. Please let me know if anyone has resolved this issue. Thanks, Jay
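The --auth_creds_ok_in_clear flag does work, but it sends the LDAP password unencrypted. A sketch of the SSL route instead, assuming a certificate/key pair is available for the impalad host; the file paths below are hypothetical and the flags are the standard impalad TLS options:

    # impalad startup flags (e.g. via the flag file or the Impala
    # configuration safety valve), enabling TLS on the server side:
    #   --ssl_server_certificate=/etc/impala/certs/impalad.pem
    #   --ssl_private_key=/etc/impala/certs/impalad.key

    # then connect from the shell with SSL enabled, pointing at the CA
    # certificate that signed the server cert:
    impala-shell -i impala.jayapal.com -l -u jayapal --ssl \
        --ca_cert=/etc/impala/certs/ca.pem

With SSL enabled on the daemon, the "credentials may not be sent over insecure connections" check no longer applies, so --auth_creds_ok_in_clear is not needed.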
Impala returns error: "Bad status for request 5241: TGetOperationStatusResp"
Hello, I'm running CDH 5.2.1. When I tried to execute an Impala query on huge tables via the Hue UI I got this error:

    Bad status for request 5241: TGetOperationStatusResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), operationState=5, errorMessage=None, sqlState=None, errorCode=None)

First, I tried to execute the same query via impala-shell on one of my worker nodes. I was confused that the error message there was:

    Query did not have enough memory to get the minimum required buffers. Backend 3: Memory Limit Exceeded

I checked the impalad startup options. The command that `ps aux | grep impalad` gives me is:

    /opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/lib/impala/sbin-retail/impalad --flagfile=/var/run/cloudera-scm-agent/process/7492-impala-IMPALAD/impala-conf/impalad_flags

And this is the content of the above flag file:

    -beeswax_port=21000
    -fe_port=21000
    -be_port=22000
    -llama_callback_port=28000
    -hs2_port=21050
    -enable_webserver=true
    -mem_limit=128849018880            <-- !!!
    -webserver_port=25000
    -max_result_cache_size=10
    -state_store_subscriber_port=23000
    -statestore_subscriber_timeout_seconds=30
    -scratch_dirs=/disk1/impala/impalad,/disk10/impala/impalad,/disk2/impala/impalad,/disk3/impala/impalad,/disk4/impala/impalad,/disk5/impala/impalad,/disk6/impala/impalad,/disk7/impala/impalad,/disk8/impala/impalad,/disk9/impala/impalad,/opt/impala/impalad,/disk11/impala/impalad,/disk12/impala/impalad
    -default_query_options             <-- !!!
    -log_filename=impalad
    -hostname=my_hostname
    -state_store_host=my_host
    -local_nodemanager_url=my_host:8042
    -llama_host=my_host
    -llama_port=15000
    -enable_rm=true
    -pool_conf_file=pool-acls.txt
    -cgroup_hierarchy_path=/var/run/cloudera-scm-agent/cgroups/cpu/hadoop-yarn
    -state_store_port=24000
    -catalog_service_host=my_host
    -catalog_service_port=26000
    -local_library_dir=/var/lib/impala/udfs
    -llama_max_request_attempts=5
    -llama_registration_timeout_secs=30
    -llama_registration_wait_secs=3
    -fair_scheduler_allocation_path=/var/run/cloudera-scm-agent/process/7494-impala-IMPALAD/impala-conf/fair-scheduler.xml
    -llama_site_path=/var/run/cloudera-scm-agent/process/7494-impala-IMPALAD/impala-conf/llama-site.xml
    -disk_spill_encryption=false

So I tried to change -default_query_options to:

    -default_query_options=mem_limit=128849018880

Now, if I run 'set;' in Hue or impala-shell I see:

    MEM_LIMIT: [128849018880]

Before, the value was "0", and my queries now execute successfully. Can you explain why I have to set mem_limit under the "-default_query_options" parameter? I thought the default memory limit for impalad was set by the "mem_limit" option.

-- Regards, Georgy
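One detail that may explain this: the process-level -mem_limit caps the impalad daemon as a whole, while MEM_LIMIT is a per-query option. When MEM_LIMIT is 0, the per-query cap instead comes from Impala's own estimate of the query's memory needs (and with -enable_rm=true that estimate also drives the resource reservation), so a low estimate can trigger the "minimum required buffers" error. Rather than baking a value into -default_query_options, it can also be set per session; a sketch, where the value is just an example in bytes:

    -- in impala-shell or Hue, before running the heavy query:
    SET MEM_LIMIT=21474836480;   -- per-query cap in bytes for this session
    SELECT ... ;                 -- the original query
    SET MEM_LIMIT=0;             -- back to the default afterwards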
Impala returns error: "Bad status for request 5241: TGetOperationStatusResp"
Hello, I'm running *CDH 5.2.1.* When I tried to execute impala query on huge tables via HUE UI I got such error: *Bad status for request 5241: TGetOperationStatusResp*(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), operationState=5, errorMessage=None, sqlState=None, errorCode=None) First, I've tried to execute the same query via *impala-shell* on one of my worker-nodes. I was confused that error message was: Query did not have enough memory to get the minimum required buffers. Backend 3:*Memory Limit Exceeded* I've checked impalad startup options. Command that gives me `ps aux | grep impalad` is: /opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/lib/impala/sbin-retail/impalad --flagfile=/var/run/cloudera-scm-agent/process/7492-impala-IMPALAD/impala-conf/impalad_flags And there is the content of the above flag_file: -beeswax_port=21000 -fe_port=21000 -be_port=22000 -llama_callback_port=28000 -hs2_port=21050 -enable_webserver=true *-mem_limit=128849018880* -webserver_port=25000 -max_result_cache_size=10 -state_store_subscriber_port=23000 -statestore_subscriber_timeout_seconds=30 -scratch_dirs=/disk1/impala/impalad,/disk10/impala/impalad,/disk2/impala/impalad,/disk3/impala/impalad,/disk4/impala/impalad,/disk5/impala/impalad,/disk6/impala/impalad,/disk7/impala/impalad,/disk8/impala/impalad,/disk9/impala/impalad,/opt/impala/impalad,/disk11/impala/impalad,/disk12/impala/impalad *-default_query_options* -log_filename=impalad -hostname=my_hostname -state_store_host=my_host -local_nodemanager_url=my_host:8042 -llama_host=my_host -llama_port=15000 -enable_rm=true -pool_conf_file=pool-acls.txt -cgroup_hierarchy_path=/var/run/cloudera-scm-agent/cgroups/cpu/hadoop-yarn -state_store_port=24000 -catalog_service_host=my_host -catalog_service_port=26000 -local_library_dir=/var/lib/impala/udfs -llama_max_request_attempts=5 -llama_registration_timeout_secs=30 -llama_registration_wait_secs=3 -fair_scheduler_allocation_path=/var/run/cloudera-scm-agent/process/7494-impala-IMPALAD/impala-conf/fair-scheduler.xml -llama_site_path=/var/run/cloudera-scm-agent/process/7494-impala-IMPALAD/impala-conf/llama-site.xml -disk_spill_encryption=false So, I've tried to change *-default_query_options *to: *-default_query_options=mem_limit=**128849018880* Now, if I do 'set;' in HUE or impala-shell I see: * MEM_LIMIT: [128849018880]* Before there was value "0". My queries were succesfully executed. Can you explain, why I have to set mem_limit under parameter " *-default_query_options*"? I thought that default memory limit for impalad is set by "mem_limit" option -- Regards, Georgy
Impala returns error: "Bad status for request 5241: TGetOperationStatusResp"
Hello, I'm running *CDH 5.2.1.* When I tried to execute impala query on huge tables via HUE UI I got such error: *Bad status for request 5241: TGetOperationStatusResp*(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0), operationState=5, errorMessage=None, sqlState=None, errorCode=None) First, I've tried to execute the same query via *impala-shell* on one of my worker-nodes. I was confused that error message was: Query did not have enough memory to get the minimum required buffers. Backend 3:*Memory Limit Exceeded* I've checked impalad startup options. Command that gives me `ps aux | grep impalad` is: /opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/lib/impala/sbin-retail/impalad --flagfile=/var/run/cloudera-scm-agent/process/7492-impala-IMPALAD/impala-conf/impalad_flags And there is the content of the above flag_file: -beeswax_port=21000 -fe_port=21000 -be_port=22000 -llama_callback_port=28000 -hs2_port=21050 -enable_webserver=true *-mem_limit=128849018880* -webserver_port=25000 -max_result_cache_size=10 -state_store_subscriber_port=23000 -statestore_subscriber_timeout_seconds=30 -scratch_dirs=/disk1/impala/impalad,/disk10/impala/impalad,/disk2/impala/impalad,/disk3/impala/impalad,/disk4/impala/impalad,/disk5/impala/impalad,/disk6/impala/impalad,/disk7/impala/impalad,/disk8/impala/impalad,/disk9/impala/impalad,/opt/impala/impalad,/disk11/impala/impalad,/disk12/impala/impalad *-default_query_options* -log_filename=impalad -hostname=my_hostname -state_store_host=my_host -local_nodemanager_url=my_host:8042 -llama_host=my_host -llama_port=15000 -enable_rm=true -pool_conf_file=pool-acls.txt -cgroup_hierarchy_path=/var/run/cloudera-scm-agent/cgroups/cpu/hadoop-yarn -state_store_port=24000 -catalog_service_host=my_host -catalog_service_port=26000 -local_library_dir=/var/lib/impala/udfs -llama_max_request_attempts=5 -llama_registration_timeout_secs=30 -llama_registration_wait_secs=3 -fair_scheduler_allocation_path=/var/run/cloudera-scm-agent/process/7494-impala-IMPALAD/impala-conf/fair-scheduler.xml -llama_site_path=/var/run/cloudera-scm-agent/process/7494-impala-IMPALAD/impala-conf/llama-site.xml -disk_spill_encryption=false So, I've tried to change *-default_query_options *to: *-default_query_options=mem_limit=**128849018880* Now, if I do 'set;' in HUE or impala-shell I see: * MEM_LIMIT: [128849018880]* Before there was value "0". My queries were succesfully executed. Can you explain, why I have to set mem_limit under parameter " *-default_query_options*"? I thought that default memory limit for impalad is set by "mem_limit" option. -- Regards, Georgy
Re: Impala CREATE TABLE AS AVRO Requires "Redundant" Schema - Why?
Hi, Impala is a product of Cloudera. You might request help per: https://groups.google.com/a/cloudera.org/forum/#!forum/impala-user

BR, Alex

> On 26 Feb 2015, at 17:15, Vitale, Tom wrote:
>
> I used sqoop to import an MS SQL Server table into an Avro file on HDFS. No problem. Then I tried to create an external Impala table using the following DDL:
>
> CREATE EXTERNAL TABLE AvroTable
> STORED AS AVRO
> LOCATION '/tmp/AvroTable';
>
> I got the error "ERROR: AnalysisException: Error loading Avro schema: No Avro schema provided in SERDEPROPERTIES or TBLPROPERTIES for table: default.AvroTable"
>
> So I extracted the schema from the Avro file using avro-tools-1.7.4.jar (-getschema) into a JSON file, then per the recommendation above, changed the DDL to point to it:
>
> CREATE EXTERNAL TABLE AvroTable
> STORED AS AVRO
> LOCATION '/tmp/AvroTable'
> TBLPROPERTIES(
>   'serialization.format'='1',
>   'avro.schema.url'='hdfs://...net/tmp/AvroTable.schema'
> );
>
> This worked fine. But my question is, why do you have to do this? The schema is already in the Avro file – that's where I got the JSON schema file that I point to in the TBLPROPERTIES parameter!
>
> Thanks, Tom
>
> Tom Vitale
> CREDIT SUISSE
> Information Technology | Infra Arch & Strategy NY, KIVP
> Eleven Madison Avenue | 10010-3629 New York | United States
> Phone +1 212 538 0708
> thomas.vit...@credit-suisse.com | www.credit-suisse.com
Impala CREATE TABLE AS AVRO Requires "Redundant" Schema - Why?
I used sqoop to import an MS SQL Server table into an Avro file on HDFS. No problem. Then I tried to create an external Impala table using the following DDL:

    CREATE EXTERNAL TABLE AvroTable
    STORED AS AVRO
    LOCATION '/tmp/AvroTable';

I got the error "ERROR: AnalysisException: Error loading Avro schema: No Avro schema provided in SERDEPROPERTIES or TBLPROPERTIES for table: default.AvroTable"

So I extracted the schema from the Avro file using avro-tools-1.7.4.jar (-getschema) into a JSON file, then per the recommendation above, changed the DDL to point to it:

    CREATE EXTERNAL TABLE AvroTable
    STORED AS AVRO
    LOCATION '/tmp/AvroTable'
    TBLPROPERTIES(
      'serialization.format'='1',
      'avro.schema.url'='hdfs://...net/tmp/AvroTable.schema'
    );

This worked fine. But my question is, why do you have to do this? The schema is already in the Avro file - that's where I got the JSON schema file that I point to in the TBLPROPERTIES parameter!

Thanks, Tom

Tom Vitale
CREDIT SUISSE
Information Technology | Infra Arch & Strategy NY, KIVP
Eleven Madison Avenue | 10010-3629 New York | United States
Phone +1 212 538 0708
thomas.vit...@credit-suisse.com | www.credit-suisse.com
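For reference, a sketch of the extraction step described above, assuming avro-tools is available on the local machine; the data file name (part-m-00000.avro) is the usual sqoop output naming but is hypothetical here. An alternative that avoids the extra HDFS file is inlining the JSON via 'avro.schema.literal' instead of 'avro.schema.url'.

    # pull one data file down and extract its embedded schema
    hdfs dfs -get /tmp/AvroTable/part-m-00000.avro .
    java -jar avro-tools-1.7.4.jar getschema part-m-00000.avro > AvroTable.schema

    # publish the schema where the table DDL can reference it
    hdfs dfs -put AvroTable.schema /tmp/AvroTable.schema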
Fwd: External Table creation in hive fails on impala integration with hive
-- Forwarded message --
From: Sathish Kumar
Date: Wed, Oct 23, 2013 at 10:28 AM
Subject: Re: External Table creation in hive fails on impala integration with hive
To: cdh-u...@cloudera.org

Hi All,

Thanks Saro, it worked. I have one small doubt: if my row key and value are as below, what data types should I use for the columns in

    create external TABLE hbase_table(key int, value string)

given a scanned row that looks like:

    ROW                  COLUMN+CELL
    \x00\x00\x01As\xBDJ  column=d:a, timestamp=1380629572482, value=\x1F\x8B\x08\x08cn

Regards,
Sathish

On Wed, Oct 23, 2013 at 1:26 AM, Saro saravanan wrote:
> hi
>
> set hbase.zookeeper.quorum=localhost;
>
> create external TABLE hbase_table(key int, value string) STORED BY
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH
> SERDEPROPERTIES("hbase.columns.mapping" = ":key,datahive1:name")
> TBLPROPERTIES("hbase.table.name" = "tablename");
>
> On Wed, Oct 23, 2013 at 8:22 AM, Sathish Kumar wrote:
>>
>> -- Forwarded message --
>> From: Sathish Kumar
>> Date: Tue, Oct 22, 2013 at 4:59 PM
>> Subject: External Table creation in hive fails on impala integration with hive
>> To: cdh-u...@cloudera.org
>>
>> Hi All,
>>
>> I am trying to integrate Impala with HBase and received a syntax error as mentioned below.
>>
>> ERROR: AnalysisException: Syntax error at:
>> create EXTERNAL TABLE hbase_table_2(key int, value int, value2 string)
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH
>> SERDEPROPERTIES ("hbase.columns.mapping" = "d:val") TBLPROPERTIES("hbase.table.name" = "xyz")
>>
>> In the above command, I suspect the column name "val" is wrong; by running the "describe tablename" command I can find the column family name, but I am not sure how to find the column name.
>>
>> Please help me if you find anything wrong in my command.
>>
>> Regards,
>> Sathish
>
> --
> Thanks
> saravanan
> 9095260692
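On the follow-up question (what types to use, and how to find the column qualifier): a single-row scan in the hbase shell prints each cell as column=family:qualifier, so the output above suggests family "d" and qualifier "a". A sketch follows, with the table name hypothetical; the key is mapped as STRING because the row key above looks like binary bytes rather than a plain integer, and the value is kept as STRING because the stored bytes look like compressed data. Note the DDL must be run in Hive, not Impala, since Impala does not accept the STORED BY clause (which is also why the original statement failed in Impala).

    # in the hbase shell, look at one row to see the family:qualifier names
    scan 'tablename', {LIMIT => 1}
    # prints lines like:  <rowkey>  column=d:a, timestamp=..., value=...

    -- then in Hive, map that column:
    create external TABLE hbase_table(key string, value string)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,d:a")
    TBLPROPERTIES("hbase.table.name" = "tablename");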
Fwd: External Table creation in hive fails on impala integration with hive
-- Forwarded message --
From: Sathish Kumar
Date: Tue, Oct 22, 2013 at 4:59 PM
Subject: External Table creation in hive fails on impala integration with hive
To: cdh-u...@cloudera.org

Hi All,

I am trying to integrate Impala with HBase and received a syntax error as mentioned below.

    ERROR: AnalysisException: Syntax error at:
    create EXTERNAL TABLE hbase_table_2(key int, value int, value2 string)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH
    SERDEPROPERTIES ("hbase.columns.mapping" = "d:val") TBLPROPERTIES("hbase.table.name" = "xyz")

In the above command, I suspect the column name "val" is wrong; by running the "describe tablename" command I can find the column family name, but I am not sure how to find the column name.

Please help me if you find anything wrong in my command.

Regards,
Sathish
Re: impala
Read http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/

On Mon, Aug 26, 2013 at 3:01 PM, Ram wrote:
> Hi,
> Can anyone help with the following?
>
> How exactly does Impala work? What happens when you submit a query? How is the data transferred to the different nodes?
>
> From,
> Ramesh.

-- Nitin Pawar
impala
Hi,

Can anyone help with the following?

How exactly does Impala work? What happens when you submit a query? How is the data transferred to the different nodes?

From,
Ramesh.
Need help with cluster setup for performance [Impala]
My apologies for sending this message to this group, but I'm having trouble sending to the right group.

From: Steven Wong
Sent: Wednesday, January 23, 2013 11:15 AM
To: impala-u...@cloudera.org
Subject: RE: Need help with cluster setup for performance

Thanks for the suggestions. The /metrics output looks good now, and the SELECT COUNT(*) runs much faster than before. But I still have the "Unknown disk id" error message. My CDH version is:

    hadoop-client    x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4 18 k
    hadoop-mapreduce x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4 9.8 M
    hadoop-yarn      x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4 8.9 M

On Tuesday, January 22, 2013 5:37:30 PM UTC-8, Henry wrote:

On 22 January 2013 11:40, Steven Wong wrote:

Hi, I followed http://zenfractal.com/2012/11/15/from-zero-to-impala-in-minutes/ to set up a cluster on EC2. After seeing disappointing performance numbers from a SELECT COUNT(*), I am following https://ccp.cloudera.com/display/IMPALA10BETADOC/Configuring+Impala+for+Performance#ConfiguringImpalaforPerformance-TestingImpalaforHighPerformanceConfiguration to check my cluster setup. Questions:

1. My cluster has 3 data nodes. Is the following http://<host>:<port>/metrics output good?

    statestore.backend.state.map: { 127.0.0.1:23000 : OK }
    statestore.live.backends: 3
    statestore.live.backends.list: [127.0.0.1:22000]

Hi Steven - This looks like your problem. Your machines are registering themselves with 'localhost' as their hostname, and this means that they all look the same to the statestore. I looked at Matt's zero-to-impala link - it's awesome, but now a little out of date. You should modify where you run impalad to also have --ipaddress and --hostname correctly set for each node. Then check the statestore metrics; things should look a lot better and your performance should improve.

2. My impalad logs contain "Unknown disk id. This will negatively affect performance. Check your hdfs settings to enable block location metadata." and my http://<host>:<port>/varz doesn't contain the string "dfs.datanode.hdfs-blocks-metadata.enabled". But my hdfs-site.xml sets dfs.datanode.hdfs-blocks-metadata.enabled to true. Why?

What version of CDH are you using?

3. My impalad.out doesn't contain "Unable to load native-hadoop library". This is good, I believe.

4. My impalad logs contain the following lines matching the word "scheduler", but none contains "locality percentage". Why?

    /tmp/impalad.INFO:I0122 00:19:09.137197  5121 simple-scheduler.cc:82] Starting simple scheduler
    /tmp/impalad.ip-10-170-17-154.impala.log.INFO.20130122-001901.5121:I0122 00:19:09.137197  5121 simple-scheduler.cc:82] Starting simple scheduler

The locality percentage is printed only for GLOG_v=1 - and I note that the setup-impala.sh script has a typo where it has GVLOG_v=1. If you fix this, you should see the locality percentage.

Hope this helps - let us know if things improve.

Henry

Thanks.
Steven

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
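For later readers, a sketch of the two fixes discussed above; hostnames, IPs, and paths are placeholders. The first part is the hdfs-site.xml property that enables block-location metadata (restart the DataNodes after changing it); the second is starting each impalad with its real hostname and IP rather than localhost, with GLOG_v=1 so the locality percentage is logged.

    <!-- hdfs-site.xml on every DataNode -->
    <property>
      <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
      <value>true</value>
    </property>

    # per-node impalad startup, as suggested in the thread above
    GLOG_v=1 impalad \
        --hostname=$(hostname -f) \
        --ipaddress=<node-private-ip> \
        --state_store_host=<statestore-host> &

Once each node registers under its own hostname, statestore.live.backends.list on the /metrics page should show one distinct address per data node instead of repeated 127.0.0.1 entries.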