Re: [How to:] Apache Impala and S3n File System

2016-09-19 Thread Wei-Chiu Chuang
I think the Impala user mailing list may have a better answer.
Forwarding your question there.


[How to:] Apache Impala and S3n File System

2016-09-19 Thread Divya Gehlot
Hi,

Has anybody tried using Apache Impala with the S3n file system?
Could you share the pros and cons of it?
I appreciate the help.
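
For anyone picking this thread up: a minimal sketch of what such a table can
look like, assuming Impala's S3 support is configured with valid AWS
credentials; the bucket name, path, and columns below are hypothetical:

CREATE EXTERNAL TABLE s3_logs (
  id BIGINT,
  msg STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3n://example-bucket/path/to/logs/';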


Thanks,
Divya


Re: Impala

2016-03-09 Thread Juri Yanase Triantaphyllou
Thanks. I will do it!


Juri



-Original Message-
From: Sean Busbey 
To: Nagalingam, Karthikeyan 
Cc: Kumar Jayapal ; user ; 
cdh-user 
Sent: Wed, Mar 9, 2016 12:53 pm
Subject: Re: Impala



You should join the mailing list for Apache Impala (incubating) and ask your 
question over there:


http://mail-archives.apache.org/mod_mbox/incubator-impala-dev/




On Wed, Mar 9, 2016 at 8:12 AM, Nagalingam, Karthikeyan 
 wrote:


Hello,
 
I am new to Impala. My goal is to test joins and aggregations against 2 million
and 10 million records. Can you please provide some documentation or a website
for a starter?
Regards,
Karthikeyan Nagalingam,
Technical Marketing Engineer (Big Data Analytics)
Mobile: 919-376-6422

--
busbey


Re: Impala

2016-03-09 Thread Sean Busbey
You should join the mailing list for Apache Impala (incubating) and ask
your question over there:

http://mail-archives.apache.org/mod_mbox/incubator-impala-dev/

On Wed, Mar 9, 2016 at 8:12 AM, Nagalingam, Karthikeyan <
karthikeyan.nagalin...@netapp.com> wrote:

> Hello,
>
>
>
> I am new to Impala. My goal is to test joins and aggregations against
> 2 million and 10 million records. Can you please provide some documentation
> or a website for a starter?
>
>
>
> Regards,
>
> Karthikeyan Nagalingam,
>
> Technical Marketing Engineer (Big Data Analytics)
>
> Mobile: 919-376-6422
>



-- 
busbey


Impala

2016-03-09 Thread Nagalingam, Karthikeyan
Hello,

I am new to Impala. My goal is to test joins and aggregations against 2 million
and 10 million records. Can you please provide some documentation or a website
for a starter?
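
While waiting for pointers, a hedged sketch of the kind of join-plus-aggregation
test the request describes, assuming two hypothetical tables t_2m and t_10m are
already loaded:

SELECT a.key, COUNT(*) AS cnt
FROM t_2m a
JOIN t_10m b ON a.key = b.key
GROUP BY a.key
ORDER BY cnt DESC
LIMIT 10;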

Regards,
Karthikeyan Nagalingam,
Technical Marketing Engineer (Big Data Analytics)
Mobile: 919-376-6422


Unable to connect to Impala shell after updating the cluster to 5.5.1

2016-01-25 Thread Kumar Jayapal
Hi,
Did anyone have this issue with Impala?

I am unable to connect to the Impala shell after updating the cluster to 5.5.1.

My cluster uses Kerberos and LDAP for authentication. When I try to connect
with the Impala shell, it displays the message:

"LDAP credentials may not be sent over insecure connections. Enable SSL or
set --auth_creds_ok_in_clear"

impala-shell -i impala.jayapal.com -l -u jayapal

LDAP credentials may not be sent over insecure connections. Enable SSL or
set --auth_creds_ok_in_clear

If I use impala-shell -i impala.jayapal.com -l -u jayapal
--auth_creds_ok_in_clear

it allows me to connect.

Please let me know if anyone has resolved this issue.
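
For anyone hitting the same wall, a hedged sketch of the SSL route the error
message points at; the certificate paths below are hypothetical and the flags
should be checked against your Impala version:

# On each impalad, serve TLS (certificate/key paths are hypothetical):
impalad --ssl_server_certificate=/etc/impala/certs/server.pem \
        --ssl_private_key=/etc/impala/certs/server.key ...

# Then connect over SSL so LDAP credentials are encrypted in transit:
impala-shell -i impala.jayapal.com -l -u jayapal --ssl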






Thanks
Jay


Impala returns error: "Bad status for request 5241: TGetOperationStatusResp"

2015-05-15 Thread g10ck
Hello,


I'm running CDH 5.2.1.

When I tried to execute an Impala query on huge tables via the HUE UI, I got this error:
Bad status for request 5241: 
TGetOperationStatusResp(status=TStatus(errorCode=None, errorMessage=None, 
sqlState=None, infoMessages=None, statusCode=0), operationState=5, 
errorMessage=None, sqlState=None, errorCode=None)

First, I tried to execute the same query via impala-shell on one of my
worker nodes. I was confused that the error message was:
Query did not have enough memory to get the minimum required buffers.
Backend 3:Memory Limit Exceeded

I've checked the impalad startup options. The command that `ps aux | grep
impalad` shows is:
/opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/lib/impala/sbin-retail/impalad 
--flagfile=/var/run/cloudera-scm-agent/process/7492-impala-IMPALAD/impala-conf/impalad_flags

And here is the content of the above flag file:
-beeswax_port=21000
-fe_port=21000
-be_port=22000
-llama_callback_port=28000
-hs2_port=21050
-enable_webserver=true
-mem_limit=128849018880 !!!
-webserver_port=25000
-max_result_cache_size=10
-state_store_subscriber_port=23000
-statestore_subscriber_timeout_seconds=30
-scratch_dirs=/disk1/impala/impalad,/disk10/impala/impalad,/disk2/impala/impalad,/disk3/impala/impalad,/disk4/impala/impalad,/disk5/impala/impalad,/disk6/impala/impalad,/disk7/impala/impalad,/disk8/impala/impalad,/disk9/impala/impalad,/opt/impala/impalad,/disk11/impala/impalad,/disk12/impala/impalad
-default_query_options !!!
-log_filename=impalad
-hostname=my_hostname
-state_store_host=my_host
-local_nodemanager_url=my_host:8042
-llama_host=my_host
-llama_port=15000
-enable_rm=true
-pool_conf_file=pool-acls.txt
-cgroup_hierarchy_path=/var/run/cloudera-scm-agent/cgroups/cpu/hadoop-yarn
-state_store_port=24000
-catalog_service_host=my_host
-catalog_service_port=26000
-local_library_dir=/var/lib/impala/udfs
-llama_max_request_attempts=5
-llama_registration_timeout_secs=30
-llama_registration_wait_secs=3
-fair_scheduler_allocation_path=/var/run/cloudera-scm-agent/process/7494-impala-IMPALAD/impala-conf/fair-scheduler.xml
-llama_site_path=/var/run/cloudera-scm-agent/process/7494-impala-IMPALAD/impala-conf/llama-site.xml
-disk_spill_encryption=false

So, I tried to change -default_query_options to:
-default_query_options=mem_limit=128849018880

Now, if I do 'set;' in HUE or impala-shell I see:
 MEM_LIMIT: [128849018880]
Before, the value was "0". Now my queries execute successfully.

Can you explain why I have to set mem_limit under the parameter
"-default_query_options"? I thought the default memory limit for impalad was
set by the "mem_limit" option.


Re: Impala CREATE TABLE AS AVRO Requires "Redundant" Schema - Why?

2015-02-26 Thread Alexander Alten-Lorenz
Hi,

Impala is a product of Cloudera. You might request help via:
https://groups.google.com/a/cloudera.org/forum/#!forum/impala-user

BR, 
 Alex


> On 26 Feb 2015, at 17:15, Vitale, Tom  wrote:
> 
> I used sqoop to import an MS SQL Server table into an Avro file on HDFS.  No 
> problem. Then I tried to create an external Impala table using the following 
> DDL:
>  
> CREATE EXTERNAL TABLE AvroTable
> STORED AS AVRO
> LOCATION '/tmp/AvroTable';
>  
> I got the error “ERROR: AnalysisException: Error loading Avro schema: No Avro 
> schema provided in SERDEPROPERTIES or TBLPROPERTIES for table: 
> default.AvroTable”
>  
> So I extracted the schema from the Avro file using the avro-tools-1.7.4.jar 
> (-getschema) into a JSON file, then per the recommendation above, changed the 
> DDL to point to it:
>  
> CREATE EXTERNAL TABLE AvroTable
> STORED AS AVRO
> LOCATION '/tmp/AvroTable'
> TBLPROPERTIES(
> 'serialization.format'='1',
> 'avro.schema.url'='hdfs://...net/tmp/AvroTable.schema'
> );
>  
> This worked fine.  But my question is, why do you have to do this?  The 
> schema is already in the Avro file – that’s where I got the JSON schema file 
> that I point to in the TBLPROPERTIES parameter!
>  
> Thanks, Tom
>  
> Tom Vitale
> CREDIT SUISSE
> Information Technology | Infra Arch & Strategy NY, KIVP
> Eleven Madison Avenue | 10010-3629 New York | United States
> Phone +1 212 538 0708
> thomas.vit...@credit-suisse.com | www.credit-suisse.com
>  



Impala CREATE TABLE AS AVRO Requires "Redundant" Schema - Why?

2015-02-26 Thread Vitale, Tom
I used sqoop to import an MS SQL Server table into an Avro file on HDFS.  No 
problem. Then I tried to create an external Impala table using the following 
DDL:

CREATE EXTERNAL TABLE AvroTable
STORED AS AVRO
LOCATION '/tmp/AvroTable';

I got the error "ERROR: AnalysisException: Error loading Avro schema: No Avro 
schema provided in SERDEPROPERTIES or TBLPROPERTIES for table: 
default.AvroTable"

So I extracted the schema from the Avro file using the avro-tools-1.7.4.jar 
(-getschema) into a JSON file, then per the recommendation above, changed the 
DDL to point to it:

CREATE EXTERNAL TABLE AvroTable
STORED AS AVRO
LOCATION '/tmp/AvroTable'
TBLPROPERTIES(
'serialization.format'='1',
'avro.schema.url'='hdfs://...net/tmp/AvroTable.schema'
);

This worked fine.  But my question is, why do you have to do this?  The schema 
is already in the Avro file - that's where I got the JSON schema file that I 
point to in the TBLPROPERTIES parameter!
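
A hedged aside for readers with the same complaint: if maintaining a separate
schema file is the pain point, the Avro SerDe also accepts the schema inline
through avro.schema.literal; the two-field schema below is purely illustrative:

CREATE EXTERNAL TABLE AvroTable
STORED AS AVRO
LOCATION '/tmp/AvroTable'
TBLPROPERTIES(
'avro.schema.literal'='{"type":"record","name":"AvroTable","fields":[{"name":"id","type":"int"},{"name":"name","type":"string"}]}'
);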

Thanks, Tom

Tom Vitale
CREDIT SUISSE
Information Technology | Infra Arch & Strategy NY, KIVP
Eleven Madison Avenue | 10010-3629 New York | United States
Phone +1 212 538 0708
thomas.vit...@credit-suisse.com | www.credit-suisse.com






Fwd: External Table creation in hive fails on impala integration with hive

2013-10-23 Thread Sathish Kumar
-- Forwarded message --
From: Sathish Kumar 
Date: Wed, Oct 23, 2013 at 10:28 AM
Subject: Re: External Table creation in hive fails on impala integration
with hive
To: cdh-u...@cloudera.org


Hi All,

Thanks Saro, it worked.

I have a small doubt: if my row key and value are as below, what data types
are we supposed to use in the table definition (create external TABLE
hbase_table(key int, value string))?

ROW  COLUMN+CELL
 \x00\x00\x01As\xBDJ column=d:a, timestamp=1380629572482,
value=\x1F\x8B\x08\x08cn

Regards
Sathish
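
A hedged sketch of one possibility: that row key looks like a raw 8-byte value
and the cell value begins with gzip magic bytes (\x1F\x8B), so the Hive HBase
handler's binary storage suffix (#b) on the key mapping may be what is needed;
the types and mapping below are assumptions read off the scan output, not a
tested answer:

set hbase.zookeeper.quorum=localhost;

create external TABLE hbase_table(key bigint, value string) STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH
SERDEPROPERTIES("hbase.columns.mapping" = ":key#b,d:a")
TBLPROPERTIES("hbase.table.name" = "tablename");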


On Wed, Oct 23, 2013 at 1:26 AM, Saro saravanan wrote:

> hi
>
> set hbase.zookeeper.quorum=localhost;
>
> create external TABLE hbase_table(key int, value string) STORED BY
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH
> SERDEPROPERTIES("hbase.columns.mapping" = ":key,datahive1:name")
> TBLPROPERTIES("hbase.table.name" = "tablename");
>
>
>
> On Wed, Oct 23, 2013 at 8:22 AM, Sathish Kumar  wrote:
>
>>
>>
>> -- Forwarded message --
>> From: Sathish Kumar 
>> Date: Tue, Oct 22, 2013 at 4:59 PM
>> Subject: External Table creation in hive fails on impala integration with
>> hive
>> To: cdh-u...@cloudera.org
>>
>>
>>  Hi All,
>>
>> I am trying to integrate Impala with HBase and received a syntax error as
>> mentioned below.
>>
>> ERROR: AnalysisException: Syntax error at:
>> create EXTERNAL TABLE hbase_table_2(key int, value int, value2 string)
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH
>> SERDEPROPERTIES ("hbase.columns.mapping" = "d:val") TBLPROPERTIES("
>> hbase.table.name" = "xyz")
>>
>>
>> In the above command, I suspect the column name "val" is wrong. By issuing
>> the "describe tablename" command I am able to find the column family name,
>> but I am not sure how to find the column name.
>>
>> Please help me if you find anything wrong in my command.
>>
>> Regards
>> Sathish
>>
>
>
> --
> Thanks
> saravanan
> 9095260692
>


Fwd: External Table creation in hive fails on impala integration with hive

2013-10-22 Thread Sathish Kumar
-- Forwarded message --
From: Sathish Kumar 
Date: Tue, Oct 22, 2013 at 4:59 PM
Subject: External Table creation in hive fails on impala integration with
hive
To: cdh-u...@cloudera.org


Hi All,

I am trying to integrate Impala with HBase and received a syntax error as
mentioned below.

ERROR: AnalysisException: Syntax error at:
create EXTERNAL TABLE hbase_table_2(key int, value int, value2 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH
SERDEPROPERTIES ("hbase.columns.mapping" = "d:val") TBLPROPERTIES("
hbase.table.name" = "xyz")


In the above command, I suspect the column name "val" is wrong. By issuing the
"describe tablename" command I am able to find the column family name, but I am
not sure how to find the column name.
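
"describe" in the HBase shell only lists column families, because qualifiers
exist per cell rather than in the table schema; scanning a few rows shows the
actual family:qualifier pairs. A hedged sketch (table name taken from the DDL
above):

# In the HBase shell; each cell prints as column=<family>:<qualifier>
scan 'xyz', {LIMIT => 3}

Note also that CREATE ... STORED BY is Hive DDL; Impala of that era did not
accept STORED BY, which by itself produces an AnalysisException, so such tables
are typically created in Hive and then queried from Impala.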

Please help me if you find anything wrong in my command.

Regards
Sathish


Re: impala

2013-08-26 Thread Nitin Pawar
read
http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/


On Mon, Aug 26, 2013 at 3:01 PM, Ram  wrote:

>
> Hi,
> Can anyone suggest the following:
>
> How exactly does Impala work?  What happens when you submit a query?  How
> will the data be transferred to different nodes?
>
>
> From,
> Ramesh.
>
>
>


-- 
Nitin Pawar


impala

2013-08-26 Thread Ram
Hi,
Can anyone suggest the following:

How exactly does Impala work?  What happens when you submit a query?  How will
the data be transferred to different nodes?


From,
Ramesh.


Need help with cluster setup for performance [Impala]

2013-01-23 Thread Steven Wong
My apologies for sending this message to this group, but I'm having trouble 
sending to the right group.



From: Steven Wong
Sent: Wednesday, January 23, 2013 11:15 AM
To: impala-u...@cloudera.org
Subject: RE: Need help with cluster setup for performance

Thanks for the suggestions. The /metrics output looks good now, and the SELECT 
COUNT(*) runs much faster than before.

But I still have the "Unknown disk id" error message. My CDH version is:

 hadoop-client    x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4  18 k
 hadoop-mapreduce x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4 9.8 M
 hadoop-yarn      x86_64 2.0.0+552-1.cdh4.1.2.p0.27.el5 cloudera-cdh4 8.9 M



On Tuesday, January 22, 2013 5:37:30 PM UTC-8, Henry wrote:
On 22 January 2013 11:40, Steven Wong  wrote:
Hi,

I followed http://zenfractal.com/2012/11/15/from-zero-to-impala-in-minutes/ to 
set up a cluster on EC2. After seeing disappointing performance numbers from a 
SELECT COUNT(*), I am following 
https://ccp.cloudera.com/display/IMPALA10BETADOC/Configuring+Impala+for+Performance#ConfiguringImpalaforPerformance-TestingImpalaforHighPerformanceConfiguration
 to check my cluster setup. Questions:

1. My cluster has 3 data nodes. Is the following
http://<impalad-host>:<port>/metrics output good?

statestore.backend.state.map:
{
  127.0.0.1:23000 : OK
}
statestore.live.backends:3
statestore.live.backends.list:[127.0.0.1:22000]


Hi Steven -

This looks like your problem. Your machines are registering themselves with 
'localhost' as their hostname, and this means that they all look the same to 
the statestore.

I looked at Matt's zero-to-impala link - it's awesome, but now a little out of 
date. You should modify where you run impalad to also have --ipaddress and 
--hostname correctly set for each node. Then check the statestore metrics; 
things should look a lot better and your performance should improve.
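
A sketch of what that can look like on each node; the addresses and hostnames
below are hypothetical, with the remaining flags kept as in the existing launch
command:

impalad --ipaddress=10.0.0.11 --hostname=worker1.example.com \
        --state_store_host=master.example.com ...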


2. My impalad logs contain "Unknown disk id.  This will negatively affect 
performance.  Check your hdfs settings to enable block location metadata." and 
my http://<impalad-host>:<port>/varz doesn't contain the string
"dfs.datanode.hdfs-blocks-metadata.enabled". But my hdfs-site.xml sets 
dfs.datanode.hdfs-blocks-metadata.enabled to true. Why?
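
For context, the hdfs-site.xml stanza being discussed looks like this; it needs
to be present on every DataNode, and the DataNodes and impalads restarted,
before /varz can be expected to reflect it:

<property>
  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
  <value>true</value>
</property>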

What version of CDH are you using?


3. My impalad.out doesn't contain "Unable to load native-hadoop library". This 
is good, I believe.

4. My impalad logs contain the following lines matching the word "scheduler", 
but none contains "locality percentage". Why?


The locality percentage is printed only for GLOG_v=1 - and I note that the 
setup-impala.sh script has  a typo where it has GVLOG_v=1. If you fix this, you 
should see the locality percentage.

Hope this helps - let us know if things improve.

Henry


/tmp/impalad.INFO:I0122 00:19:09.137197  5121 simple-scheduler.cc:82] Starting 
simple scheduler
/tmp/impalad.ip-10-170-17-154.impala.log.INFO.20130122-001901.5121:I0122 
00:19:09.137197  5121 simple-scheduler.cc:82] Starting simple scheduler

Thanks.
Steven



--
Henry Robinson
Software Engineer
Cloudera
415-994-6679