Please Help Me - Hive Kerberos Issue

2017-01-25 Thread Ricardo Fajardo
Hello,


Please, I need your help with Kerberos authentication in Hive.


I am following this guide:

https://www.cloudera.com/documentation/enterprise/5-4-x/topics/cdh_sg_hiveserver2_security.html#topic_9_1_1


But I am getting this error:

Caused by: org.ietf.jgss.GSSException: No valid credentials provided (Mechanism 
level: Failed to find any Kerberos tgt)


I have a remote Kerberos server and I can obtain a ticket with kinit for my 
user. I created a keytab file with my password for my user. Please tell me if 
that is correct.

On the other hand, when I am debugging the Hive code, the operating-system user 
is authenticated, but I need to authenticate my Kerberos user. Can you tell me 
how I can achieve that? How can I store my tickets where Hive can load them? Or 
how can I verify where Hive searches for tickets and what it is reading?
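
For reference, these are the exact steps I am trying (host, principal, realm, 
and encryption type are placeholders):

$ ktutil
ktutil:  addent -password -p myuser@EXAMPLE.COM -k 1 -e aes256-cts
ktutil:  wkt /home/myuser/myuser.keytab
ktutil:  quit

$ kinit -kt /home/myuser/myuser.keytab myuser@EXAMPLE.COM
$ klist    # prints the ticket cache location, e.g. FILE:/tmp/krb5cc_1000

$ beeline -u 'jdbc:hive2://hs2-host:10000/default;principal=hive/_HOST@EXAMPLE.COM'

I also set -Dsun.security.krb5.debug=true in HADOOP_OPTS to trace where the JVM 
looks for tickets; if the client JVM uses a different cache than klist shows, 
perhaps KRB5CCNAME needs to point at it.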

Thanks so much for your help.

Best regards,
Ricardo.




Re: Parquet tables with snappy compression

2017-01-25 Thread Gopal Vijayaraghavan

> Has there been any study of how much compressing Hive Parquet tables with 
> snappy reduces storage space or simply the table size in quantitative terms?

http://www.slideshare.net/oom65/file-format-benchmarks-avro-json-orc-parquet/20

Since SNAPPY is just LZ77, I would assume it would be useful in cases of 
Parquet leaves containing text with large common sub-chunks (like URLs or log 
data).

If you want to experiment with that corner case, the L_COMMENT field from TPC-H 
lineitem is a good compression-thrasher.
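
If you do try it, a minimal way to get numbers (a sketch, assuming a TPC-H 
lineitem table is already loaded; table names and warehouse paths depend on 
your setup):

hive> CREATE TABLE lineitem_snappy STORED AS PARQUET
    >   TBLPROPERTIES ("parquet.compression"="SNAPPY")
    >   AS SELECT * FROM lineitem;
hive> CREATE TABLE lineitem_plain STORED AS PARQUET
    >   TBLPROPERTIES ("parquet.compression"="UNCOMPRESSED")
    >   AS SELECT * FROM lineitem;

$ hdfs dfs -du -s -h /user/hive/warehouse/lineitem_snappy /user/hive/warehouse/lineitem_plain

The ratio of the two totals is the answer in quantitative terms.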

Cheers,
Gopal




Re: Parquet tables with snappy compression

2017-01-25 Thread Owen O'Malley
Mich,
   Here are the benchmarks that I did using three different types of data:

http://www.slideshare.net/HadoopSummit/file-format-benchmark-avro-json-orc-parquet

I assume you are comparing parquet-snappy vs parquet-none.

.. Owen


On Wed, Jan 25, 2017 at 1:37 PM, Mich Talebzadeh 
wrote:

> Hi,
>
> Has there been any study of how much compressing Hive Parquet tables with
> snappy reduces storage space or simply the table size in quantitative terms?
>
> Thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>


Parquet tables with snappy compression

2017-01-25 Thread Mich Talebzadeh
Hi,

Has there been any study of how much compressing Hive Parquet tables with
snappy reduces storage space or simply the table size in quantitative terms?

Thanks

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Only External tables can have an explicit location

2017-01-25 Thread Gopal Vijayaraghavan
> [Error 40003]: Only External tables can have an explicit location
…
> using Hive 1.2, I got this error. This was definitely not a requirement
> before.

Are you using Apache Hive or some vendor fork?

Some BI engines demand there be no aliasing for tables, so each table needs a 
unique location to avoid schema issues.

I tested this on master & HDP builds; vanilla Apache Hive doesn't have this 
restriction (if some version does, an explicit version number would be helpful).

$ hive --version
Hive 1.2.1000.2.5.3.0-37

hive> use testing;
OK
Time taken: 1.245 seconds
hive> create table foo (x int) location '/tmp/foo';
OK
Time taken: 0.59 seconds
hive> desc formatted foo;
OK
# col_name  data_type   comment 
 
x   int 
 
# Detailed Table Information 
Database:   testing  
Owner:  gopal
CreateTime: Wed Jan 25 15:38:27 EST 2017 
LastAccessTime: UNKNOWN  
Protect Mode:   None 
Retention:  0
Location:   hdfs://nnha/tmp/foo  
Table Type: MANAGED_TABLE
Table Parameters:
numFiles                2   
totalSize   66  
transient_lastDdlTime   1485376707  
 
# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
 
InputFormat:org.apache.hadoop.mapred.TextInputFormat 
OutputFormat:   
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed: No   
Num Buckets:-1   
Bucket Columns: []   
Sort Columns:   []   
Storage Desc Params: 
serialization.format    1   
Time taken: 0.455 seconds, Fetched: 28 row(s)
hive> 

Cheers,
Gopal





RE: Hive table for a single file: CREATE/ALTER TABLE differences

2017-01-25 Thread Markovitz, Dudu
Wow. This is gold.

Dudu

From: Dmitry Tolpeko [mailto:dmtolp...@gmail.com]
Sent: Wednesday, January 25, 2017 6:47 PM
To: user@hive.apache.org
Subject: Hive table for a single file: CREATE/ALTER TABLE differences

I accidentally noticed one feature: it is well known that in CREATE TABLE you 
must specify a directory for the table LOCATION, otherwise you get: "Can't make 
directory for path 's3n://dir/file' since it is a file."

But at the same time, ALTER TABLE SET LOCATION 's3n://dir/file' works fine, and 
SELECT then reads data from the single file only.

I see this in Hive 1.0.0-amzn-4.

Is this just a bug that will be fixed some day (or maybe already fixed), or is 
it there for some reason and will stay?

Thanks,
Dmitry



Hive 1.2.1 ODBC Driver Recommendation?

2017-01-25 Thread Lavelle, Shawn
Does anyone here have a recommendation for a Windows ODBC driver that will work 
with Hive 1.2.1?

Thanks,

~ Shawn



Shawn Lavelle
Software Development

4101 Arrowhead Drive
Medina, Minnesota 55340-9457
Phone: 763 551 0559
Fax: 763 551 0750
Email: shawn.lave...@osii.com
Website: www.osii.com



Hive table for a single file: CREATE/ALTER TABLE differences

2017-01-25 Thread Dmitry Tolpeko
I accidentally noticed one feature: it is well known that in CREATE TABLE you
must specify a directory for the table LOCATION, otherwise you get: "Can't make
directory for path 's3n://dir/file' since it is a file."

But at the same time, ALTER TABLE SET LOCATION 's3n://dir/file' works fine, and
SELECT then reads data from the single file only.

I see this in Hive 1.0.0-amzn-4.

Is this just a bug that will be fixed some day (or maybe already fixed), or is
it there for some reason and will stay?
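
To make the repro concrete (table name and schema are placeholders, output 
abbreviated):

hive> CREATE TABLE logs (line string) LOCATION 's3n://dir/file';
FAILED: ... Can't make directory for path 's3n://dir/file' since it is a file.
hive> CREATE TABLE logs (line string) LOCATION 's3n://dir';
OK
hive> ALTER TABLE logs SET LOCATION 's3n://dir/file';
OK
hive> SELECT count(*) FROM logs;

The last query counts rows from the single file only.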

Thanks,
Dmitry


Only External tables can have an explicit location

2017-01-25 Thread Edward Capriolo
[Error 40003]: Only External tables can have an explicit location

Using Hive 1.2, I got this error. This was definitely not a requirement
before.

Why was this added? EXTERNAL used to mean only that dropping the table would
not drop the physical files.
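
That older contract, as a sketch (table names and paths are illustrative):

hive> CREATE EXTERNAL TABLE ext_t (x int) LOCATION '/tmp/ext_t';
hive> DROP TABLE ext_t;     -- metadata removed, files under /tmp/ext_t remain
hive> CREATE TABLE mgd_t (x int) LOCATION '/tmp/mgd_t';
hive> DROP TABLE mgd_t;     -- metadata and files both removed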


Scaling HCatalog REST API

2017-01-25 Thread Ahmed Kamal Abdelfatah
Hi folks,

I’m working on a workflow that needs to hit an API to fetch Hive table schemas.

Currently I’m using the HCatalog Templeton API
"apiURL:50111/templeton/v1/ddl/database/dbName/table/tableName?user.name=hive",
but as the request rate increases (currently the max is 10 per second), I get a 
lot of timeouts from the API. I would like to know how to scale this API for 
better performance, rather than just increasing the timeout. Are there any 
params that need tuning? Should this be done at the metastore layer? Thanks a 
lot.
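
For context, the workaround I am evaluating (host, port, and names are 
placeholders; it bypasses Templeton rather than tuning it) is a persistent 
HiveServer2 session instead of one REST call per lookup:

$ beeline -u 'jdbc:hive2://hs2-host:10000/default' \
    -e 'DESCRIBE FORMATTED dbName.tableName;'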


Regards,


Ahmed Kamal
MTS Engineer in Data Science
Email: ahmed.abdelfa...@careem.com