Bug when adding multiple partitions

2013-08-14 Thread Jan Dolinár
Hi everyone,

Consider following DDL:

CREATE TABLE partition_test
  (a INT)
PARTITIONED BY (b INT);

ALTER TABLE partition_test ADD
PARTITION (b=1) location '/tmp/test1'
PARTITION (b=2) location '/tmp/test2';

Now let's have a look at what was created:

DESCRIBE EXTENDED partition_test PARTITION (b=1);
DESCRIBE EXTENDED partition_test PARTITION (b=2);

Both describe statements yield "location:hdfs://example.com:9000/tmp/test1",
which is obviously incorrect.

This behavior *is* mentioned on the wiki [1], but the article speaks
specifically of Hive 0.7. I just tested this in versions 0.7.1 and 0.10.0,
and they both exhibit this bug. I wasn't even able to find a JIRA for this
issue. Was I looking in the wrong place, or should a new one be created?
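
For what it's worth, a possible workaround (untested here, just a sketch) is to add each partition in its own ALTER TABLE statement, so that each LOCATION clause is bound to its own partition spec:

```sql
-- Workaround sketch: one ALTER TABLE per partition, so each
-- LOCATION applies to the partition it belongs to.
ALTER TABLE partition_test ADD
PARTITION (b=1) LOCATION '/tmp/test1';

ALTER TABLE partition_test ADD
PARTITION (b=2) LOCATION '/tmp/test2';
```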

Best regards,
Jan Dolinar

[1]
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddPartitions


Hive CLI vs Beeline CLI

2013-08-14 Thread pandees waran
Hi pros,
Based on your experience with the Beeline CLI, could you please share your
thoughts on the advantages of using Beeline over the default Hive CLI?
Please also share any useful links on this topic.

Thanks
Pandeeswaran


Re: Does hiveserver2 support -e and -f options

2013-08-14 Thread Jarek Jarcec Cecho
HiveServer2 is a daemon running in the background, so it logically does not
support the -e and -f parameters in the same fashion as the "hive" utility.
However, the Beeline client that can be used for talking to HiveServer2 does
support both of them.
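
For example, assuming a HiveServer2 instance on localhost:10000 (host, port, and user name here are illustrative):

```shell
# Run a single statement against HiveServer2 via Beeline (-e),
# analogous to "hive -e".
beeline -u jdbc:hive2://localhost:10000 -n hive -e "SHOW TABLES;"

# Run a script file (-f), analogous to "hive -f".
beeline -u jdbc:hive2://localhost:10000 -n hive -f /path/to/script.hql
```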

Jarcec

On Tue, Aug 13, 2013 at 01:22:07AM +, Sanjay Subramanian wrote:
> 
> 
> 
> CONFIDENTIALITY NOTICE
> ==
> This email message and any attachments are for the exclusive use of the 
> intended recipient(s) and may contain confidential and privileged 
> information. Any unauthorized review, use, disclosure or distribution is 
> prohibited. If you are not the intended recipient, please contact the sender 
> by reply email and destroy all copies of the original message along with any 
> attachments, from your computer system. If you are the intended recipient, 
> please be advised that the content of this message is subject to access, 
> review and disclosure by the sender's Email System Administrator.
> 




Re: Building Hadoop/Hive App

2013-08-14 Thread Esteban Gutierrez
Hello Guillermo,

Sure, you can use the Thrift API to connect to Hive

https://cwiki.apache.org/Hive/hiveclient.html#HiveClient-Python
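
A minimal sketch of that era's Python Thrift client (it assumes the generated Hive Thrift bindings and the Thrift Python library are on the client's PYTHONPATH, and a Hive Thrift server listening on localhost:10000; the table name and port are illustrative):

```python
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hive_service import ThriftHive  # generated Hive Thrift bindings

# Open a buffered socket transport to the Hive Thrift server.
transport = TTransport.TBufferedTransport(TSocket.TSocket('localhost', 10000))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = ThriftHive.Client(protocol)

transport.open()
client.execute("SELECT * FROM some_table LIMIT 10")  # table name illustrative
for row in client.fetchAll():
    print(row)
transport.close()
```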

cheers,
esteban.


--
Cloudera, Inc.



On Wed, Aug 14, 2013 at 3:45 PM, Guillermo Alvarado <guillermoalvarad...@gmail.com> wrote:

> Hi everybody,
>
> I want to build an application with Hadoop, and I want to know the best
> way to communicate between my GUI (Django based) and Hive.
>
> I have a CDH4 cluster running; I need to run queries against Hive and then
> graph the response.
>
> I know there is a JDBC client and a Python client, but I need to have
> Hadoop installed in order to use them. Isn't there an API that I can
> consume without having Hadoop installed on the GUI server? Or do I need to
> build and expose an API from the Hive server (using the clients mentioned
> above) that my GUI server will consume?
>
> So, what do you guys recommend?
>
> Thanks in advance,
> Regards.
>


Re: Review Request (wikidoc): LZO Compression in Hive

2013-08-14 Thread Sanjay Subramanian
Once again, I am down on my knees humbly calling upon the Hive Jedi Masters
to please provide this Padawan with cwiki update privileges.

May the Force be with you

Thanks

sanjay

From: Sanjay Subramanian <sanjay.subraman...@wizecommerce.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Wednesday, July 31, 2013 9:38 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Cc: "d...@hive.apache.org" <d...@hive.apache.org>
Subject: Re: Review Request (wikidoc): LZO Compression in Hive

Hi guys

Any chance I could get cwiki update privileges today ?

Thanks

sanjay

From: Sanjay Subramanian <sanjay.subraman...@wizecommerce.com>
Date: Tuesday, July 30, 2013 4:26 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Cc: "d...@hive.apache.org" <d...@hive.apache.org>
Subject: Review Request (wikidoc): LZO Compression in Hive

Hi

Met with Lefty this afternoon and she was kind enough to spend time adding my
documentation to the site, since I still don't have editing privileges :-)

Please review the new wikidoc about LZO compression in the Hive language 
manual.  If anything is unclear or needs more information, you can email 
suggestions to this list or edit the wiki yourself (if you have editing 
privileges).  Here are the links:

  1.  Language Manual (new bullet under File Formats)
  2.  LZO Compression
  3.  CREATE TABLE (near end of section, pasted in here:)
Use STORED AS TEXTFILE if the data needs to be stored as plain text files. Use 
STORED AS SEQUENCEFILE if the data needs to be compressed. Please read more 
about 
CompressedStorage
 if you are planning to keep data compressed in your Hive tables. Use 
INPUTFORMAT and OUTPUTFORMAT to specify the name of a corresponding InputFormat 
and OutputFormat class as a string literal, e.g., 
'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'. For 
LZO compression, the values to use are 'INPUTFORMAT 
"com.hadoop.mapred.DeprecatedLzoTextInputFormat" OUTPUTFORMAT 
"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"' (see LZO 
Compression).
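
Put together, the CREATE TABLE form that the pasted wiki text describes would look something like this (table and column names are illustrative; it assumes the LZO jars are on Hive's classpath):

```sql
CREATE TABLE lzo_example (line STRING)
STORED AS
  INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
```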

My cwiki id is
https://cwiki.apache.org/confluence/display/~sanjaysubraman...@yahoo.com
It would be great if I could get edit privileges.

Thanks
sanjay



Re: Hive and Lzo Compression

2013-08-14 Thread Sanjay Subramanian
I am not sure if in this case data is loaded,
OR a partition is added with a location specified (pointing to some location in HDFS).

Yes, you are stating the question correctly.

sanjay

From: Nitin Pawar <nitinpawar...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Wednesday, August 14, 2013 10:54 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Hive and Lzo Compression

Please correct me if I have misunderstood the question.

You created a table definition without a STORED AS clause,
then you loaded data into the table from a compressed file,
then ran a SELECT query and it still worked.
But how did it figure out which compression codec to use?

Am I stating it correctly?



On Wed, Aug 14, 2013 at 11:11 PM, Sanjay Subramanian 
mailto:sanjay.subraman...@wizecommerce.com>>
 wrote:
That is really interesting…let me try and think of a reason…meanwhile any other 
LZO Hive Samurais out there ? Please help with some guidance

sanjay

From: w00t w00t <w00...@yahoo.de>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <w00...@yahoo.de>
Date: Wednesday, August 14, 2013 1:15 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Hive and Lzo Compression


Thanks for your reply.

The interesting thing I experience is that the SELECT query still works - even 
when I do not specify the STORED AS clause... that puzzles me a bit.


From: Sanjay Subramanian <sanjay.subraman...@wizecommerce.com>
To: "user@hive.apache.org" <user@hive.apache.org>; w00t w00t <w00...@yahoo.de>
Sent: Wednesday, 14 August 2013, 3:44
Subject: Re: Hive and Lzo Compression

Hi

I think the CREATE TABLE without the STORED AS clause will not give any errors
while creating the table.
However, when you query that table, since it contains .lzo files,
you would get errors.
With external tables, you are separating the table creation (definition) from
the data, so only at query time might Hive report errors.

LZO compression rocks ! I am so glad I used it in our projects here.

Regards

sanjay

From: w00t w00t <w00...@yahoo.de>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <w00...@yahoo.de>
Date: Tuesday, August 13, 2013 12:13 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Hive and Lzo Compression

Thanks for your replies and the link.

I could get it working, but wondered why the CREATE TABLE statement worked 
without the STORED AS Clause as well...that's what puzzles me a bit...

But I will use the STORED AS Clause to be on the safe side.



From: Lefty Leverenz <leftylever...@gmail.com>
To: user@hive.apache.org
CC: w00t w00t <w00...@yahoo.de>
Sent: Saturday, 10 August 2013, 19:06
Subject: Re: Hive and Lzo Compression

I'm not seeing any documentation link in Sanjay's message, so here it is again 
(in the Hive wiki's language manual):  
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO.


On Thu, Aug 8, 2013 at 3:30 PM, Sanjay Subramanian <sanjay.subraman...@wizecommerce.com> wrote:
Please refer to the documentation here.
Let me know if you need more clarification so that we can make this document
better and more complete.

Thanks

sanjay

From: w00t w00t <w00...@yahoo.de>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <w00...@yahoo.de>
Date: Thursday, August 8, 2013 2:02 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Hive and Lzo Compression


Hello,

I have started running Hive with LZO compression on Hortonworks 1.2.

I have managed to install/configure LZO, and hive -e "set
io.compression.codecs" shows me the LZO codecs:
io.compression.codecs=
org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec,
org.apache.hadoop.io.compress.BZip2Codec

However, I have some questions where I would be happy if you could help me.

(1) CREATE TABLE statement

I read in different postings that, in the CREATE TABLE statement, I have to use
the following storage clause:

CREATE EXTERNAL TABLE txt_table_lzo (
   txt_line STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ''
STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/myuser/data/in/lzo_compressed';

It now works without any problems to execute SELECT statements on this table
with LZO data.

However I also created a table on the

Building Hadoop/Hive App

2013-08-14 Thread Guillermo Alvarado
Hi everybody,

I want to build an application with Hadoop, and I want to know the best
way to communicate between my GUI (Django based) and Hive.

I have a CDH4 cluster running; I need to run queries against Hive and then
graph the response.

I know there is a JDBC client and a Python client, but I need to have Hadoop
installed in order to use them. Isn't there an API that I can consume without
having Hadoop installed on the GUI server? Or do I need to build and expose an
API from the Hive server (using the clients mentioned above) that my GUI
server will consume?

So, what do you guys recommend?

Thanks in advance,
Regards.


Re: Hive and Lzo Compression

2013-08-14 Thread Nitin Pawar
Please correct me if I have misunderstood the question.

You created a table definition without a STORED AS clause,
then you loaded data into the table from a compressed file,
then ran a SELECT query and it still worked.
But how did it figure out which compression codec to use?

Am I stating it correctly?



On Wed, Aug 14, 2013 at 11:11 PM, Sanjay Subramanian <sanjay.subraman...@wizecommerce.com> wrote:

>  That is really interesting…let me try and think of a reason…meanwhile
> any other LZO Hive Samurais out there ? Please help with some guidance
>
>  sanjay
>
>   From: w00t w00t 
> Reply-To: "user@hive.apache.org" , w00t w00t <
> w00...@yahoo.de>
> Date: Wednesday, August 14, 2013 1:15 AM
>
> To: "user@hive.apache.org" 
> Subject: Re: Hive and Lzo Compression
>
>
>  Thanks for your reply.
>
>  The interesting thing I experience is that the SELECT query still works
> - even when I do not specify the STORED AS clause... that puzzles me a bit.
>
>   --
> *Von:* Sanjay Subramanian 
> *An:* "user@hive.apache.org" ; w00t w00t <
> w00...@yahoo.de>
> *Gesendet:* 3:44 Mittwoch, 14.August 2013
> *Betreff:* Re: Hive and Lzo Compression
>
>  Hi
>
>  I think the CREATE TABLE without the STORED AS clause will not give any
> errors while creating the table.
> However, when you query that table, since it contains .lzo files,
> you would get errors.
> With external tables, you are separating the table creation (definition)
> from the data, so only at query time might Hive report errors.
>
>  LZO compression rocks ! I am so glad I used it in our projects here.
>
>  Regards
>
>  sanjay
>
>   From: w00t w00t 
> Reply-To: "user@hive.apache.org" , w00t w00t <
> w00...@yahoo.de>
> Date: Tuesday, August 13, 2013 12:13 AM
> To: "user@hive.apache.org" 
> Subject: Re: Hive and Lzo Compression
>
>   Thanks for your replies and the link.
>
>  I could get it working, but wondered why the CREATE TABLE statement
> worked without the STORED AS Clause as well...that's what puzzles me a
> bit...
>
>  But I will use the STORED AS Clause to be on the safe side.
>
>
>   --
> *Von:* Lefty Leverenz 
> *An:* user@hive.apache.org
> *CC:* w00t w00t 
> *Gesendet:* 19:06 Samstag, 10.August 2013
> *Betreff:* Re: Hive and Lzo Compression
>
>  I'm not seeing any documentation link in Sanjay's message, so here it is
> again (in the Hive wiki's language manual):
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO.
>
>
> On Thu, Aug 8, 2013 at 3:30 PM, Sanjay Subramanian <sanjay.subraman...@wizecommerce.com> wrote:
>
>  Please refer to the documentation here.
> Let me know if you need more clarification so that we can make this
> document better and more complete.
>
>  Thanks
>
>  sanjay
>
>   From: w00t w00t 
> Reply-To: "user@hive.apache.org" , w00t w00t <
> w00...@yahoo.de>
> Date: Thursday, August 8, 2013 2:02 AM
> To: "user@hive.apache.org" 
> Subject: Hive and Lzo Compression
>
>
>Hello,
>
> I have started running Hive with LZO compression on Hortonworks 1.2.
>
> I have managed to install/configure Lzo and  hive -e "set
> io.compression.codecs" shows me the Lzo Codecs:
> io.compression.codecs=
> org.apache.hadoop.io.compress.GzipCodec,
> org.apache.hadoop.io.compress.DefaultCodec,
> com.hadoop.compression.lzo.LzoCodec,
> com.hadoop.compression.lzo.LzopCodec,
> org.apache.hadoop.io.compress.BZip2Codec
>
> However, I have some questions where I would be happy if you could help me.
>
> (1) CREATE TABLE statement
>
>  I read in different postings that, in the CREATE TABLE statement, I have
> to use the following storage clause:
>
>  CREATE EXTERNAL TABLE txt_table_lzo (
> txt_line STRING
>  )
>  ROW FORMAT DELIMITED FIELDS TERMINATED BY ''
>  STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>  LOCATION '/user/myuser/data/in/lzo_compressed';
>
>  It now works without any problems to execute SELECT statements on this
> table with LZO data.
>
>  However I also created a table on the same data without this STORAGE
> clause:
>
>  CREATE EXTERNAL TABLE txt_table_lzo_tst (
> txt_line STRING
>  )
>  ROW FORMAT DELIMITED FIELDS TERMINATED BY ''
>  LOCATION '/user/myuser/data/in/lzo_compressed';
>
>  The interesting thing is that it works as well when I execute a SELECT
> statement on this table.
>
>  Can you explain why the second CREATE TABLE statement works as well?
>  What should I use in DDLs?
>  Is it best practice to use the STORED AS clause with
> "DeprecatedLzoTextInputFormat", or should I remove it?
>
>
> (2) Output and Intermediate Compression Settings
>
>  I want to use output compression .
>
>  In "Programming Hive" from Capriolo, Wampler, Rutherglen the following
> commands are recommended:
>  SET hive.exec.compress.output=true;
>  SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
>
>   However, in some other

Re: Strange error in Hive - Insert INTO

2013-08-14 Thread Sanjay Subramanian
Another reason I can think of is that possibly some STRING column in your table
has a delimiter character in it. Once in production I had tab characters inside
a string while my table was also defined using TAB as the delimiter.

From: Stephen Sprague <sprag...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Wednesday, August 14, 2013 8:43 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Strange error in Hive - Insert INTO

Hi Jerome,
That's a grandiose SQL statement you got there! :) I find that if you break
up those nested queries into simple CTAS (CREATE TABLE AS) statements and
create a cascading effect of referring to the table from the previous step, it
makes debugging *so* much easier. In other SQL dialects like DB2 this is
facilitated by the WITH keyword. Maybe the Hive gurus will implement that some
day. But that's a topic for another day.

That said, I see that the columns in your CREATE TABLE statement don't
match the columns in your outermost SELECT statement. In particular, DT_JOUR
is listed as the 6th column in your CREATE TABLE statement but it appears to
be the 2nd column in your SELECT statement, so something looks fishy there.

My guess is that ultimately you're missing a comma somewhere in the select
list, so Hive is eating a column name as a column alias and all your data is
skewed over by one column. This happens not infrequently, since it is valid SQL.

Long winded answer to a simple question. Apologies up front!
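
The CTAS cascade described above would look roughly like this (table names and the split points are illustrative, not taken from the actual script):

```sql
-- Step 1: materialize the innermost aggregation as its own table.
CREATE TABLE tmp_step1 AS
SELECT id_magasin, co_famille, SUM(mt_ca_net_ttc) AS mt_ca_net_ttc
FROM default.VENTES_FAM
GROUP BY id_magasin, co_famille;

-- Step 2: build on the previous step's table; inspecting tmp_step1
-- with a quick SELECT before moving on makes debugging much easier.
CREATE TABLE tmp_step2 AS
SELECT m.co_magasin, s.co_famille, s.mt_ca_net_ttc
FROM tmp_step1 s
JOIN default.kpi_magasin m ON m.id_magasin = s.id_magasin;
```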


On Wed, Aug 14, 2013 at 5:35 AM, Jérôme Verdier <verdier.jerom...@gmail.com> wrote:
Hi everybody,

I faced a strange error in Hive today.

I launched a Hive script to do some calculations, joins, unions, etc., and
then insert the results into another Hive table.

Everything works fine (the .hql runs, data is imported), but one field
(CO_RGRP_PRODUITS) is very strange.

After the insert, CO_RGRP_PRODUITS looks like a TIMESTAMP (1970-01-01
01:00:00) instead of being a simple STRING.

Note that the source field is a simple string, like 0101380, for example.

What is going wrong here?

You can find my script below (create table and .hql insert/calculations).

Thanks for your help.


CREATE TABLE SCRIPT:
--THM_CA_RGRP_PRODUITS_JOUR
CREATE TABLE default.THM_CA_RGRP_PRODUITS_JOUR (
CO_SOCIETE BIGINT,
TYPE_ENTITE STRING,
CODE_ENTITE STRING,
TYPE_RGRP_PRODUITS STRING,
CO_RGRP_PRODUITS STRING,
DT_JOUR TIMESTAMP,
MT_CA_NET_TTC FLOAT,
MT_OBJ_CA_NET_TTC FLOAT,
NB_CLIENTS FLOAT,
MT_CA_NET_TTC_COMP FLOAT,
MT_OBJ_CA_NET_TTC_COMP FLOAT,
NB_CLIENTS_COMP FLOAT);

INSERT SCRIPT :

INSERT INTO TABLE THM_CA_RGRP_PRODUITS_JOUR

  SELECT
      1                            as CO_SOCIETE,         -- A modifier => variable
      '2013-01-02 00:00:00.0'      as DT_JOUR,            -- A modifier => variable
      'MAG'                        as TYPE_ENTITE,
      m.co_magasin                 as CODE_ENTITE,
      'FAM'                        as TYPE_RGRP_PRODUITS,
      sourceunion.CO_RGRP_PRODUITS as CO_RGRP_PRODUITS,
      SUM(MT_CA_NET_TTC)           as MT_CA_NET_TTC,
      SUM(MT_OBJ_CA_NET_TTC)       as MT_OBJ_CA_NET_TTC,
      SUM(NB_CLIENTS)              as NB_CLIENTS,
      SUM(MT_CA_NET_TTC_COMP)      as MT_CA_NET_TTC_COMP,
      SUM(MT_OBJ_CA_NET_TTC_COMP)  as MT_OBJ_CA_NET_TTC_COMP,
      SUM(NB_CLIENTS_COMP)         as NB_CLIENTS_COMP

FROM (
  SELECT
mtransf.id_mag_transfere as ID_MAGASIN,
v.co_famille as CO_RGRP_PRODUITS,
sum(v.mt_ca_net_ttc) as MT_CA_NET_TTC,
0as MT_OBJ_CA_NET_TTC,
0as NB_CLIENTS,
sum(v.mt_ca_net_ttc * (CASE WHEN mtransf.flag_mag_comp = 'NC' THEN 
0 ELSE 1 END))
 as MT_CA_NET_TTC_COMP,
0as MT_OBJ_CA_NET_TTC_COMP,
0as NB_CLIENTS_COMP
  FROM default.VENTES_FAM v
  JOIN default.kpi_magasin mtransf
  ON  mtransf.co_societe = CASE WHEN v.co_societe = 1 THEN 1 ELSE 2 END
  AND mtransf.id_magasin = v.id_magasin
  WHERE
  mtransf.co_societe= 1 -- Modifier variable
  AND v.dt_jour = '2013-01-02 00:00:00.0' -- Modifier 
variable
  GROUP BY
mtransf.id

Re: Hive and Lzo Compression

2013-08-14 Thread Sanjay Subramanian
That is really interesting…let me try and think of a reason…meanwhile any other 
LZO Hive Samurais out there ? Please help with some guidance

sanjay

From: w00t w00t <w00...@yahoo.de>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <w00...@yahoo.de>
Date: Wednesday, August 14, 2013 1:15 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Hive and Lzo Compression


Thanks for your reply.

The interesting thing I experience is that the SELECT query still works - even 
when I do not specify the STORED AS clause... that puzzles me a bit.


From: Sanjay Subramanian <sanjay.subraman...@wizecommerce.com>
To: "user@hive.apache.org" <user@hive.apache.org>; w00t w00t <w00...@yahoo.de>
Sent: Wednesday, 14 August 2013, 3:44
Subject: Re: Hive and Lzo Compression

Hi

I think the CREATE TABLE without the STORED AS clause will not give any errors
while creating the table.
However, when you query that table, since it contains .lzo files,
you would get errors.
With external tables, you are separating the table creation (definition) from
the data, so only at query time might Hive report errors.

LZO compression rocks ! I am so glad I used it in our projects here.

Regards

sanjay

From: w00t w00t <w00...@yahoo.de>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <w00...@yahoo.de>
Date: Tuesday, August 13, 2013 12:13 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Hive and Lzo Compression

Thanks for your replies and the link.

I could get it working, but wondered why the CREATE TABLE statement worked 
without the STORED AS Clause as well...that's what puzzles me a bit...

But I will use the STORED AS Clause to be on the safe side.



From: Lefty Leverenz <leftylever...@gmail.com>
To: user@hive.apache.org
CC: w00t w00t <w00...@yahoo.de>
Sent: Saturday, 10 August 2013, 19:06
Subject: Re: Hive and Lzo Compression

I'm not seeing any documentation link in Sanjay's message, so here it is again 
(in the Hive wiki's language manual):  
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO.


On Thu, Aug 8, 2013 at 3:30 PM, Sanjay Subramanian <sanjay.subraman...@wizecommerce.com> wrote:
Please refer to the documentation here.
Let me know if you need more clarification so that we can make this document
better and more complete.

Thanks

sanjay

From: w00t w00t <w00...@yahoo.de>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>, w00t w00t <w00...@yahoo.de>
Date: Thursday, August 8, 2013 2:02 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Hive and Lzo Compression


Hello,

I have started running Hive with LZO compression on Hortonworks 1.2.

I have managed to install/configure LZO, and hive -e "set
io.compression.codecs" shows me the LZO codecs:
io.compression.codecs=
org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec,
org.apache.hadoop.io.compress.BZip2Codec

However, I have some questions where I would be happy if you could help me.

(1) CREATE TABLE statement

I read in different postings that, in the CREATE TABLE statement, I have to use
the following storage clause:

CREATE EXTERNAL TABLE txt_table_lzo (
   txt_line STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ''
STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/myuser/data/in/lzo_compressed';

It now works without any problems to execute SELECT statements on this table
with LZO data.

However I also created a table on the same data without this STORAGE clause:

CREATE EXTERNAL TABLE txt_table_lzo_tst (
   txt_line STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ''
LOCATION '/user/myuser/data/in/lzo_compressed';

The interesting thing is that it works as well when I execute a SELECT
statement on this table.

Can you explain why the second CREATE TABLE statement works as well?
What should I use in DDLs?
Is it best practice to use the STORED AS clause with
"DeprecatedLzoTextInputFormat", or should I remove it?


(2) Output and Intermediate Compression Settings

I want to use output compression.

In "Programming Hive" from Capriolo, Wampler, Rutherglen the following commands 
are recommended:
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;

  However, in some other places in forums, I found the following 
recommended setting

Re: Strange error in Hive - Insert INTO

2013-08-14 Thread Stephen Sprague
Hi Jerome,
That's a grandiose SQL statement you got there! :) I find that if you
break up those nested queries into simple CTAS (CREATE TABLE AS) statements
and create a cascading effect of referring to the table from the previous
step, it makes debugging *so* much easier. In other SQL dialects like DB2
this is facilitated by the WITH keyword. Maybe the Hive gurus will
implement that some day. But that's a topic for another day.

That said, I see that the columns in your CREATE TABLE statement
don't match the columns in your outermost SELECT statement. In particular,
DT_JOUR is listed as the 6th column in your CREATE TABLE statement but it
appears to be the 2nd column in your SELECT statement, so something looks
fishy there.

My guess is that ultimately you're missing a comma somewhere in the select
list, so Hive is eating a column name as a column alias and all your data is
skewed over by one column. This happens not infrequently, since it is valid SQL.

Long winded answer to a simple question. Apologies up front!


On Wed, Aug 14, 2013 at 5:35 AM, Jérôme Verdier
wrote:

> Hi everybody,
>
> I faced a strange error in Hive today.
>
> I launched a Hive script to do some calculations, joins, unions, etc.,
> and then insert the results into another Hive table.
>
> Everything works fine (the .hql runs, data is imported),
> but one field (CO_RGRP_PRODUITS) is very strange.
>
> After the insert, CO_RGRP_PRODUITS looks like a TIMESTAMP (1970-01-01
> 01:00:00) instead of being a simple STRING.
>
> Note that the source field is a simple string, like 0101380, for example.
>
> What is going wrong here?
>
> You can find my script below (create table and .hql insert/calculations)
>
> Thanks for your help.
>
>
> INSERT SCRIPT :
> --THM_CA_RGRP_PRODUITS_JOUR
> CREATE TABLE default.THM_CA_RGRP_PRODUITS_JOUR (
> CO_SOCIETE BIGINT,
> TYPE_ENTITE STRING,
> CODE_ENTITE STRING,
> TYPE_RGRP_PRODUITS STRING,
> CO_RGRP_PRODUITS STRING,
> DT_JOUR TIMESTAMP,
> MT_CA_NET_TTC FLOAT,
> MT_OBJ_CA_NET_TTC FLOAT,
> NB_CLIENTS FLOAT,
> MT_CA_NET_TTC_COMP FLOAT,
> MT_OBJ_CA_NET_TTC_COMP FLOAT,
> NB_CLIENTS_COMP FLOAT);
>
> INSERT SCRIPT :
>
> INSERT INTO TABLE THM_CA_RGRP_PRODUITS_JOUR
>
>   SELECT
>   1  as
> CO_SOCIETE,-- A modifier => variable
>   '2013-01-02 00:00:00.0' as
> DT_JOUR, -- A modifier => variable
>   'MAG'   as
> TYPE_ENTITE,
>   m.co_magasinas
> CODE_ENTITE,
>   'FAM'   as
> TYPE_RGRP_PRODUITS,
>   sourceunion.CO_RGRP_PRODUITSas
> CO_RGRP_PRODUITS,
>   SUM(MT_CA_NET_TTC)  as
> MT_CA_NET_TTC,
>   SUM(MT_OBJ_CA_NET_TTC)  as
> MT_OBJ_CA_NET_TTC,
>   SUM(NB_CLIENTS) as
> NB_CLIENTS,
>   SUM(MT_CA_NET_TTC_COMP) as
> MT_CA_NET_TTC_COMP,
>   SUM(MT_OBJ_CA_NET_TTC_COMP) as
> MT_OBJ_CA_NET_TTC_COMP,
>   SUM(NB_CLIENTS_COMP)as
> NB_CLIENTS_COMP
>
> FROM (
>   SELECT
> mtransf.id_mag_transfere as
> ID_MAGASIN,
> v.co_famille as CO_RGRP_PRODUITS,
> sum(v.mt_ca_net_ttc) as MT_CA_NET_TTC,
> 0as MT_OBJ_CA_NET_TTC,
> 0as NB_CLIENTS,
> sum(v.mt_ca_net_ttc * (CASE WHEN mtransf.flag_mag_comp = 'NC'
> THEN 0 ELSE 1 END))
>  as MT_CA_NET_TTC_COMP,
> 0as MT_OBJ_CA_NET_TTC_COMP,
> 0as NB_CLIENTS_COMP
>   FROM default.VENTES_FAM v
>   JOIN default.kpi_magasin mtransf
>   ON  mtransf.co_societe = CASE WHEN v.co_societe = 1 THEN 1 ELSE
> 2 END
>   AND mtransf.id_magasin = v.id_magasin
>   WHERE
>   mtransf.co_societe= 1 -- Modifier variable
>   AND v.dt_jour = '2013-01-02 00:00:00.0' -- Modifier
> variable
>   GROUP BY
> mtransf.id_mag_transfere,
> v.co_famille
>
>   UNION ALL
>
>   SELECT
> mtransf.id_mag_transfere as ID_MAGASIN,
> v.co_famille as CO_RGRP_PRODUITS,
> 0as MT_CA_NET_TTC,
> 0as MT_OBJ_CA_NET_TTC,
> sum(nb_client)   as NB_CLIENTS,
>   

Strange error in Hive - Insert INTO

2013-08-14 Thread Jérôme Verdier
Hi everybody,

I faced a strange error in Hive today.

I launched a Hive script to do some calculations, joins, unions, etc.,
and then insert the results into another Hive table.

Everything works fine (the .hql runs, data is imported),
but one field (CO_RGRP_PRODUITS) is very strange.

After the insert, CO_RGRP_PRODUITS looks like a TIMESTAMP (1970-01-01
01:00:00) instead of being a simple STRING.

Note that the source field is a simple string, like 0101380, for example.

What is going wrong here?

You can find my script below (create table and .hql insert/calculations).

Thanks for your help.


CREATE TABLE SCRIPT:
--THM_CA_RGRP_PRODUITS_JOUR
CREATE TABLE default.THM_CA_RGRP_PRODUITS_JOUR (
CO_SOCIETE BIGINT,
TYPE_ENTITE STRING,
CODE_ENTITE STRING,
TYPE_RGRP_PRODUITS STRING,
CO_RGRP_PRODUITS STRING,
DT_JOUR TIMESTAMP,
MT_CA_NET_TTC FLOAT,
MT_OBJ_CA_NET_TTC FLOAT,
NB_CLIENTS FLOAT,
MT_CA_NET_TTC_COMP FLOAT,
MT_OBJ_CA_NET_TTC_COMP FLOAT,
NB_CLIENTS_COMP FLOAT);

INSERT SCRIPT :

INSERT INTO TABLE THM_CA_RGRP_PRODUITS_JOUR

  SELECT
      1                            as CO_SOCIETE,         -- A modifier => variable
      '2013-01-02 00:00:00.0'      as DT_JOUR,            -- A modifier => variable
      'MAG'                        as TYPE_ENTITE,
      m.co_magasin                 as CODE_ENTITE,
      'FAM'                        as TYPE_RGRP_PRODUITS,
      sourceunion.CO_RGRP_PRODUITS as CO_RGRP_PRODUITS,
      SUM(MT_CA_NET_TTC)           as MT_CA_NET_TTC,
      SUM(MT_OBJ_CA_NET_TTC)       as MT_OBJ_CA_NET_TTC,
      SUM(NB_CLIENTS)              as NB_CLIENTS,
      SUM(MT_CA_NET_TTC_COMP)      as MT_CA_NET_TTC_COMP,
      SUM(MT_OBJ_CA_NET_TTC_COMP)  as MT_OBJ_CA_NET_TTC_COMP,
      SUM(NB_CLIENTS_COMP)         as NB_CLIENTS_COMP

FROM (
  SELECT
mtransf.id_mag_transfere as
ID_MAGASIN,
v.co_famille as CO_RGRP_PRODUITS,
sum(v.mt_ca_net_ttc) as MT_CA_NET_TTC,
0as MT_OBJ_CA_NET_TTC,
0as NB_CLIENTS,
sum(v.mt_ca_net_ttc * (CASE WHEN mtransf.flag_mag_comp = 'NC'
THEN 0 ELSE 1 END))
 as MT_CA_NET_TTC_COMP,
0as MT_OBJ_CA_NET_TTC_COMP,
0as NB_CLIENTS_COMP
  FROM default.VENTES_FAM v
  JOIN default.kpi_magasin mtransf
  ON  mtransf.co_societe = CASE WHEN v.co_societe = 1 THEN 1 ELSE 2
END
  AND mtransf.id_magasin = v.id_magasin
  WHERE
  mtransf.co_societe= 1 -- Modifier variable
  AND v.dt_jour = '2013-01-02 00:00:00.0' -- Modifier
variable
  GROUP BY
mtransf.id_mag_transfere,
v.co_famille

  UNION ALL

  SELECT
mtransf.id_mag_transfere as ID_MAGASIN,
v.co_famille as CO_RGRP_PRODUITS,
0as MT_CA_NET_TTC,
0as MT_OBJ_CA_NET_TTC,
sum(nb_client)   as NB_CLIENTS,
0as MT_CA_NET_TTC_COMP,
0as MT_OBJ_CA_NET_TTC_COMP,
sum(nb_client * (CASE WHEN mtransf.flag_mag_comp = 'NC' THEN 0
ELSE 1 END))
 as NB_CLIENTS_COMP
  FROM default.nb_clients_mag_fam_j v
  JOIN default.kpi_magasin mtransf
  ON  mtransf.co_societe = CASE WHEN v.co_societe = 1 THEN 1 ELSE 2
END
  AND mtransf.id_magasin = v.id_magasin
  WHERE
  mtransf.co_societe= 1 -- A modifier
  AND v.dt_jour = '2013-01-02 00:00:00.0'
  GROUP BY
mtransf.id_mag_transfere,
v.co_famille
  ) sourceunion
JOIN default.kpi_magasin m
ON  m.co_societe = 1 -- A modifier
AND m.id_magasin = sourceunion.id_magasin
GROUP BY
  m.co_magasin,
  sourceunion.CO_RGRP_PRODUITS;
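[Editor's sketch] One likely cause worth checking (this is a guess, not confirmed in the thread): Hive maps an INSERT ... SELECT onto the table's columns strictly by position and ignores the "as" aliases. In the script above, DT_JOUR is the second expression in the outer SELECT but the sixth column in the DDL, so every value from position 2 onward lands in the wrong column, which would explain a STRING column coming back looking like a TIMESTAMP. A minimal sketch of the outer SELECT reordered to match the DDL:

```sql
-- Hive fills INSERT ... SELECT columns by position, not by alias, so the
-- outer SELECT must follow the DDL order: CO_SOCIETE, TYPE_ENTITE,
-- CODE_ENTITE, TYPE_RGRP_PRODUITS, CO_RGRP_PRODUITS, DT_JOUR, measures.
INSERT INTO TABLE THM_CA_RGRP_PRODUITS_JOUR
SELECT
  1                            as CO_SOCIETE,          -- A modifier => variable
  'MAG'                        as TYPE_ENTITE,
  m.co_magasin                 as CODE_ENTITE,
  'FAM'                        as TYPE_RGRP_PRODUITS,
  sourceunion.CO_RGRP_PRODUITS as CO_RGRP_PRODUITS,
  '2013-01-02 00:00:00.0'      as DT_JOUR,             -- A modifier => variable
  SUM(MT_CA_NET_TTC)           as MT_CA_NET_TTC,
  SUM(MT_OBJ_CA_NET_TTC)       as MT_OBJ_CA_NET_TTC,
  SUM(NB_CLIENTS)              as NB_CLIENTS,
  SUM(MT_CA_NET_TTC_COMP)      as MT_CA_NET_TTC_COMP,
  SUM(MT_OBJ_CA_NET_TTC_COMP)  as MT_OBJ_CA_NET_TTC_COMP,
  SUM(NB_CLIENTS_COMP)         as NB_CLIENTS_COMP
FROM ( /* the UNION ALL subquery from the script above, unchanged */ ) sourceunion
JOIN default.kpi_magasin m
  ON  m.co_societe = 1
  AND m.id_magasin = sourceunion.id_magasin
GROUP BY
  m.co_magasin,
  sourceunion.CO_RGRP_PRODUITS;
```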


Re: Hive and Lzo Compression

2013-08-14 Thread w00t w00t


Thanks for your reply.

The interesting thing I experience is that the SELECT query still works, even
when I do not specify the STORED AS clause... that puzzles me a bit.




 From: Sanjay Subramanian 
To: "user@hive.apache.org" ; w00t w00t  
Sent: Wednesday, 14 August 2013, 3:44
Subject: Re: Hive and Lzo Compression
 


Hi 

I think the CREATE TABLE without the STORED AS clause will not give any errors 
while creating the table.
However, when you query that table, since it contains .lzo files, 
you would get errors. 
With external tables, you are separating the table creation (definition) from the 
data. So only at the time of querying that table might Hive report errors.

LZO compression rocks! I am so glad I used it in our projects here.

Regards

sanjay 
From: w00t w00t 
Reply-To: "user@hive.apache.org" , w00t w00t 

Date: Tuesday, August 13, 2013 12:13 AM
To: "user@hive.apache.org" 
Subject: Re: Hive and Lzo Compression


Thanks for your replies and the link.

I could get it working, but wondered why the CREATE TABLE statement worked 
without the STORED AS clause as well... that's what puzzles me a bit.

But I will use the STORED AS Clause to be on the safe side.
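[Editor's sketch] A plausible explanation for the puzzle (my assumption, not stated in the thread): the default TextInputFormat picks a decompression codec by file extension, so once LzopCodec is registered in io.compression.codecs, even a plain text table can read .lzo files. What DeprecatedLzoTextInputFormat adds is support for the LZO index files, which makes the compressed files splittable across mappers. A hedged sketch of the two variants:

```sql
-- Plain table: TextInputFormat still decompresses .lzo files via the
-- registered LzopCodec, but each file is read by a single mapper.
CREATE EXTERNAL TABLE txt_table_lzo_plain (txt_line STRING)
ROW FORMAT DELIMITED
LOCATION '/user/myuser/data/in/lzo_compressed';

-- LZO-aware table: DeprecatedLzoTextInputFormat honors .lzo.index files
-- (built with com.hadoop.compression.lzo.LzoIndexer), so large .lzo files
-- can be split into multiple map tasks instead of one mapper per file.
CREATE EXTERNAL TABLE txt_table_lzo_indexed (txt_line STRING)
ROW FORMAT DELIMITED
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/myuser/data/in/lzo_compressed';
```

So both tables can return results; the STORED AS variant is the one that scales on large files.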





 From: Lefty Leverenz 
To: user@hive.apache.org 
CC: w00t w00t  
Sent: Saturday, 10 August 2013, 19:06
Subject: Re: Hive and Lzo Compression



I'm not seeing any documentation link in Sanjay's message, so here it is again 
(in the Hive wiki's language manual):  
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO. 



On Thu, Aug 8, 2013 at 3:30 PM, Sanjay Subramanian 
 wrote:

Please refer to the documentation here
>Let me know if you need more clarifications so that we can make this document 
>better and more complete
>
>
>Thanks
>
>
>sanjay
>
>From: w00t w00t 
>Reply-To: "user@hive.apache.org" , w00t w00t 
>
>Date: Thursday, August 8, 2013 2:02 AM
>To: "user@hive.apache.org" 
>Subject: Hive and Lzo Compression
>
>
>
>
>
>Hello,
> 
>I have started running Hive with LZO compression on Hortonworks 1.2.
> 
>I have managed to install/configure LZO, and hive -e "set 
>io.compression.codecs" shows me the LZO codecs:
>io.compression.codecs=
>org.apache.hadoop.io.compress.GzipCodec,
>org.apache.hadoop.io.compress.DefaultCodec,
>com.hadoop.compression.lzo.LzoCodec,
>com.hadoop.compression.lzo.LzopCodec,
>org.apache.hadoop.io.compress.BZip2Codec
> 
>However, I have some questions where I would be happy if you could help me.
>(1) CREATE TABLE statement
>
>
>I read in different postings, that in the CREATE TABLE statement, I have to 
>use the following STORAGE clause:
> 
>CREATE EXTERNAL TABLE txt_table_lzo (
>   txt_line STRING
>)
>ROW FORMAT DELIMITED FIELDS TERMINATED BY ''
>STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
>OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>LOCATION '/user/myuser/data/in/lzo_compressed';
> 
>It now works without any problems to execute SELECT statements on this table 
>with LZO data.
> 
>However I also created a table on the same data without this STORAGE clause:
> 
>CREATE EXTERNAL TABLE txt_table_lzo_tst (
>   txt_line STRING
>)
>ROW FORMAT DELIMITED FIELDS TERMINATED BY ''
>LOCATION '/user/myuser/data/in/lzo_compressed';
> 
>The interesting thing is, it works as well when I execute a SELECT statement 
>on this table.
> 
>Can you explain why the second CREATE TABLE statement works as well?
>What should I use in DDLs? 
>Is it best practice to use the STORED AS clause with 
>"DeprecatedLzoTextInputFormat"? Or should I remove it?
> 
> (2) Output and Intermediate Compression Settings 
> 
>I want to use output compression .
> 
>In "Programming Hive" from Capriolo, Wampler, Rutherglen the following 
>commands are recommended:
>SET hive.exec.compress.output=true;
>SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
> 
>   However, in some other places in forums, I found the following 
>recommended settings:
>SET hive.exec.compress.output=true
>SET mapreduce.output.fileoutputformat.compress=true
>SET 
>mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec
> 
>Am I right that the first settings are for Hadoop versions prior to 0.23?
>Or is there any other reason why the settings differ?
> 
>I am using Hadoop 1.1.2 with Hive 0.10.0.
>Which settings would you recommend to use?
> 
>--
>   I also want to compress intermediate results.
>  
>  Again, in  "Programming Hive" the following settings are recommended:
>  SET hive.exec.compress.intermediate=true;
>  SET 
>mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
> 
>   Is this the right setting?
>
>  Or should I again use the settings (which look more valid for Hadoop 
>0.23 and greater)?:
>  SET hive.exec.compress.intermediate=true;
>   SET 
>mapreduce.map.ou