metastore security issue

2013-07-03 Thread Shunichi Otsuka
I am trying to set up Hive securely, doing authorization at the metastore.
However, there is a problem.
I relied on Hive JIRA HIVE-3705 to decide on the configuration, which was set
as below:

javax.jdo.option.ConnectionURL        jdbc
javax.jdo.option.ConnectionDriverName java.database.jdbc.mysql
javax.jdo.option.ConnectionUserName   hive
javax.jdo.option.ConnectionPassword   userpass
hive.metastore.execute.setugi         true
hive.metastore.uris                   thrift://thriftserver.example.com:9083
hive.metastore.sasl.enabled           true
hive.metastore.kerberos.keytab.file   /etc/grid-keytabs/hive.keytab
hive.metastore.kerberos.principal     hive/thriftserver.example@example.com
hive.security.metastore.authorization.enabled true
hive.security.metastore.authenticator.manager org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator
hive.security.metastore.authorization.manager org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider
hive.security.authorization.enabled   false
hive.security.authorization.enabled   false


However, this still allows an unauthorized user to drop a table or database
from the metastore, as below:

alice> create database db1 location '/user/alice/warehouse/db1.db';
[The permission of db1.db is drwx-- alice:users]
However,
bob> drop database db1;
OK

This should not happen, so why is it happening? Is my configuration wrong, or
does the code not cover this case?
If it has not been implemented yet, what measures can be taken to prevent
malicious users from dropping other users' databases/tables?

Java version is 1.6.0_33
Hive version is 0.11

Thanks


Re: Fetching Results from Hive Select (JDBC ResultSet.next() vs HiveClient.fetchN())

2013-07-03 Thread Navis류승우
It seems stmt.setFetchSize(1); can be called before execution
(without the cast to HiveQueryResultSet).
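For a sense of why the fetch size matters so much here: if each fetched batch costs one client/server round trip (a simplifying assumption; the real Thrift transport may batch differently), then the number of round trips is ceil(rows / fetchSize). A quick sketch of that arithmetic:

```java
public class FetchSizeMath {
    // Round trips needed to pull `rows` rows at a given fetch size,
    // assuming one client/server round trip per fetched batch.
    static long roundTrips(long rows, long fetchSize) {
        return (rows + fetchSize - 1) / fetchSize; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        System.out.println(roundTrips(rows, 50));     // default batch of 50 -> 20000
        System.out.println(roundTrips(rows, 10_000)); // larger batch -> 100
    }
}
```

Two hundred times fewer round trips under this model goes a long way toward explaining a 1 MB/min vs. 10 MB/s throughput gap for large result sets.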

2013/7/3 Christian Schneider :
> Hi, i browsed through the sources and found a way to tune the JDBC
> ResultSet.next() performance.
>
> final Connection con =
> DriverManager.getConnection("jdbc:hive2://carolin:1/default", "hive",
> "");
> final Statement stmt = con.createStatement();
> final String tableName = "bigdata";
>
> sql = "select * from " + tableName + " limit 15";
> System.out.println("Running: " + sql);
> res = stmt.executeQuery(sql);
>
> // enlarge the FetchSize (default is just 50!)
> ((HiveQueryResultSet) res).setFetchSize(1);
>
> Best Regards,
> Christian.
>
>
> 2013/6/26 Christian Schneider 
>>
>> I just test the same statement with beeline and got the same bad
>> performance.
>>
>> Any ideas?
>>
>> Best Regards,
>> Chrisitan.
>>
>>
>> 2013/6/26 Christian Schneider 
>>>
>>> Hi,
>>> currently we are using HiveSever1 with the native HiveClient interface.
>>> Our application design looks horrible because (for whatever reason) it
>>> spawns a dedicated HiveServer for every query.
>>>
>>> We thought it is a good idea to switch to HiveServer2 (because the
>>> MetaStore get used by many different applications).
>>>
>>> The JDBC setup was straight forward, but the performance is not what we
>>> assumed.
>>>
>>> If we fetch a large result set (with fetchN()  over HiveClient) we read
>>> with around 10MB/s.
>>>
>>> If I use JDBC (with resultSet.next() ) i have a throughput from 1MB/min.
>>>
>>> Any chance to speed this up (like bulk fetching)?
>>>
>>> Best Regards,
>>> Christian.
>>
>>
>


Re: One query works, the other does not… any clues?

2013-07-03 Thread Sanjay Subramanian
For the time being I have added the create-HDFS-dir step to my Hive script… got
to keep moving on… can't wait for the ideal solution :-) but would love to know
the ideal solution!

!hdfs dfs -mkdir 
/user/beeswax/warehouse/impressions_hive_stats/outpdir_impressions_header/2013-07-01/record_counts
;
INSERT OVERWRITE DIRECTORY 
'/user/beeswax/warehouse/impressions_hive_stats/outpdir_impressions_header/2013-07-01/record_counts'
SELECT
 'outpdir_impressions_header',
 '2013-07-01',
 'record_counts',
 'all_servers',
 count(*)
FROM
 outpdir_impressions_header
WHERE
 header_date_partition='2013-07-01'
;

From: Sanjay Subramanian <sanjay.subraman...@wizecommerce.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Wednesday, July 3, 2013 2:40 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: One query works, the other does not… any clues?

THIS FAILS
=
INSERT OVERWRITE DIRECTORY 
'/user/beeswax/warehouse/impressions_hive_stats/outpdir_impressions_header/2013-07-01/record_counts'
 select 'outpdir_impressions_header', '2013-07-01', 'record_counts', 
'all_servers', count(*) from outpdir_impressions_header where 
header_date_partition='2013-07-01'

13/07/03 14:28:31 ERROR exec.Task: Failed with exception Unable to rename: 
hdfs://thv-nn1.pv.sv.nextag.com:8020/tmp/hive-nextag/hive_2013-07-03_14-25-39_324_3631745157890476605/-ext-1
 to: 
/user/beeswax/warehouse/impressions_hive_stats/outpdir_impressions_header/2013-07-01/record_counts
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename: 
hdfs://thv-nn1.pv.sv.nextag.com:8020/tmp/hive-nextag/hive_2013-07-03_14-25-39_324_3631745157890476605/-ext-1
 to: 
/user/beeswax/warehouse/impressions_hive_stats/outpdir_impressions_header/2013-07-01/record_counts
at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:95)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:159)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1331)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1117)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:695)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:607)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.MoveTask



THIS SUCCEEDS
=
INSERT OVERWRITE DIRECTORY 
'/user/beeswax/warehouse/impressions_hive_stats/outpdir_impressions_header/record_counts/2013-07-01'
 select 'outpdir_impressions_header', '2013-07-01', 'record_counts', 
'all_servers', count(*) from outpdir_impressions_header where 
header_date_partition='2013-07-01'

TABLE DEFINITION

CREATE EXTERNAL TABLE  IF NOT EXISTS impressions_hive_stats(table_name STRING, 
aggregation_date STRING , metric_name STRING, metric_key STRING, metric_value 
BIGINT) PARTITIONED BY (table_name_partition STRING, aggregation_date_partition 
STRING , metric_name_partition STRING, metric_key_partition STRING)  STORED AS 
INPUTFORMAT  \"com.hadoop.mapred.DeprecatedLzoTextInputFormat\"   OUTPUTFORMAT 
\"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat\" ;

CONFIDENTIALITY NOTICE
==
This email message and any attachments are for the exclusive use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized review, use, disclosure or distribution is prohibited. If you 
are not the intended recipient, please contact the sender by reply email and 
destroy all copies of the original message along with any attachments, from 
your computer system. If you are the intended recipient, please be advised that 
the content of this message is subject to access, review and disclosure by the 
sender's Email System Administrator.



Re: Issue with Oracle Hive Metastore (SEQUENCE_TABLE)

2013-07-03 Thread Darren Yin
you can get the hive 0.9 oracle script here:
https://github.com/apache/hive/blob/trunk/metastore/scripts/upgrade/oracle/hive-schema-0.9.0.oracle.sql


On Wed, Jul 3, 2013 at 1:22 PM, Raj Hadoop  wrote:

> Hi,
>
> When I installed Hive earlier on my machine I used a oracle hive meta
> script. Please find attached the script. HIVE worked fine for me on this
> box with no issues.
>
> I am trying to install Hive on another machine in a different Oracle
> metastore. I executed the meta script but I am having issues with my hive
> on second box.
>
> *$ hive
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please
> use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
> files.
> Logging initialized using configuration in
> jar:file:/software/hadoop/hive/hive-0.9.0/lib/hive-common-0.9.0.jar!/hive-log4j.properties
> Hive history
> file=/tmp/hadoop/hive_job_log_hadoop_201307031616_605717324.txt
> hive> show tables;
> FAILED: Error in metadata: javax.jdo.JDOException: Couldnt obtain a new
> sequence (unique id) : ORA-00942: table or view does not exist*
> *NestedThrowables:
> java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist*
> *FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask*
> I found the difference between the two meta stores and one table is
> missing in it. The table is SEQUENCE_TABLE. I do not know whether this
> table will be created automatically by Hive or should it be in the script.I
> dont remember what I did earlier and I am assuming I used the same script.
> Can any one had this issue earlier ? Please advise.
>
> Also, Where to get the hive 0.9 oracle meta script?
>
> Thanks,
> Raj
>


Issue with Oracle Hive Metastore (SEQUENCE_TABLE)

2013-07-03 Thread Raj Hadoop
Hi,
 
When I installed Hive earlier on my machine, I used an Oracle Hive metastore
script. Please find the script attached. Hive worked fine for me on this box
with no issues.

I am trying to install Hive on another machine with a different Oracle
metastore. I executed the metastore script, but I am having issues with Hive
on the second box.
 
$ hive
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in 
jar:file:/software/hadoop/hive/hive-0.9.0/lib/hive-common-0.9.0.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201307031616_605717324.txt
hive> show tables;
FAILED: Error in metadata: javax.jdo.JDOException: Couldnt obtain a new 
sequence (unique id) : ORA-00942: table or view does not exist
NestedThrowables:
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask

I compared the two metastores and found that one table is missing:
SEQUENCE_TABLE. I do not know whether this table is created automatically by
Hive or whether it should be in the script. I don't remember what I did
earlier, and I am assuming I used the same script. Has anyone had this issue
before? Please advise.
 
Also, where can I get the Hive 0.9 Oracle metastore script?
 
Thanks,
Raj

hive-schema-0.9.0.oracle.sql
Description: Binary data


Re: Dealing with differents date format

2013-07-03 Thread Stephen Sprague
Well, a couple of comments.

1. You didn't have to change your Hive variable to a date. In your case
year = floor(/1) and month = cast( % 100 as int), just as I mentioned in my
first reply. :) But given that you did, maybe that'll make things easier for
you down the road.

2. The 'into' construct in Oracle is, I believe, a server-side variable (in
this case a scalar). Hive does not have those, so you're going to have to
refactor, not just translate, from PL/SQL to HiveQL. Off the top of my head
(and people might cringe at this) I would investigate storing that min() value
in a shell variable and then referencing that shell variable in another query,
e.g. var=$(hive -e 'select min(dt_jour) from ...') and then
hive -e "your_next_query where dt_jour=$var". Like I said, though, it's kind
of hacky, so unless you can come up with a server-side solution you might have
to hold your nose and try it.
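As for the date handling in point 1: the floor/modulus arithmetic plus a carry at December is all it takes to advance a yyyyMM integer by one month. A minimal sketch in plain Java of what the same floor() and % expressions would do inline in HiveQL:

```java
public class YearMonth {
    // Split a yyyyMM integer (e.g. 201307) into year and month parts.
    static int year(int ym)  { return ym / 100; }
    static int month(int ym) { return ym % 100; }

    // Advance by one month, carrying December into the next year.
    static int addOneMonth(int ym) {
        int y = year(ym), m = month(ym) + 1;
        if (m > 12) { y += 1; m = 1; }
        return y * 100 + m;
    }

    public static void main(String[] args) {
        System.out.println(addOneMonth(201307)); // 201308
        System.out.println(addOneMonth(201312)); // 201401 (year carry)
    }
}
```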


On Wed, Jul 3, 2013 at 2:26 AM, Nitin Pawar  wrote:

> instead of into we have as in hive
>
> so your query will be select min(dt_jour) as d_debut_semaine from table
> where col = value
> also remember this as is valid only till the query is being executed, it
> wont be preserved once query execution is over
>
>
> On Wed, Jul 3, 2013 at 2:30 PM, Jérôme Verdier  > wrote:
>
>> Hi,
>>
>> Thanks for your help.
>>
>> I resolve the problem by changing my variable in_co_an_mois into a normal
>> date format, and extract month and year by using apporopriate functions :
>> year() and month().
>>
>> But, i  have a new question :
>>
>> the PL/SQL script i have to translate in hive is written like this :
>>
>> SELECT min(dt_jour)
>> INTO D_debut_semaine
>> FROM ods.calendrier
>> WHERE co_an_semaine = in_co_an_sem;
>>
>> I have to record a value in a variable (here : D_debut_semaine) to use
>> this later.
>>
>> Is there a way to do this in Hive ?
>>
>>
>>
>> 2013/7/3 Paul COURTOIS 
>>
>>> Hi jerome,
>>>
>>>
>>>
>>> What about the from_unixtime and unix_timestamp  Udf ?
>>>
>>>
>>>
>>>
>>>
>>> from_unixtime() which accept bigint
>>>
>>>
>>>
>>> my 2 cents
>>>
>>>
>>>
>>> Paul
>>>
>>>
>>>
>>> *De :* Nitin Pawar [mailto:nitinpawar...@gmail.com]
>>> *Envoyé :* mercredi 3 juillet 2013 09:29
>>> *À :* user@hive.apache.org
>>> *Objet :* Re: Dealing with differents date format
>>>
>>>
>>>
>>> easiest way in this kind would be write up a small udf.
>>>
>>> As Stephen suggested, its just a number so you can do maths to extract
>>> year and month out of the number and then do the comparison.
>>>
>>>
>>>
>>> also 201307 is not a supported date format anywhere as per my knowledge
>>>
>>>
>>>
>>> On Wed, Jul 3, 2013 at 12:55 PM, Jérôme Verdier <
>>> verdier.jerom...@gmail.com> wrote:
>>>
>>> Hi Stephen,
>>>
>>> Thanks for your reply.
>>>
>>>
>>>
>>> The problem is that my input date is this : in_co_an_mois (format :
>>> MM, integer), for example, this month, we have 201307
>>>
>>> and i have to deal with this date : add one month, compare to over date,
>>> etc...
>>>
>>> The problem is that apparently, there is no way to do this, because Hive
>>> can't deal with this type of data because it's not a date format.
>>>
>>> For hive, this is just a number.
>>>
>>> Hive can deal with this : 1970-01-01 00:00:00, or this : 2009-03-20, but
>>> not with this unusual format : 201307.
>>>
>>> Thanks.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> 2013/7/2 Stephen Sprague 
>>>
>>> not sure i fully understand your dilemma.have you investigated any
>>> of the date functions listed here?
>>>
>>>
>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
>>>
>>> seems to me you could pull the year and month from a date.  or if you
>>> have an int then do some arithmetic to get the year and month.  eg. year =
>>> floor( /1) and month = cast(  % 100 as int)  [% ==
>>> modulus operator]
>>>
>>> or am i not even answering your question?
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jul 2, 2013 at 2:42 AM, Jérôme Verdier <
>>> verdier.jerom...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> i trying to translate some PL/SQL script in HiveQL, and dealing with
>>> unusual date format.
>>>
>>> i added a variable in my hive script : '${hiveconf:in_co_an_mois}' which
>>> is a year/month date format, like this : 201307 (INT format).
>>>
>>> I would like to transform this in date format, because i have to
>>> increment this (add one month/one year).
>>>
>>> Is there a way to do this in hive ?
>>>
>>> Thanks.
>>>
>>>
>>>
>>>
>>> --
>>> *Jérôme*
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Nitin Pawar
>>>
>>
>>
>>
>> --
>> *Jérôme VERDIER*
>> 06.72.19.17.31
>> verdier.jerom...@gmail.com
>>
>>
>
>
> --
> Nitin Pawar
>


Re: different outer join plan between hive 0.9 and hive 0.10

2013-07-03 Thread wzc1989
Hi Navis:
Thanks for your reply. Currently I'm working on the temporary solution of
changing the type of the filter mask, and doing performance tests. I'm trying
to read the patches and source code now, and once I have a better
understanding of the code maybe I can help with this problem :)
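For readers following along: the limit under discussion exists because the per-row filter mask for outer joins is a Java short, which holds 16 bits, one flag per join alias (see HIVE-3411). Widening the mask type raises the alias limit at the cost of extra bytes per row. The bit arithmetic looks roughly like this; it is an illustration of the trade-off, not Hive's actual code:

```java
public class JoinFlagMask {
    // A mask of `maskBits` bits can track one filter flag per join alias.
    static int maxAliases(int maskBits) { return maskBits; }

    // Set and test the per-alias filter flag in a (widened) long mask.
    static long setFiltered(long mask, int alias) { return mask | (1L << alias); }
    static boolean isFiltered(long mask, int alias) { return (mask & (1L << alias)) != 0; }

    public static void main(String[] args) {
        System.out.println(maxAliases(Short.SIZE)); // 16 aliases with a short
        System.out.println(maxAliases(Long.SIZE));  // 64 aliases with a long
        long m = setFiltered(0L, 17);               // alias 17 needs more than a short
        System.out.println(isFiltered(m, 17));      // true
    }
}
```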

--  
wzc1989
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, July 2, 2013, at 8:24 AM, Navis류승우 wrote:

> Yes, a little bit.
>  
> IMHO, these flags could be assigned only for aliases with condition on
> 'on' clause. Then, I think, even a byte (8 flags) could be enough in
> most cases.
>  
> I'll do that if time permits.
>  
> 2013/7/1 wzc1989 <wzc1...@gmail.com>:
> > hi navis:
> > look at the patches in (HIVE-3411, HIVE-4206, HIVE-4212, HIVE-3464), I
> > understand what you mean by "hive tags rows a filter mask as a short for
> > outer join, which can contain 16 flags. " . I wonder why not choose Long or
> > int which can contain 64/32 tags. Does adding one Long/int in every row cost
> > too much?
> >  
> > --
> > wzc1989
> > Sent with Sparrow
> >  
> > On Tuesday, May 14, 2013, at 2:17 PM, Navis류승우 wrote:
> >  
> > In short, hive tags rows a filter mask as a short for outer join,
> > which can contain 16 flags. (see HIVE-3411, plz)
> >  
> > I'll survey for a solution.
> >  
> > 2013/5/14 wzc1989 <wzc1...@gmail.com>:
> >  
> > "hive cannot merge joins of 16+ aliases with outer join into single stage."
> > In our use case we use one table full outer join all other table to produce
> > one big table, which may exceed 16 outer join limits and will be split into
> > multi stage under hive 0.10.
> > It become very slow under hive 0.10 while we run such query well under hive
> > 0.9.
> > I believe it's due to the diff of query plan. I wonder why hive 0.10 cannot
> > merge join 16+ aliases into single stage while hive 0.9 doesn't have such
> > issue. could you explain this or give me some hint?
> >  
> > Thanks!
> >  
> > --
> > wzc1989
> > Sent with Sparrow
> >  
> > On Tuesday, May 14, 2013, at 12:26 PM, Navis류승우 wrote:
> >  
> > The error message means hive cannot merge joins of 16+ aliases with
> > outer join into single stage. It was 8 way originally (HIVE-3411) but
> > expanded to 16 later.
> >  
> > Check https://issues.apache.org/jira/browse/HIVE-3411 for details.
> >  
> > 2013/5/14 wzc1989 <wzc1...@gmail.com>:
> >  
> > This time i cherry-pick HIVE-3464, HIVE-4212, HIVE-4206 and some related
> > commits and the above explain result matches in hive 0.9 and hive 0.10,
> > thanks!
> > But I confuse about this error msg:
> >  
> > JOINNODE_OUTERJOIN_MORETHAN_16(10142, "Single join node containing outer
> > join(s) " +
> > "cannot have more than 16 aliases"),
> >  
> > does this mean in hive0.10 when we have more than 16 outer join the query
> > plan will still have some bug?
> > I test the sql below and find the explain result still diff between hive 0.9
> > and hive 0.10.
> >  
> > explain select
> > sum(a.value) val
> > from default.test_join a
> > left outer join default.test_join b on a.key = b.key
> > left outer join default.test_join c on a.key = c.key
> > left outer join default.test_join d on a.key = d.key
> > left outer join default.test_join e on a.key = e.key
> > left outer join default.test_join f on a.key = f.key
> > left outer join default.test_join g on a.key = g.key
> > left outer join default.test_join h on a.key = h.key
> > left outer join default.test_join i on a.key = i.key
> > left outer join default.test_join j on a.key = j.key
> > left outer join default.test_join k on a.key = k.key
> > left outer join default.test_join l on a.key = l.key
> > left outer join default.test_join m on a.key = m.key
> > left outer join default.test_join n on a.key = n.key
> > left outer join default.test_join u on a.key = u.key
> > left outer join default.test_join v on a.key = v.key
> > left outer join default.test_join w on a.key = w.key
> > left outer join default.test_join x on a.key = x.key
> > left outer join default.test_join z on a.key = z.key
> >  
> >  
> > --
> > wzc1989
> > Sent with Sparrow
> >  
> > On Friday, March 29, 2013, at 9:34 AM, Navis류승우 wrote:
> >  
> > The problem is mixture of issues (HIVE-3411, HIVE-4209, HIVE-4212,
> > HIVE-3464) and still not completely fixed even in trunk.
> >  
> > Will be fixed shortly.
> >  
> > 2013/3/29 wzc <wzc1...@gmail.com>:
> >  
> > The bug remains even if I apply the patch in HIVE-4206 :( The explain
> > result hasn't change.
> >  
> >  
> > 2013/3/28 Navis류승우 <navis@nexr.com>
> >  
> >  
> > It's a bug (https://issues.apache.org/jira/browse/HIVE-4206).
> >  
> > Thanks for reporting it.
> >  
> > 2013/3/24 wzc <wzc1...@gmail.com>:
> >  
> > Recently we tried to upgrade our hive from 0.9 to 0.10, but found some
> > of
> > our hive queries almost 7 times slow. One of such query consists
> > multiple
> > table outer join on the same key. By looking into the query, we found
> > the
> > query plans generate by hive 0.9 and hive 0.10 are different. Here is
> > the
> > example:
> >  
> > testcase:
> >

Re: Partition performance

2013-07-03 Thread Owen O'Malley
On Wed, Jul 3, 2013 at 5:19 AM, David Morel  wrote:

>
> That is still not really answering the question, which is: why is it slower
> to run a query on a heavily partitioned table than it is on the same number
> of files in a less heavily partitioned table.
>

According to Gopal's investigation in
https://issues.apache.org/jira/browse/HIVE-4051, each time Hive plans a query
it issues one query per partition to the backing SQL database. That would
explain a lot of the latency for tables with large numbers of partitions.
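Back-of-the-envelope, that model fits the partition counts reported elsewhere in this thread (26,280 vs. 1,095 partitions). Assuming, purely for illustration, something like 10 ms per metastore query:

```java
public class PlanLatency {
    // Rough planning cost if the planner issues one metastore query per
    // partition (per HIVE-4051). The per-query cost is an assumption.
    static double planSeconds(int partitions, double msPerQuery) {
        return partitions * msPerQuery / 1000.0;
    }

    public static void main(String[] args) {
        System.out.println(planSeconds(26280, 10)); // ~263 s of planning
        System.out.println(planSeconds(1095, 10));  // ~11 s of planning
    }
}
```

The absolute per-query figure is a guess; the point is that planning cost scales linearly with partition count, before a single task runs.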

-- Owen


Re: Partition performance

2013-07-03 Thread Edward Capriolo
1) Each partition object is a row in the metastore (usually MySQL); querying
large tables with many partitions has a longer startup time because the Hive
query planner has to fetch and process all of this meta-information. This is
not a distributed process. It is usually fast, within a few seconds, but for
tables with very many partitions it can be slow.

2) Hadoop's small-files problem <- google that. Small files end up being much
more overhead for a given MapReduce job; generally, the more files/partitions,
the more map/reduce tasks. More MapReduce tasks means more overhead, and more
overhead means less throughput.

::SHAMELESS PLUG:: We discuss this in detail in the book Programming Hive, in
the schema design section.



On Wed, Jul 3, 2013 at 8:19 AM, David Morel  wrote:

> On 2 Jul 2013, at 16:51, Owen O'Malley wrote:
>
> > On Tue, Jul 2, 2013 at 2:34 AM, Peter Marron <
> > peter.mar...@trilliumsoftware.com> wrote:
> >
> >> Hi Owen,
> >>
> >>
> >> I’m curious about this advice about partitioning. Is there some
> >> fundamental reason why Hive
> >>
> >> is slow when the number of partitions is 10,000 rather than 1,000?
> >>
> >
> > The precise numbers don't matter. I wanted to give people a ballpark
> range
> > that they should be looking at. Most tables at 1000 partitions won't
> cause
> > big slow downs, but the cost scales with the number of partitions. By the
> > time you are at 10,000 the cost is noticeable. I have one customer who
> has
> > a table with 1.2 million partitions. That causes a lot of slow downs.
>
> That is still not really answering the question, which is: why is it slower
> to run a query on a heavily partitioned table than it is on the same number
> of files in a less heavily partitioned table.
>
> David
>


Re: Partition performance

2013-07-03 Thread Dean Wampler
How big were the files in each case in your experiment? Having lots of
small files will add Hadoop overhead.

Also, it would be useful to know the execution times of the map and reduce
tasks. The rule of thumb is that under 20 seconds each, or so, you're paying a
significant fraction of the execution time in startup and shutdown overhead.

Of course, another factor is the number of tasks your cluster can run in
parallel. Scanning 20K partitions with a 1K MapReduce slot capacity over
the cluster will obviously take ~20 passes vs. ~1 pass for 1K partitions.
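The ~20 passes vs. ~1 pass figure above is just a ceiling division of task count by cluster slot capacity (assuming, roughly, one map task per partition's worth of files):

```java
public class TaskWaves {
    // Number of scheduling "waves" a job needs: tasks divided by the
    // cluster's concurrent slot capacity, rounded up.
    static int waves(int tasks, int slots) {
        return (tasks + slots - 1) / slots; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(waves(20_000, 1_000)); // ~20 passes
        System.out.println(waves(1_000, 1_000));  // ~1 pass
    }
}
```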

dean

On Tue, Jul 2, 2013 at 4:34 AM, Peter Marron <
peter.mar...@trilliumsoftware.com> wrote:

>  ...
>
>
> From: Ian
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>, Ian <liu...@yahoo.com>
> Date: Thursday, April 4, 2013 4:01 PM
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: Partition performance
>
>
> Hi,
>
>  
>
> I created 3 years of hourly log files (totally 26280 files), and use
> External Table with partition to query. I tried two partition methods.
>
>  
>
> 1). Log files are stored as /test1/2013/04/02/16/00_0 (A directory per
> hour). Use date and hour as partition keys. Add 3 years of directories to
> the table partitions. So there are 26280 partitions.
>
> CREATE EXTERNAL TABLE test1 (logline string) PARTITIONED BY (dt
> string, hr int);
>
> ALTER TABLE test1 ADD PARTITION (dt='2013-04-02', hr=16) LOCATION
> '/test1/2013/04/02/16';
>
>  
>
> 2). Log files are stored as /test2/2013/04/02/16_00_0 (A directory per
> day, 24 files in each directory). Use date as partition key. Add 3 years of
> directories to the table partitions. So there are 1095 partitions.
>
> CREATE EXTERNAL TABLE test2 (logline string) PARTITIONED BY (dt
> string);
>
> ALTER TABLE test2 ADD PARTITION (dt='2013-04-02') LOCATION
> '/test2/2013/04/02';
>
>  
>
> When doing a simple query like 
>
> SELECT * FROM  test1/test2  WHERE  dt >= '2013-02-01' and dt <=
> '2013-02-14'
>
> Using approach #1 takes 320 seconds, but #2 only takes 70 seconds. 
>
>  
>
> I'm wondering why there is a big performance difference between these two?
> These two approaches have the same number of files, only the directory
> structure is different. So Hive is going to load the same amount of files.
> Why does the number of partitions have such big impact? Does that mean #2
> is a better partition strategy?
>
>  
>
> Thanks.
>
>  
>
>  
>
>



-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com


Re: Moving hive from one server to another

2013-07-03 Thread Bennie Schut

Unfortunately the IP is stored with each partition in the metastore database.
I once did an update on the metadata for our server to replace all the old
IPs with new IPs. It's not pretty, but it actually works.
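A sketch of the rewrite itself. In MySQL-backed metastores the full location URIs typically live in columns such as SDS.LOCATION and DBS.DB_LOCATION_URI, but the schema varies by Hive version, so verify against your own metastore and take a backup before issuing any real UPDATE. The host names below are made up:

```java
public class LocationRewrite {
    // Swap the old NameNode authority for the new one in a stored
    // location URI, leaving the warehouse path untouched. This is the
    // string transformation a metastore UPDATE ... REPLACE(...) would do.
    static String rewrite(String location, String oldAuthority, String newAuthority) {
        return location.replace("hdfs://" + oldAuthority, "hdfs://" + newAuthority);
    }

    public static void main(String[] args) {
        String loc = "hdfs://old-nn.example.com:8020/user/hive/warehouse/t/dt=2013-07-01";
        // prints hdfs://new-nn.example.com:8020/user/hive/warehouse/t/dt=2013-07-01
        System.out.println(rewrite(loc, "old-nn.example.com:8020", "new-nn.example.com:8020"));
    }
}
```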


On 28-6-2013 06:29, Manickam P wrote:

Hi,

What are the steps one should follow to move Hive from one server to
another along with Hadoop?
I've moved my Hadoop master node from one server to another, and then
moved my Hive as well. I started all my Hadoop nodes successfully, but
I get an error while executing Hive queries. It shows the below error,
which includes my old master node's IP address.


java.net.ConnectException: Call to
192.168.99.33/192.168.99.33:5 failed on connection exception:
java.net.ConnectException: Connection refused

Job Submission failed with exception
'java.net.ConnectException(Call to
192.000.00.33/192.000.00.33:5 failed on connection exception:
java.net.ConnectException: Connection refused)'
java.lang.IllegalArgumentException: Can not create a Path from an
empty string

I checked my hive-site.xml file; I have given the correct new IP
address. Can anyone please tell me where the mistake might be?
I don't have any clue.



Thanks,
Manickam P




Re: Partition performance

2013-07-03 Thread David Morel
On 2 Jul 2013, at 16:51, Owen O'Malley wrote:

> On Tue, Jul 2, 2013 at 2:34 AM, Peter Marron <
> peter.mar...@trilliumsoftware.com> wrote:
>
>> Hi Owen,
>>
>>
>> I’m curious about this advice about partitioning. Is there some
>> fundamental reason why Hive
>>
>> is slow when the number of partitions is 10,000 rather than 1,000?
>>
>
> The precise numbers don't matter. I wanted to give people a ballpark range
> that they should be looking at. Most tables at 1000 partitions won't cause
> big slow downs, but the cost scales with the number of partitions. By the
> time you are at 10,000 the cost is noticeable. I have one customer who has
> a table with 1.2 million partitions. That causes a lot of slow downs.

That is still not really answering the question, which is: why is it slower
to run a query on a heavily partitioned table than it is on the same number 
of files in a less heavily partitioned table.

David


Re: Fetching Results from Hive Select (JDBC ResultSet.next() vs HiveClient.fetchN())

2013-07-03 Thread Christian Schneider
Hi, I browsed through the sources and found a way to tune the JDBC
ResultSet.next() performance.

final Connection con =
DriverManager.getConnection("jdbc:hive2://carolin:1/default", "hive",
"");
final Statement stmt = con.createStatement();
final String tableName = "bigdata";

final String sql = "select * from " + tableName + " limit 15";
System.out.println("Running: " + sql);
final ResultSet res = stmt.executeQuery(sql);

// enlarge the FetchSize (default is just 50!)
((HiveQueryResultSet) res).setFetchSize(1);

Best Regards,
Christian.


2013/6/26 Christian Schneider 

> I just tested the same statement with beeline and got the same bad
> performance.
>
> Any ideas?
>
> Best Regards,
> Chrisitan.
>
>
> 2013/6/26 Christian Schneider 
>
>> Hi,
>> currently we are using HiveServer1 with the native HiveClient interface.
>> Our application design looks horrible because (for whatever reason) it
>> spawns a dedicated HiveServer for every query.
>>
>> We thought it is a good idea to switch to HiveServer2 (because the
>> MetaStore get used by many different applications).
>>
>> The JDBC setup was straightforward, but the performance is not what we
>> expected.
>>
>> If we fetch a large result set (with fetchN()  over HiveClient) we read
>> with around 10MB/s.
>>
>> If I use JDBC (with resultSet.next()) I have a throughput of around
>> 1MB/*min*.
>>
>> Any chance to speed this up (like bulk fetching)?
>>
>> Best Regards,
>> Christian.
>>
>
>


Re: Dealing with differents date format

2013-07-03 Thread Nitin Pawar
Instead of INTO, Hive has AS.

So your query will be: select min(dt_jour) as d_debut_semaine from table
where col = value.
Also remember that this AS alias is only valid while the query is being
executed; it won't be preserved once query execution is over.
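
Since HiveQL has no SELECT ... INTO variable, one common workaround is to capture the value in the driving script and feed it back in as a hiveconf variable on the next invocation. A rough Python sketch, assuming a `hive` CLI on the PATH; the follow-up script name `next_step.hql` and the week value are hypothetical:

```python
import subprocess

def hive_query_argv(query: str) -> list:
    """Build the argv for running a single query in silent mode (hive -S -e)."""
    return ["hive", "-S", "-e", query]

def hive_scalar(query: str) -> str:
    """Run the query and return its single-value (stdout) result, trimmed."""
    return subprocess.check_output(hive_query_argv(query)).decode().strip()

# The captured value would then be handed to the next script, e.g.:
#   d_debut_semaine = hive_scalar("SELECT min(dt_jour) FROM ods.calendrier "
#                                 "WHERE co_an_semaine = 201327;")
#   subprocess.check_call(["hive", "-hiveconf",
#                          "d_debut_semaine=" + d_debut_semaine,
#                          "-f", "next_step.hql"])
print(hive_query_argv("SELECT min(dt_jour) FROM ods.calendrier WHERE co_an_semaine = 201327;"))
```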


On Wed, Jul 3, 2013 at 2:30 PM, Jérôme Verdier
wrote:

> Hi,
>
> Thanks for your help.
>
> I resolved the problem by changing my variable in_co_an_mois into a normal
> date format, and extracting the month and year with the appropriate
> functions: year() and month().
>
> But I have a new question:
>
> the PL/SQL script i have to translate in hive is written like this :
>
> SELECT min(dt_jour)
> INTO D_debut_semaine
> FROM ods.calendrier
> WHERE co_an_semaine = in_co_an_sem;
>
> I have to record a value in a variable (here : D_debut_semaine) to use
> this later.
>
> Is there a way to do this in Hive ?
>
>
>
> 2013/7/3 Paul COURTOIS 
>
>> Hi jerome,
>>
>>
>>
>> What about the from_unixtime and unix_timestamp  Udf ?
>>
>>
>>
>>
>>
>> from_unixtime() which accept bigint
>>
>>
>>
>> my 2 cents
>>
>>
>>
>> Paul
>>
>>
>>
>> *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
>> *Sent:* Wednesday, July 3, 2013 09:29
>> *To:* user@hive.apache.org
>> *Subject:* Re: Dealing with differents date format
>>
>>
>>
>> easiest way in this kind would be write up a small udf.
>>
>> As Stephen suggested, its just a number so you can do maths to extract
>> year and month out of the number and then do the comparison.
>>
>>
>>
>> also 201307 is not a supported date format anywhere as per my knowledge
>>
>>
>>
>> On Wed, Jul 3, 2013 at 12:55 PM, Jérôme Verdier <
>> verdier.jerom...@gmail.com> wrote:
>>
>> Hi Stephen,
>>
>> Thanks for your reply.
>>
>>
>>
>> The problem is that my input date is this : in_co_an_mois (format :
>> YYYYMM, integer), for example, this month, we have 201307
>>
>> and i have to deal with this date : add one month, compare to over date,
>> etc...
>>
>> The problem is that apparently, there is no way to do this, because Hive
>> can't deal with this type of data because it's not a date format.
>>
>> For hive, this is just a number.
>>
>> Hive can deal with this : 1970-01-01 00:00:00, or this : 2009-03-20, but
>> not with this unusual format : 201307.
>>
>> Thanks.
>>
>>
>>
>>
>>
>>
>>
>> 2013/7/2 Stephen Sprague 
>>
>> not sure i fully understand your dilemma.have you investigated any of
>> the date functions listed here?
>>
>>
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
>>
>> seems to me you could pull the year and month from a date.  or if you
>> have an int then do some arithmetic to get the year and month.  eg. year =
>> floor(n / 100) and month = cast(n % 100 as int)  [% ==
>> modulus operator]
>>
>> or am i not even answering your question?
>>
>>
>>
>>
>>
>> On Tue, Jul 2, 2013 at 2:42 AM, Jérôme Verdier <
>> verdier.jerom...@gmail.com> wrote:
>>
>> Hi,
>>
>> I am trying to translate a PL/SQL script into HiveQL, and dealing with an
>> unusual date format.
>>
>> i added a variable in my hive script : '${hiveconf:in_co_an_mois}' which
>> is a year/month date format, like this : 201307 (INT format).
>>
>> I would like to transform this in date format, because i have to
>> increment this (add one month/one year).
>>
>> Is there a way to do this in hive ?
>>
>> Thanks.
>>
>>
>>
>>
>> --
>> *Jérôme*
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>
>
> --
> *Jérôme VERDIER*
> 06.72.19.17.31
> verdier.jerom...@gmail.com
>
>


-- 
Nitin Pawar


Re: Dealing with differents date format

2013-07-03 Thread Jérôme Verdier
Hi,

Thanks for your help.

I resolved the problem by changing my variable in_co_an_mois into a normal
date format, and extracting the month and year with the appropriate functions:
year() and month().

But I have a new question:

the PL/SQL script i have to translate in hive is written like this :

SELECT min(dt_jour)
INTO D_debut_semaine
FROM ods.calendrier
WHERE co_an_semaine = in_co_an_sem;

I have to record a value in a variable (here : D_debut_semaine) to use this
later.

Is there a way to do this in Hive ?



2013/7/3 Paul COURTOIS 

> Hi jerome,
>
>
>
> What about the from_unixtime and unix_timestamp  Udf ?
>
>
>
>
>
> from_unixtime() which accept bigint
>
>
>
> my 2 cents
>
>
>
> Paul
>
>
>
> *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
> *Sent:* Wednesday, July 3, 2013 09:29
> *To:* user@hive.apache.org
> *Subject:* Re: Dealing with differents date format
>
>
>
> easiest way in this kind would be write up a small udf.
>
> As Stephen suggested, its just a number so you can do maths to extract
> year and month out of the number and then do the comparison.
>
>
>
> also 201307 is not a supported date format anywhere as per my knowledge
>
>
>
> On Wed, Jul 3, 2013 at 12:55 PM, Jérôme Verdier <
> verdier.jerom...@gmail.com> wrote:
>
> Hi Stephen,
>
> Thanks for your reply.
>
>
>
> The problem is that my input date is this : in_co_an_mois (format :
> YYYYMM, integer), for example, this month, we have 201307
>
> and i have to deal with this date : add one month, compare to over date,
> etc...
>
> The problem is that apparently, there is no way to do this, because Hive
> can't deal with this type of data because it's not a date format.
>
> For hive, this is just a number.
>
> Hive can deal with this : 1970-01-01 00:00:00, or this : 2009-03-20, but
> not with this unusual format : 201307.
>
> Thanks.
>
>
>
>
>
>
>
> 2013/7/2 Stephen Sprague 
>
> not sure i fully understand your dilemma.have you investigated any of
> the date functions listed here?
>
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
>
> seems to me you could pull the year and month from a date.  or if you have
> an int then do some arithmetic to get the year and month.  eg. year =
> floor(n / 100) and month = cast(n % 100 as int)  [% ==
> modulus operator]
>
> or am i not even answering your question?
>
>
>
>
>
> On Tue, Jul 2, 2013 at 2:42 AM, Jérôme Verdier 
> wrote:
>
> Hi,
>
> I am trying to translate a PL/SQL script into HiveQL, and dealing with an
> unusual date format.
>
> i added a variable in my hive script : '${hiveconf:in_co_an_mois}' which
> is a year/month date format, like this : 201307 (INT format).
>
> I would like to transform this in date format, because i have to increment
> this (add one month/one year).
>
> Is there a way to do this in hive ?
>
> Thanks.
>
>
>
>
> --
> *Jérôme*
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
> Nitin Pawar
>



-- 
*Jérôme VERDIER*
06.72.19.17.31
verdier.jerom...@gmail.com


RE: Dealing with differents date format

2013-07-03 Thread Paul COURTOIS
Hi jerome,



What about the from_unixtime and unix_timestamp UDFs?





from_unixtime() accepts a bigint.



my 2 cents



Paul



*From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
*Sent:* Wednesday, July 3, 2013 09:29
*To:* user@hive.apache.org
*Subject:* Re: Dealing with differents date format



easiest way in this kind would be write up a small udf.

As Stephen suggested, its just a number so you can do maths to extract year
and month out of the number and then do the comparison.



also 201307 is not a supported date format anywhere as per my knowledge



On Wed, Jul 3, 2013 at 12:55 PM, Jérôme Verdier 
wrote:

Hi Stephen,

Thanks for your reply.



The problem is that my input date is this : in_co_an_mois (format : YYYYMM,
integer), for example, this month, we have 201307

and i have to deal with this date : add one month, compare to over date,
etc...

The problem is that apparently, there is no way to do this, because Hive
can't deal with this type of data because it's not a date format.

For hive, this is just a number.

Hive can deal with this : 1970-01-01 00:00:00, or this : 2009-03-20, but
not with this unusual format : 201307.

Thanks.







2013/7/2 Stephen Sprague 

not sure i fully understand your dilemma.have you investigated any of
the date functions listed here?

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions

seems to me you could pull the year and month from a date.  or if you have
an int then do some arithmetic to get the year and month.  eg. year =
floor(n / 100) and month = cast(n % 100 as int)  [% ==
modulus operator]

or am i not even answering your question?





On Tue, Jul 2, 2013 at 2:42 AM, Jérôme Verdier 
wrote:

Hi,

I am trying to translate a PL/SQL script into HiveQL, and I am dealing with an
unusual date format.

I added a variable in my hive script : '${hiveconf:in_co_an_mois}', which is
a year/month date format, like this : 201307 (INT format).

I would like to transform this into a date format, because I have to increment
it (add one month/one year).

Is there a way to do this in Hive?

Thanks.




-- 
*Jérôme*













-- 
Nitin Pawar


Re: Dealing with differents date format

2013-07-03 Thread Nitin Pawar
The easiest way in this case would be to write a small UDF.
As Stephen suggested, it's just a number, so you can do the math to extract the
year and month from it and then do the comparison.

Also, 201307 is not a supported date format anywhere, as far as I know.
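
As a sketch of the arithmetic Stephen suggested, the whole add-a-month operation can be done on the YYYYMM integer directly. This is a hypothetical Python illustration of the same math a UDF would implement:

```python
def add_months(yyyymm: int, n: int) -> int:
    """Add n months to an integer date in YYYYMM form, e.g. 201307."""
    year, month = divmod(yyyymm, 100)    # year = floor(x / 100), month = x % 100
    total = year * 12 + (month - 1) + n  # months since year 0, zero-based
    year, month = divmod(total, 12)
    return year * 100 + month + 1

print(add_months(201307, 1))   # 201308
print(add_months(201312, 1))   # rolls over into the next year: 201401
```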


On Wed, Jul 3, 2013 at 12:55 PM, Jérôme Verdier
wrote:

> Hi Stephen,
>
> Thanks for your reply.
>
> The problem is that my input date is this : in_co_an_mois (format :
> YYYYMM, integer), for example, this month, we have 201307
>
> and i have to deal with this date : add one month, compare to over date,
> etc...
>
> The problem is that apparently, there is no way to do this, because Hive
> can't deal with this type of data because it's not a date format.
>
> For hive, this is just a number.
>
> Hive can deal with this : 1970-01-01 00:00:00, or this : 2009-03-20, but
> not with this unusual format : 201307.
>
> Thanks.
>
>
>
>
> 2013/7/2 Stephen Sprague 
>
>> not sure i fully understand your dilemma.have you investigated any of
>> the date functions listed here?
>>
>>
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
>>
>> seems to me you could pull the year and month from a date.  or if you
>> have an int then do some arithmetic to get the year and month.  eg. year =
>> floor(n / 100) and month = cast(n % 100 as int)  [% ==
>> modulus operator]
>>
>> or am i not even answering your question?
>>
>>
>>
>> On Tue, Jul 2, 2013 at 2:42 AM, Jérôme Verdier <
>> verdier.jerom...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am trying to translate a PL/SQL script into HiveQL, and dealing with an
>>> unusual date format.
>>>
>>> i added a variable in my hive script : '${hiveconf:in_co_an_mois}' which
>>> is a year/month date format, like this : 201307 (INT format).
>>>
>>> I would like to transform this in date format, because i have to
>>> increment this (add one month/one year).
>>>
>>> Is there a way to do this in hive ?
>>>
>>> Thanks.
>>>
>>>
>>>
>>> --
>>> *Jérôme*
>>>
>>>
>>
>
>


-- 
Nitin Pawar


Re: Dealing with differents date format

2013-07-03 Thread Jérôme Verdier
Hi Stephen,

Thanks for your reply.

The problem is that my input date is this: in_co_an_mois (format: YYYYMM,
integer); for example, this month, we have 201307,

and I have to deal with this date: add one month, compare it to another date,
etc.

The problem is that, apparently, there is no way to do this, because Hive
can't deal with this type of data: it's not a date format.

For Hive, this is just a number.

Hive can deal with this: 1970-01-01 00:00:00, or this: 2009-03-20, but
not with this unusual format: 201307.

Thanks.




2013/7/2 Stephen Sprague 

> not sure i fully understand your dilemma.have you investigated any of
> the date functions listed here?
>
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
>
> seems to me you could pull the year and month from a date.  or if you have
> an int then do some arithmetic to get the year and month.  eg. year =
> floor(n / 100) and month = cast(n % 100 as int)  [% ==
> modulus operator]
>
> or am i not even answering your question?
>
>
>
> On Tue, Jul 2, 2013 at 2:42 AM, Jérôme Verdier  > wrote:
>
>> Hi,
>>
>> I am trying to translate a PL/SQL script into HiveQL, and dealing with an
>> unusual date format.
>>
>> i added a variable in my hive script : '${hiveconf:in_co_an_mois}' which
>> is a year/month date format, like this : 201307 (INT format).
>>
>> I would like to transform this in date format, because i have to
>> increment this (add one month/one year).
>>
>> Is there a way to do this in hive ?
>>
>> Thanks.
>>
>>
>>
>> --
>> *Jérôme*
>>
>>
>