Cannot insert into a bucketized table from the same table

2013-08-29 Thread Neha Sood
Hive version: 0.10.0-cdh4.2.1

Trying to insert into a bucketized table from the same table throws an
exception:
FAILED: SemanticException [Error 10122]: Bucketized tables do not support
INSERT INTO: Table:

To test the scenario, I created the following 3 test tables:
create table temp1
(a int)
PARTITIONED BY (
  b string,
  c string)
  clustered by (a) into 2 buckets
  stored as RCFILE;

create table temp2
(a int)
  stored as RCFILE;

create table temp3
(a int)
PARTITIONED BY (
  b string,
  c string)
  stored as RCFILE;

When I run the insert queries below, queries 1, 3 and 4 work fine, while
query 2 fails with the above-mentioned exception.
1. insert into table temp1 partition(b='1', c='1') select * from temp2;

2. insert into table temp1 partition(b='1', c='1') select * from temp2 t2
join temp1 t1 on(t2.a=t1.a);

3. insert into table temp1 partition(b='1', c='1') select t3.a from (select
t1.a from temp2 t2 join temp1 t1 on(t2.a=t1.a)) tt join temp3 t3 on
(tt.a=t3.a);

4. insert into table temp3 partition(b='1', c='1') select t2.a from temp2
t2 join temp3 t1 on(t2.a=t1.a);

So the above exception occurs only if the target table is bucketed and it
appears directly among the join tables. For some reason, case 3 works
fine, where the self-join is wrapped in a subquery that is finally joined
with another table.

Has anyone faced this issue earlier? Looks like a Hive bug. Is there any
workaround?


Re: Cannot insert into a bucketized table from the same table

2013-08-29 Thread Nitin Pawar
Neha, it's not a bug. Hive does not support INSERT INTO on bucketed tables as
of now.

There is a patch available for this at HIVE-3244:
https://issues.apache.org/jira/browse/HIVE-3244
You can also refer to HIVE-3077:
https://issues.apache.org/jira/browse/HIVE-3077
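Until one of those patches lands, a possible interim workaround (an untested
sketch; it reuses the table names from the example above and relies on the
fact that error 10122 rejects only INSERT INTO, not INSERT OVERWRITE) is to
stage the self-join in a plain table first:

```sql
-- Hypothetical staging table; name and layout are illustrative only.
create table temp1_stage (a int) stored as RCFILE;

-- Run the self-join against the non-bucketed staging table.
insert overwrite table temp1_stage
select t2.a from temp2 t2 join temp1 t1 on (t2.a = t1.a);

-- Load the bucketed partition from the staging table.
insert overwrite table temp1 partition (b='1', c='1')
select a from temp1_stage;
```

Note that OVERWRITE replaces the partition contents, so if append semantics
are needed, the existing partition rows would have to be unioned back in.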


-- 
Nitin Pawar


Problem with min function in HiveQL

2013-08-29 Thread Jérôme Verdier
Hi everybody,

I am coding some HiveQL script to do some calculations.

I have a problem with the min() function.

My hive script is below :

INSERT INTO TABLE default.THM_CA_RGRP_PRODUITS_SEM

SELECT
    '${hiveconf:in_co_societe}'    as co_societe,
    '${hiveconf:in_co_an_semaine}' as co_an_semaine,
    a.type_entite                  as type_entite,
    a.code_entite                  as code_entite,
    a.type_rgrp_produits           as type_rgrp_produits,
    a.co_rgrp_produits             as co_rgrp_produits,
    SUM(a.MT_CA_NET_TTC)           as MT_CA_NET_TTC,
    SUM(a.MT_OBJ_CA_NET_TTC)       as MT_OBJ_CA_NET_TTC,
    SUM(a.NB_CLIENTS)              as NB_CLIENTS,
    SUM(a.MT_CA_NET_TTC_COMP)      as MT_CA_NET_TTC_COMP,
    SUM(a.MT_OBJ_CA_NET_TTC_COMP)  as MT_OBJ_CA_NET_TTC_COMP,
    SUM(a.NB_CLIENTS_COMP)         as NB_CLIENTS_COMP
  from
    default.THM_CA_RGRP_PRODUITS_JOUR a
  JOIN default.CALENDRIER b
  -- To be verified
  WHERE CAST(a.dt_jour AS TIMESTAMP) >=
    CAST(min(UNIX_TIMESTAMP(b.dt_jour)) AS BIGINT)
  AND CAST(a.dt_jour AS TIMESTAMP) < CAST(min(b.dt_jour)+1 AS TIMESTAMP)
  AND a.co_societe = '${hiveconf:in_co_societe}'
  AND ('${hiveconf:flg_reprise_entite}' = 0 OR a.type_entite <> 'MAG')
  GROUP BY
    a.type_entite,
    a.code_entite,
    a.type_rgrp_produits,
    a.co_rgrp_produits;

And when I try to launch this, I get this error:

FAILED: SemanticException [Error 10128]: Line 20:62 Not yet supported place
for UDAF 'min'

Obviously, there is a problem with the min() function.

How can I get past this error?

Thanks for your help.


Re: Problem with min function in HiveQL

2013-08-29 Thread Stephen Sprague
The min function at column 62 is on the column b.dt_jour. What datatype
is that?

If it's of type 'timestamp', that might explain it.




Re: Problem with min function in HiveQL

2013-08-29 Thread Jérôme Verdier
Hi Stephen,

Thanks for your reply.

Indeed, dt_jour is in timestamp format.

What is the problem with this?





Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors

2013-08-29 Thread agateaaa
Hi All:

I put some debugging code in TUGIContainingTransport.getTransport() and
tracked it down to:

@Override
public TUGIContainingTransport getTransport(TTransport trans) {

// UGI information is not available at connection setup time, it will be
set later
// via set_ugi() rpc.
transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));

//return transMap.get(trans); -change
  TUGIContainingTransport retTrans = transMap.get(trans);

  if ( retTrans == null ) {



}





On Wed, Jul 31, 2013 at 9:48 AM, agateaaa agate...@gmail.com wrote:

 Thanks Nitin

 There aren't too many connections in CLOSE_WAIT state, only one or two when we
 run into this. Most likely it's because of a dropped connection.

 I could not find any read or write timeouts we can set for the thrift
 server which would tell thrift to hold on to the client connection.
 See https://issues.apache.org/jira/browse/HIVE-2006, but that doesn't seem
 to have been implemented yet. We do have a client connection timeout set,
 but cannot find an equivalent setting for the server.

 We have  a suspicion that this happens when we run two client processes
 which modify two distinct partitions of the same hive table. We put in a
 workaround so that the two hive client processes never run together and so
 far things look ok but we will keep monitoring.

 Could it be because hive metastore server is not thread safe, would
 running two alter table statements on two distinct partitions of the same
 table using two client connections cause problems like these, where hive
 metastore server closes or drops a wrong client connection and leaves the
 other hanging?

 Agateaaa




 On Tue, Jul 30, 2013 at 12:49 AM, Nitin Pawar nitinpawar...@gmail.comwrote:

 The mentioned flow is called when you have the unsecure mode of thrift
 metastore client-server connection. So one way to avoid this is to have a
 secure one.

 public boolean process(final TProtocol in, final TProtocol out) throws TException {
   setIpAddress(in);
   ...
 }

 @Override
 protected void setIpAddress(final TProtocol in) {
   TUGIContainingTransport ugiTrans =
       (TUGIContainingTransport) in.getTransport();
   Socket socket = ugiTrans.getSocket();
   if (socket != null) {
     setIpAddress(socket);
   }
 }

 From the above code snippet, it looks like the case where getSocket()
 returns null is not handled, which would cause the NullPointerException.

 Can you check what the ulimit setting is on the server? If it's set to the
 default, can you set it to unlimited and restart the hcat server? (This is
 just a wild guess.)

 Also, the getSocket() method docs suggest: "If the underlying TTransport is an
 instance of TSocket, it returns the Socket object which it contains.
 Otherwise it returns null."

 So one of the Thrift gurus needs to tell us what's happening; I have no
 knowledge at this depth.

 Maybe Ashutosh or Thejas will be able to help on this.




 From the netstat CLOSE_WAIT, it looks like the hive metastore server has
 not closed the connection (do not know why yet); maybe the hive dev guys
 can help. Are there too many connections in CLOSE_WAIT state?



 On Tue, Jul 30, 2013 at 5:52 AM, agateaaa agate...@gmail.com wrote:

  Looking at the hive metastore server logs see errors like these:
 
  2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer
  (TThreadPoolServer.java:run(182)) - Error occurred during processing of
  message.
  java.lang.NullPointerException
  at
 
 
 org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
  at
 
 
 org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79)
  at
 
 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
  at
 
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at
 
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
 
  at approximately the same time as we see the timeout or connection reset errors.
 
  Don't know if this is the cause or a side effect of the connection
  timeout/connection reset errors. Does anybody have any pointers or
  suggestions?
 
  Thanks
 
 
  On Mon, Jul 29, 2013 at 11:29 AM, agateaaa agate...@gmail.com wrote:
 
   Thanks Nitin!
  
   We have a similar setup (identical hcatalog and hive server versions) on
   another production environment and don't see any errors (it's been running
   ok for a few months).
  
   Unfortunately we won't be able to move to hcat 0.5 and hive 0.11 or hive
   0.10 soon.
  
   I did see that the last time we ran into this problem, doing a netstat -ntp
   | grep :1 showed that the server was holding on to one socket connection
   in CLOSE_WAIT state for a long time
   (the hive metastore server is running on port 1). Don't know if that's
   relevant here or not.
  
   Can you suggest any hive configuration settings we can tweak, or
   networking tools/tips we can use to 

Re: Problem with min function in HiveQL

2013-08-29 Thread Stephen Sprague
Well, I would suggest you test whether or not min() works on the timestamp
datatype. It seems like something one should rule out before going further
down the rabbit hole. My opinion only!





Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors

2013-08-29 Thread agateaaa
Sorry hit send too soon ...

Hi All:

I put some debugging code in TUGIContainingTransport.getTransport() and
tracked it down to:

@Override
public TUGIContainingTransport getTransport(TTransport trans) {

  // UGI information is not available at connection setup time, it will be
  // set later via the set_ugi() rpc.
  transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));

  // return transMap.get(trans);  // <- change
  TUGIContainingTransport retTrans = transMap.get(trans);

  if (retTrans == null) {
    LOGGER.error("cannot find transport that was in map !!");
  } else {
    LOGGER.debug("found transport in map");
  }
  return retTrans;
}

When we run this in our test environment, we see that we run into the problem
just after GC runs, and the "cannot find transport that was in map !!" message
gets logged.

Could the GC be collecting entries from transMap just before we get them?

I tried a minor change which seems to work:

public TUGIContainingTransport getTransport(TTransport trans) {

  TUGIContainingTransport retTrans = transMap.get(trans);

  if (retTrans == null) {
    // UGI information is not available at connection setup time, it will be
    // set later via the set_ugi() rpc.
    retTrans = new TUGIContainingTransport(trans);
    transMap.putIfAbsent(trans, retTrans);
  }
  return retTrans;
}


My questions for the hive and thrift experts:

1.) Do we need to use a ConcurrentMap?

ConcurrentMap<TTransport, TUGIContainingTransport> transMap = new
    MapMaker().weakKeys().weakValues().makeMap();

It uses == to compare keys (which might be the problem); also, in this case we
can't rely on trans always being in the transMap, even after a put, so the
change above probably makes sense.

2.) Is it a better idea to use a WeakHashMap with WeakReference values
instead? (I was looking at org.apache.thrift.transport.TSaslServerTransport,
esp. the change made by THRIFT-1468.)

e.g.

private static Map<TTransport, WeakReference<TUGIContainingTransport>> transMap3 =
    Collections.synchronizedMap(
        new WeakHashMap<TTransport, WeakReference<TUGIContainingTransport>>());

getTransport() would be something like:

public TUGIContainingTransport getTransport(TTransport trans) {
  WeakReference<TUGIContainingTransport> ret = transMap3.get(trans);
  if (ret == null || ret.get() == null) {
    ret = new WeakReference<TUGIContainingTransport>(new TUGIContainingTransport(trans));
    transMap3.put(trans, ret); // No need for putIfAbsent();
    // concurrent calls to getTransport() will pass in different TTransports.
  }
  return ret.get();
}


I did try 1.) above in our test environment and it does seem to resolve the
problem, though I am not sure whether I am introducing any other problems.
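On question 1: independent of the weak-reference issue, the usual safe idiom
with ConcurrentMap.putIfAbsent() is to use its return value instead of a
follow-up get(), so the caller always receives a non-null value even when
another thread wins the race. A minimal sketch (hypothetical class name,
plain ConcurrentHashMap rather than the MapMaker weak map):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutIfAbsentPattern {
    // Plain strong-reference map for illustration; the metastore code uses a
    // MapMaker weak-keys/weak-values map, which is where the GC issue enters.
    static final ConcurrentMap<String, StringBuilder> MAP = new ConcurrentHashMap<>();

    static StringBuilder getOrCreate(String key) {
        StringBuilder fresh = new StringBuilder(key);
        // putIfAbsent returns the previously mapped value, or null if our
        // value was installed -- so we never need a separate get() that
        // could observe a missing (or GC-cleared) entry.
        StringBuilder existing = MAP.putIfAbsent(key, fresh);
        return existing != null ? existing : fresh;
    }

    public static void main(String[] args) {
        StringBuilder first = getOrCreate("t1");
        if (first != getOrCreate("t1")) throw new AssertionError("expected cached instance");
        if (getOrCreate("t2") == null) throw new AssertionError("must never return null");
        System.out.println("ok");
    }
}
```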


Can someone help ?


Thanks
Agatea














Re: Problem with min function in HiveQL

2013-08-29 Thread Jason Dere
Looks like the issue is the use of min() within the WHERE clause - the place 
where the exception is being thrown has the following comment:
// UDAF in filter condition, group-by caluse, param of funtion, etc.





Re: Problem with min function in HiveQL

2013-08-29 Thread John Meagher
Aggregate functions need to go in a HAVING clause instead of the WHERE
clause.  WHERE clauses are applied prior to aggregation, HAVING is
applied post aggregation.

select ...
from ...
where <some row level filter>
group by ...
having <some aggregate level filter>
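A concrete version of that skeleton (illustrative table and data, not from
the thread):

```sql
-- Hypothetical table: sales(region string, amount int).
SELECT region, SUM(amount) AS total
FROM sales
WHERE amount > 0              -- row-level filter, applied before grouping
GROUP BY region
HAVING MIN(amount) >= 10;     -- aggregate filter, applied after grouping
```

For the original query, this means the min(...) comparisons would have to move
out of the WHERE clause, e.g. into a HAVING clause or into a precomputed
subquery/join against default.CALENDRIER.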




Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors

2013-08-29 Thread Ashutosh Chauhan
Thanks Agatea for digging in. Seems like you have hit a bug. Would you mind
opening a JIRA and adding your findings to it?

Thanks,
Ashutosh



Re: Problem with min function in HiveQL

2013-08-29 Thread Stephen Sprague
Indeed. You nailed it.


On Thu, Aug 29, 2013 at 11:53 AM, John Meagher john.meag...@gmail.comwrote:

 Aggregate functions need to go in a HAVING clause instead of the WHERE
 clause.  WHERE clauses are applied prior to aggregation, HAVING is
 applied post aggregation.

 select ...
 from ...
 where  some row level filter
 group by ...
 having some aggregate level filter


 On Thu, Aug 29, 2013 at 2:49 PM, Jason Dere jd...@hortonworks.com wrote:
  Looks like the issue is the use of min() within the WHERE clause - the
 place
  where the exception is being thrown has the following comment:
  // UDAF in filter condition, group-by caluse, param of funtion,
 etc.
 
 
  On Aug 29, 2013, at 3:01 AM, Jérôme Verdier verdier.jerom...@gmail.com
  wrote:
 
  Hi everybody,
 
  I am coding some HiveQL script to do some calculations.
 
  I have a problem with the min() function.
 
 My Hive script is below:
 
  INSERT INTO TABLE default.THM_CA_RGRP_PRODUITS_SEM
 
  SELECT
  '${hiveconf:in_co_societe}'   as co_societe,
  '${hiveconf:in_co_an_semaine}' as co_an_semaine,
  a.type_entite as type_entite,
  a.code_entite as code_entite,
  a.type_rgrp_produits  as type_rgrp_produits,
  a.co_rgrp_produitsas co_rgrp_produits,
  SUM(a.MT_CA_NET_TTC)  as MT_CA_NET_TTC,
  SUM(a.MT_OBJ_CA_NET_TTC)  as MT_OBJ_CA_NET_TTC,
  SUM(a.NB_CLIENTS) as NB_CLIENTS,
  SUM(a.MT_CA_NET_TTC_COMP) as MT_CA_NET_TTC_COMP,
  SUM(a.MT_OBJ_CA_NET_TTC_COMP) as
 MT_OBJ_CA_NET_TTC_COMP,
  SUM(a.NB_CLIENTS_COMP)as NB_CLIENTS_COMP
from
  default.THM_CA_RGRP_PRODUITS_JOUR a
JOIN default.CALENDRIER b
-- To be checked
WHERE CAST(a.dt_jour AS TIMESTAMP) >=
  CAST(min(UNIX_TIMESTAMP(b.dt_jour)) AS BIGINT)
AND CAST(a.dt_jour AS TIMESTAMP) < CAST(min(b.dt_jour)+1 AS TIMESTAMP)
AND a.co_societe = '${hiveconf:in_co_societe}'
AND ('${hiveconf:flg_reprise_entite}' = 0 OR a.type_entite <> 'MAG')
GROUP BY
  a.type_entite,
  a.code_entite,
  a.type_rgrp_produits,
  a.co_rgrp_produits;
 
  And when I try to launch this, I get this error:
 
  FAILED: SemanticException [Error 10128]: Line 20:62 Not yet supported
 place
  for UDAF 'min'
 
  Obviously, there is a problem with the min() function.
 
  How can I get past this error?
 
  Thanks for your help
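[Editor's note: a common workaround for this class of error is to compute the aggregate once in a subquery and join against it, so that no UDAF appears directly in the WHERE clause. A minimal runnable sketch of the pattern via sqlite3, with invented table and column names rather than the poster's schema:]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (dt INTEGER, val INTEGER)")
conn.execute("CREATE TABLE calendar (dt INTEGER)")
conn.executemany("INSERT INTO fact VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 30)])
conn.executemany("INSERT INTO calendar VALUES (?)", [(2,), (3,)])

# Instead of `WHERE f.dt >= MIN(c.dt)` (an aggregate in WHERE, which
# the planner rejects), compute the aggregate in a subquery first and
# join against its single-row result.
rows = conn.execute("""
    SELECT f.dt, f.val
    FROM fact f
    JOIN (SELECT MIN(dt) AS min_dt FROM calendar) m
    WHERE f.dt >= m.min_dt
    ORDER BY f.dt
""").fetchall()
print(rows)  # -> [(2, 20), (3, 30)]
```

The subquery yields one row (min_dt = 2), so the join acts as a scalar lookup and the outer WHERE stays a plain row-level filter.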
 
 
 



Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors

2013-08-29 Thread agateaaa
Thanks Ashutosh.

Filed https://issues.apache.org/jira/browse/HIVE-5172



On Thu, Aug 29, 2013 at 11:53 AM, Ashutosh Chauhan hashut...@apache.org wrote:

 Thanks Agatea for digging in. Seems like you have hit a bug. Would you
 mind opening a jira and adding your findings to it.

 Thanks,
 Ashutosh


 On Thu, Aug 29, 2013 at 11:22 AM, agateaaa agate...@gmail.com wrote:

 Sorry hit send too soon ...

 Hi All:

 Put some debugging code in TUGIContainingTransport.getTransport() and I
 tracked it down to

 @Override
 public TUGIContainingTransport getTransport(TTransport trans) {

   // UGI information is not available at connection setup time, it will be
   // set later via the set_ugi() rpc.
   transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));

   // return transMap.get(trans); // <-- change
   TUGIContainingTransport retTrans = transMap.get(trans);

   if (retTrans == null) {
     LOGGER.error("cannot find transport that was in map !!");
   } else {
     LOGGER.debug("found transport in map");
   }
   return retTrans;
 }

 When we run this in our test environment, we see that we run into the
 problem just after GC runs, and the "cannot find transport that was in
 map !!" message gets logged.

 Could the GC be collecting entries from transMap just before we get to
 them?

 Tried a minor change which seems to work

 public TUGIContainingTransport getTransport(TTransport trans) {

   TUGIContainingTransport retTrans = transMap.get(trans);

   if (retTrans == null) {
     // UGI information is not available at connection setup time, it will
     // be set later via the set_ugi() rpc.
     retTrans = new TUGIContainingTransport(trans);
     transMap.putIfAbsent(trans, retTrans);
   }
   return retTrans;
 }


 My questions for Hive and Thrift experts:

 1.) Do we need to use a ConcurrentMap?
 ConcurrentMap<TTransport, TUGIContainingTransport> transMap = new
 MapMaker().weakKeys().weakValues().makeMap();
 It uses == to compare keys (which might be the problem); also, in this
 case we can't rely on the trans always being in the transMap, even
 after a put, so in that case the change above probably makes sense.
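[Editor's note: the suspected failure mode — GC clearing an entry of a weak-valued map between the put and the later get — can be reproduced in miniature with Python's weakref maps. All names below are invented for the sketch; Guava's weak-value MapMaker map behaves analogously on the JVM.]

```python
import gc
import weakref

class Transport:
    """Stand-in for TUGIContainingTransport; any plain object works."""
    pass

cache = weakref.WeakValueDictionary()

def get_transport_buggy(key):
    # Mirrors the original pattern: put, then look the value up again.
    cache.setdefault(key, Transport())
    # No strong reference to the new Transport survives this statement,
    # so a collection between the put and the lookup clears the entry.
    gc.collect()  # simulate an unluckily-timed collection
    return cache.get(key)

def get_transport_fixed(key):
    # Get-or-create while holding a strong reference the whole time.
    trans = cache.get(key)
    if trans is None:
        trans = Transport()
        cache[key] = trans
    gc.collect()  # a collection here can no longer clear the entry
    return trans

print(get_transport_buggy("conn1"))  # -> None on CPython: weak value collected
print(get_transport_fixed("conn2"))  # -> a Transport instance
```

Holding the newly created value in a local variable until it is returned is what makes the second variant safe: the map's reference may be weak, but the caller's is not.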


 2.) Is it a better idea to use a WeakHashMap with WeakReference values
 instead? (was looking at org.apache.thrift.transport.TSaslServerTransport,
 esp. the change made by THRIFT-1468)

 e.g.
 private static Map<TTransport, WeakReference<TUGIContainingTransport>>
 transMap3 = Collections.synchronizedMap(new WeakHashMap<TTransport,
 WeakReference<TUGIContainingTransport>>());

 getTransport() would be something like:

 public TUGIContainingTransport getTransport(TTransport trans) {
   WeakReference<TUGIContainingTransport> ret = transMap3.get(trans);
   if (ret == null || ret.get() == null) {
     ret = new WeakReference<TUGIContainingTransport>(
         new TUGIContainingTransport(trans));
     transMap3.put(trans, ret); // No need for putIfAbsent();
     // concurrent calls to getTransport() will pass in different TTransports.
   }
   return ret.get();
 }


 I did try 1.) above in our test environment and it does seem to resolve
 the problem, though I am not sure whether I am introducing any other
 problems.


 Can someone help ?


 Thanks
 Agatea













 On Thu, Aug 29, 2013 at 10:57 AM, agateaaa agate...@gmail.com wrote:

  Hi All:
 
  Put some debugging code in TUGIContainingTransport.getTransport() and I
  tracked it down to
 
  @Override
  public TUGIContainingTransport getTransport(TTransport trans) {
 
  // UGI information is not available at connection setup time, it will be
  set later
  // via set_ugi() rpc.
  transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));
 
  //return transMap.get(trans); -change
TUGIContainingTransport retTrans = transMap.get(trans);
 
if ( retTrans == null ) {
 
 
 
  }
 
 
 
 
 
  On Wed, Jul 31, 2013 at 9:48 AM, agateaaa agate...@gmail.com wrote:
 
  Thanks Nitin
 
  There aren't too many connections in CLOSE_WAIT state, only one or two
  when we run into this. Most likely it's because of a dropped connection.
 
  I could not find any read or write timeouts we can set for the thrift
  server which will tell thrift to hold on to the client connection.
   See https://issues.apache.org/jira/browse/HIVE-2006, but it doesn't
  seem to have been implemented yet. We do have a client connection
  timeout set, but cannot find an equivalent setting for the server.
 
  We have a suspicion that this happens when we run two client processes
  which modify two distinct partitions of the same Hive table. We put in
  a workaround so that the two Hive client processes never run together,
  and so far things look OK, but we will keep monitoring.
 
  Could it be that the Hive metastore server is not thread-safe? Would
  running two ALTER TABLE statements on two distinct partitions of the
  same table using two client connections cause problems like these,
  where the Hive metastore server closes or drops the wrong client
  connection and leaves the other hanging?
 
  Agateaaa
 
 
 
 
  On Tue, Jul 30, 
