Re: Problem with min function in HiveQL

2013-08-30 Thread Jérôme Verdier
Ok, thanks for this solution.

Unfortunately, I have rewritten my script like this:

INSERT INTO TABLE default.THM_CA_RGRP_PRODUITS_SEM

SELECT
'${hiveconf:in_co_societe}'    as co_societe,
'${hiveconf:in_co_an_semaine}' as co_an_semaine,
a.type_entite                  as type_entite,
a.code_entite                  as code_entite,
a.type_rgrp_produits           as type_rgrp_produits,
a.co_rgrp_produits             as co_rgrp_produits,
SUM(a.MT_CA_NET_TTC)           as MT_CA_NET_TTC,
SUM(a.MT_OBJ_CA_NET_TTC)       as MT_OBJ_CA_NET_TTC,
SUM(a.NB_CLIENTS)              as NB_CLIENTS,
SUM(a.MT_CA_NET_TTC_COMP)      as MT_CA_NET_TTC_COMP,
SUM(a.MT_OBJ_CA_NET_TTC_COMP)  as MT_OBJ_CA_NET_TTC_COMP,
SUM(a.NB_CLIENTS_COMP)         as NB_CLIENTS_COMP
  from
default.THM_CA_RGRP_PRODUITS_JOUR a
  JOIN default.CALENDRIER b
  -- A verifier
  WHERE a.co_societe = '${hiveconf:in_co_societe}'
  AND ('${hiveconf:flg_reprise_entite}' = 0 OR a.type_entite <> 'MAG')
  GROUP BY
a.type_entite,
a.code_entite,
a.type_rgrp_produits,
a.co_rgrp_produits;
  HAVING
    a.dt_jour >= MIN(b.dt_jour)
    AND a.dt_jour < MIN(b.dt_jour)+1

And I get a new error:

FAILED: ParseException line 2:6 cannot recognize input near 'HAVING' 'a' '.'

My Hive version is 0.11, and the HAVING clause has been supported since 0.7,
so it should work.



2013/8/29 Stephen Sprague sprag...@gmail.com

 indeed. you nailed it.


 On Thu, Aug 29, 2013 at 11:53 AM, John Meagher john.meag...@gmail.com wrote:

 Aggregate functions need to go in a HAVING clause instead of the WHERE
 clause.  WHERE clauses are applied prior to aggregation, HAVING is
 applied post aggregation.

 select ...
 from ...
 where <some row level filter>
 group by ...
 having <some aggregate level filter>
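The row-level vs. aggregate-level distinction John describes can be sketched with a tiny standard-SQL example (using SQLite here rather than Hive, with made-up table and column names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('north', 10), ('north', 20), ('south', 5), ('south', 1);
""")

# WHERE filters individual rows before grouping;
# HAVING filters whole groups after aggregation.
rows = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount > 2          -- row-level filter (drops the 'south', 1 row)
    GROUP BY region
    HAVING SUM(amount) > 20   -- aggregate-level filter (drops the south group)
""").fetchall()
print(rows)  # [('north', 30)]
```

Putting the `SUM(...)` condition in WHERE instead would fail, because the aggregate does not exist yet when rows are being filtered.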


 On Thu, Aug 29, 2013 at 2:49 PM, Jason Dere jd...@hortonworks.com
 wrote:
  Looks like the issue is the use of min() within the WHERE clause - the
 place
  where the exception is being thrown has the following comment:
  // UDAF in filter condition, group-by caluse, param of funtion,
 etc.
 
 
  On Aug 29, 2013, at 3:01 AM, Jérôme Verdier verdier.jerom...@gmail.com
 
  wrote:
 
  Hi everybody,
 
  I am coding some HiveQL script to do some calculations.
 
  I have a problem with the min() function.
 
  My hive script is below :
 
  INSERT INTO TABLE default.THM_CA_RGRP_PRODUITS_SEM
 
  SELECT
  '${hiveconf:in_co_societe}'    as co_societe,
  '${hiveconf:in_co_an_semaine}' as co_an_semaine,
  a.type_entite                  as type_entite,
  a.code_entite                  as code_entite,
  a.type_rgrp_produits           as type_rgrp_produits,
  a.co_rgrp_produits             as co_rgrp_produits,
  SUM(a.MT_CA_NET_TTC)           as MT_CA_NET_TTC,
  SUM(a.MT_OBJ_CA_NET_TTC)       as MT_OBJ_CA_NET_TTC,
  SUM(a.NB_CLIENTS)              as NB_CLIENTS,
  SUM(a.MT_CA_NET_TTC_COMP)      as MT_CA_NET_TTC_COMP,
  SUM(a.MT_OBJ_CA_NET_TTC_COMP)  as MT_OBJ_CA_NET_TTC_COMP,
  SUM(a.NB_CLIENTS_COMP)         as NB_CLIENTS_COMP
from
  default.THM_CA_RGRP_PRODUITS_JOUR a
JOIN default.CALENDRIER b
-- A verifier
WHERE CAST(a.dt_jour AS TIMESTAMP) >=
  CAST(min(UNIX_TIMESTAMP(b.dt_jour)) AS BIGINT)
AND CAST(a.dt_jour AS TIMESTAMP) < CAST(min(b.dt_jour)+1 AS TIMESTAMP)
AND a.co_societe = '${hiveconf:in_co_societe}'
AND ('${hiveconf:flg_reprise_entite}' = 0 OR a.type_entite <> 'MAG')
GROUP BY
  a.type_entite,
  a.code_entite,
  a.type_rgrp_produits,
  a.co_rgrp_produits;
 
  And when I try to launch this, I get this error:
 
  FAILED: SemanticException [Error 10128]: Line 20:62 Not yet supported place for UDAF 'min'
 
  Obviously, there is a problem with the min() function.
 
  How can I get past this error?
 
 
 
  CONFIDENTIALITY NOTICE
  NOTICE: This message is intended for the use of the individual or
 entity to
  which it is addressed and may contain information that is confidential,
  privileged and exempt from disclosure under applicable law. If the
 reader of
  this message is not the intended recipient, you are hereby notified
 that any
  printing, copying, dissemination, distribution, disclosure or
 forwarding of
  this communication is strictly prohibited. If you have received this
  communication in error, please contact the sender immediately and
 delete it
  from your system. Thank You.





-- 
*Jérôme VERDIER*
06.72.19.17.31

Re: Problem with min function in HiveQL

2013-08-30 Thread Stephen Sprague
awright Jerome.  look closely at the error message. you can figure this one
out.



Re: Problem with min function in HiveQL

2013-08-30 Thread Jérôme Verdier
Thanks Stephen,

Yes, I realise that it was a stupid question... Maybe I wasn't really
awake this morning ;-)

Now it's working well.

Thanks everyone.
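For the record, the ParseException quoted above is the one Stephen's hint points at: the semicolon placed right after GROUP BY terminates the statement, so the HAVING clause is never parsed. A sketch of the corrected tail of the query, with the terminator moved to the end (operators restored on the assumption the originals were eaten by the archive), would be:

```sql
  GROUP BY
    a.type_entite,
    a.code_entite,
    a.type_rgrp_produits,
    a.co_rgrp_produits
  HAVING
    a.dt_jour >= MIN(b.dt_jour)
    AND a.dt_jour < MIN(b.dt_jour) + 1;
```

Note that HAVING may only reference grouped columns or aggregates, so the bare `a.dt_jour` comparisons would still need rework, and the JOIN on `default.CALENDRIER` still lacks an ON condition.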



Re: Problem with min function in HiveQL

2013-08-30 Thread Jérôme Verdier
Hi,

Is it possible to have multiple conditions in a HAVING clause?

I get an error when I try this.

Thanks a lot.



Mappers per job per user

2013-08-30 Thread Brad Ruderman
Hi All-
I was hoping to gather some insight into how the hadoop (and/or hive) job
scheduler distributes mappers per user. I am running into an issue where I
see that hadoop (and/or hive) is evenly distributing mappers per user
instead of per job.

For example:
-We have 1000 mapper capacity
-10 jobs are running in total under User A
-Each job is using 100 mappers
-User B starts a job
-User B's job is allocated 250 mappers
-User A's jobs decrease to 75 mappers each instead of 100

What could be causing this allocation to occur by user and job, instead of
just by job? I reviewed the hive/hadoop documentation and wasn't able to
find any references to distributing jobs by user.

All of the jobs are being executed within the hive shell, or using the hive
command.

Thanks,
Brad


Pseudo column for the entire Line/Row ?

2013-08-30 Thread Stephen Boesch
I am writing a UDF that will perform validation on the input row and shall
require access to every column in the row (or alternatively simply to the
unparsed/pre-processed line).

Is there any way to achieve this?  Or will it simply be necessary to
declare an overloaded evaluate() method with a signature comprising every
field?

Suggestions of alternative tactics are also welcome.


thanks

stephenb
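One direction (not confirmed by any reply in this thread) is to accept a variable number of column arguments and validate them generically; Hive's GenericUDF interface allows variadic arguments. The validation logic itself, sketched here in Python with hypothetical field rules (non-empty fields, first field must be a date), might look like:

```python
# Hypothetical row validator: the kind of logic one might port into a
# Hive GenericUDF that receives every column of the row as an argument.
from datetime import datetime

def validate_row(*fields):
    """Return True if every field is non-empty and field 0 parses as a date."""
    # Reject an empty row or any missing/empty field.
    if not fields or any(f is None or f == "" for f in fields):
        return False
    # Field 0 is assumed (for this sketch) to be an ISO date string.
    try:
        datetime.strptime(fields[0], "%Y-%m-%d")
    except ValueError:
        return False
    return True

print(validate_row("2013-08-30", "tom", "yes"))   # True
print(validate_row("not-a-date", "tom", "yes"))   # False
```

The field names and rules are illustrative only; the point is that one variadic entry point can cover rows of any width without overloading evaluate() per schema.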


Re: Mappers per job per user

2013-08-30 Thread kumar y
You can find that by looking at your mapred-site.xml. Look for this
property:

mapred-site.xml: <name>mapred.jobtracker.taskScheduler</name>

You should be using the default FIFO, or
<value>org.apache.hadoop.mapred.FairScheduler</value> or
<value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>

Depending on which scheduler you are using, please read through the following
links to understand how the scheduler works (allocation, preemption,
etc.).


http://hadoop.apache.org/docs/stable/capacity_scheduler.html
http://hadoop.apache.org/docs/stable/fair_scheduler.html
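If the Fair Scheduler turns out to be enabled, per-user pools are exactly what would produce the even split Brad describes: with no explicit pool mapping, each user falls into their own pool and pools share the cluster evenly. A minimal allocations-file sketch (pool name, minimums, and weight are made-up values, not from this thread):

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: hypothetical allocations file -->
<allocations>
  <pool name="etl">
    <!-- Guarantee this pool 100 map slots and give it double weight. -->
    <minMaps>100</minMaps>
    <weight>2.0</weight>
  </pool>
  <!-- Jobs with no configured pool land in a per-user pool by default,
       so capacity is balanced per user rather than per job. -->
  <poolMaxJobsDefault>20</poolMaxJobsDefault>
</allocations>
```

Mapping jobs to a shared pool (or scheduling by job rather than user) changes that behavior; the fair_scheduler.html link above documents the pool-assignment properties.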




 -- Forwarded message --
 From: Ravi Kiran maghamraviki...@gmail.com
 Date: Fri, Aug 30, 2013 at 10:30 AM
 Subject: Re: Mappers per job per user
 To: user@hive.apache.org


 Hi Brad,

   I believe you have configured a Capacity Scheduler for scheduling the
 jobs rather than the default FIFO scheduler.

 Regards
 Ravi Magham




Re: Mappers per job per user

2013-08-30 Thread Brad Ruderman
Hi Kumar and Ravi-
Thanks for your quick responses. Although I don't believe we changed anything
from the defaults, it is possible because we are using a distribution. I will
check.

Thanks!

Brad



Re: Problem with min function in HiveQL

2013-08-30 Thread Stephen Sprague
there should be no limitation AFAIK.   example + error message is always
helpful - even if you might regret it. :)



Re: how to write hive query to solve this problem?

2013-08-30 Thread Stephen Sprague
so this is not particular to Hive, is it? You could post this on a DB2,
Oracle, or even Stack Overflow board, I'd imagine.


On Fri, Aug 30, 2013 at 4:34 PM, qiaoresearcher qiaoresearc...@gmail.com wrote:


 I have three tables:

 Table 1: records when and who visited a gas station. This contains all
 the users of interest; call this set of users set A.

 date       | user name | visited gas station?
 2013-09-01 | tom       | yes
 2013-09-02 | tom       | yes
 2013-09-01 | hanks     | yes
 2013-09-03 | tomy      | yes
 ...

 Table 2: records when and who visited Bestbuy. Users in set A appear
 here, but not all users of A will appear in this table; table 2 also has
 users that do not belong to set A.

 date       | user name | visited Bestbuy?
 2013-09-01 | tom       | yes
 2013-09-02 | jacob     | yes
 2013-09-01 | hanks     | yes
 2013-09-03 | michael   | yes
 ...

 Table 3: records when and who arrives at one of three destinations: CA, NY
 and DC. The users in table 3 are in a similar situation as the users in
 table 2 with regard to set A.

 date       | user name | destination
 2013-09-01 | tom       | CA
 2013-09-02 | tom       | NY
 2013-09-01 | hanks     | DC
 2013-09-03 | tomy      | CA
 ...

 Now we want to know, within a 90-day period, the following numbers:

 (1) for any given day, for the users in table 1, how many of them have a
 path like: visited the gas station first, after that date went to Bestbuy,
 and finally arrived in CA
 (2) for any given day, for the users in table 1, how many of them have a
 path like: visited the gas station first, then did not visit Bestbuy, and
 finally arrived in CA

 (3) same as (1), but finally arrived in NY
 (4) same as (2), but finally arrived in NY

 (5) same as (1), but finally arrived in DC
 (6) same as (2), but finally arrived in DC

 On a given day a user may visit Bestbuy many times in table 2; this is
 simply regarded as the status 'visited Bestbuy'. One user is counted once
 per day.
 From the day of visiting the gas station, within the next 90 days, the
 user has to arrive at exactly one of the three final destinations in
 table 3 - no multiple arrival destinations.

 For example:
 - user tom visits the gas station on 9/1, goes to Bestbuy on 9/5, and
 finally goes to CA on 9/30; tom is counted as 1 on 9/1 for path (1).
 - user hanks visits the gas station on 9/1, goes to Bestbuy on 9/5 and
 again on 9/8, and finally goes to CA on 9/30; hanks counts as 1 on 9/1
 for path (1).
 - user ruby visits the gas station on 9/1, table 2 has no record of ruby
 visiting Bestbuy up to 11/30, and ruby arrives in DC in table 3 before
 11/30; ruby contributes 1 to path (6) for the day 9/1.

 How do I write a Hive query to get those numbers for the six paths?
 Sample output would be:
 9/1: 100 for path (1), 90 for path (2), ..., etc.
 9/2: 60 for path (1), 80 for path (2), ..., etc.
 9/3: ...

 Any suggestions or suggested references/readings would be deeply
 appreciated.

 Thanks!
 Qiao
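Whatever query eventually gets written, the counting logic is easy to prototype outside Hive first. A small Python sketch with made-up rows, counting path (1) (gas, then Bestbuy, then CA) and path (6) (gas, no Bestbuy, then DC) per gas-station day:

```python
from datetime import date, timedelta

# Toy versions of tables 1-3: (date, user[, destination]) tuples.
gas     = [("2013-09-01", "tom"), ("2013-09-01", "ruby")]
bestbuy = [("2013-09-05", "tom")]
dest    = [("2013-09-30", "tom", "CA"), ("2013-11-20", "ruby", "DC")]

def d(s):
    return date(*map(int, s.split("-")))

def count_paths(day):
    """Count users whose 90-day window starting at `day` matches each path."""
    window_end = d(day) + timedelta(days=90)
    counts = {"path1": 0, "path6": 0}
    for g_day, user in gas:
        if g_day != day:
            continue
        # Did the user visit Bestbuy after the gas day, inside the window?
        went_bb = any(u == user and d(g_day) < d(b) <= window_end
                      for b, u in bestbuy)
        # Destinations reached inside the window (must be exactly one).
        arrivals = {c for a, u, c in dest
                    if u == user and d(g_day) < d(a) <= window_end}
        if arrivals == {"CA"} and went_bb:
            counts["path1"] += 1
        if arrivals == {"DC"} and not went_bb:
            counts["path6"] += 1
    return counts

print(count_paths("2013-09-01"))  # {'path1': 1, 'path6': 1}
```

In Hive this shape maps naturally onto LEFT OUTER JOINs from table 1 to tables 2 and 3 keyed on user, with date-range predicates for the 90-day window and conditional SUMs per path; the sketch above just gives expected numbers to validate that query against.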




Hive Statistics information

2013-08-30 Thread Sanjay Subramanian
Hi guys

I have configured Hive to use MySQL for all statistics

hive.stats.atomic=false
hive.stats.autogather=true
hive.stats.collect.rawdatasize=true
hive.stats.dbclass=jdbc:mysql
hive.stats.dbconnectionstring=jdbc:mysql://v-so1.nextagqa.com/hive_vso1_tempstatsstore?user=hive_user_vso1password=hive_user_vso1
hive.stats.jdbc.timeout=30
hive.stats.jdbcdriver=com.mysql.jdbc.Driver
hive.stats.retries.max=0
hive.stats.retries.wait=3000

However, the MySQL hive statistics tables don't seem to have any data.

Where does Hive store the statistics information ?

sanjay

CONFIDENTIALITY NOTICE
==
This email message and any attachments are for the exclusive use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized review, use, disclosure or distribution is prohibited. If you 
are not the intended recipient, please contact the sender by reply email and 
destroy all copies of the original message along with any attachments, from 
your computer system. If you are the intended recipient, please be advised that 
the content of this message is subject to access, review and disclosure by the 
sender's Email System Administrator.


Re: Hive Statistics information

2013-08-30 Thread Ravi Kiran
Hi Sanjay,

   What do the logs say when you fire the ANALYZE TABLE ... statement on a
table?
   One minor correction to the db connection string would be to use &amp;
for the query parameters:
hive.stats.dbconnectionstring=jdbc:mysql://v-so1.nextagqa.com/hive_vso1_tempstatsstore?user=hive_user_vso1&amp;password=hive_user_vso1

I hope the database hive_vso1_tempstatsstore exists in your MySQL?

Regards
Ravi Magham

