Re: Problem with min() function in HiveQL
Ok, thanks for this solution. Unfortunately, I have rewritten my script like this:

INSERT INTO TABLE default.THM_CA_RGRP_PRODUITS_SEM
SELECT '${hiveconf:in_co_societe}' as co_societe,
  '${hiveconf:in_co_an_semaine}' as co_an_semaine,
  a.type_entite as type_entite,
  a.code_entite as code_entite,
  a.type_rgrp_produits as type_rgrp_produits,
  a.co_rgrp_produits as co_rgrp_produits,
  SUM(a.MT_CA_NET_TTC) as MT_CA_NET_TTC,
  SUM(a.MT_OBJ_CA_NET_TTC) as MT_OBJ_CA_NET_TTC,
  SUM(a.NB_CLIENTS) as NB_CLIENTS,
  SUM(a.MT_CA_NET_TTC_COMP) as MT_CA_NET_TTC_COMP,
  SUM(a.MT_OBJ_CA_NET_TTC_COMP) as MT_OBJ_CA_NET_TTC_COMP,
  SUM(a.NB_CLIENTS_COMP) as NB_CLIENTS_COMP
from default.THM_CA_RGRP_PRODUITS_JOUR a
JOIN default.CALENDRIER b -- A verifier
WHERE a.co_societe = '${hiveconf:in_co_societe}'
  AND ('${hiveconf:flg_reprise_entite}' = 0 OR a.type_entite <> 'MAG')
GROUP BY a.type_entite, a.code_entite, a.type_rgrp_produits, a.co_rgrp_produits;
HAVING a.dt_jour >= MIN(b.dt_jour) AND a.dt_jour < MIN(b.dt_jour)+1

And I have a new error:

FAILED: ParseException line 2:6 cannot recognize input near 'HAVING' 'a' '.'

My Hive version is 0.11, and HAVING has been supported since 0.7, so it should work fine.

2013/8/29 Stephen Sprague sprag...@gmail.com

indeed. you nailed it.

On Thu, Aug 29, 2013 at 11:53 AM, John Meagher john.meag...@gmail.com wrote:

Aggregate functions need to go in a HAVING clause instead of the WHERE clause. WHERE clauses are applied prior to aggregation; HAVING is applied post-aggregation.

select ...
from ...
where <some row-level filter>
group by ...
having <some aggregate-level filter>

On Thu, Aug 29, 2013 at 2:49 PM, Jason Dere jd...@hortonworks.com wrote:

Looks like the issue is the use of min() within the WHERE clause - the place where the exception is being thrown has the following comment:

// UDAF in filter condition, group-by clause, param of function, etc.

On Aug 29, 2013, at 3:01 AM, Jérôme Verdier verdier.jerom...@gmail.com wrote:

Hi everybody, I am writing a HiveQL script to do some calculations, and I have a problem with the min() function. My script is below:

INSERT INTO TABLE default.THM_CA_RGRP_PRODUITS_SEM
SELECT '${hiveconf:in_co_societe}' as co_societe,
  '${hiveconf:in_co_an_semaine}' as co_an_semaine,
  a.type_entite as type_entite,
  a.code_entite as code_entite,
  a.type_rgrp_produits as type_rgrp_produits,
  a.co_rgrp_produits as co_rgrp_produits,
  SUM(a.MT_CA_NET_TTC) as MT_CA_NET_TTC,
  SUM(a.MT_OBJ_CA_NET_TTC) as MT_OBJ_CA_NET_TTC,
  SUM(a.NB_CLIENTS) as NB_CLIENTS,
  SUM(a.MT_CA_NET_TTC_COMP) as MT_CA_NET_TTC_COMP,
  SUM(a.MT_OBJ_CA_NET_TTC_COMP) as MT_OBJ_CA_NET_TTC_COMP,
  SUM(a.NB_CLIENTS_COMP) as NB_CLIENTS_COMP
from default.THM_CA_RGRP_PRODUITS_JOUR a
JOIN default.CALENDRIER b -- A verifier
WHERE CAST(a.dt_jour AS TIMESTAMP) >= CAST(min(UNIX_TIMESTAMP(b.dt_jour)) AS BIGINT)
  AND CAST(a.dt_jour AS TIMESTAMP) < CAST(min(b.dt_jour)+1 AS TIMESTAMP)
  AND a.co_societe = '${hiveconf:in_co_societe}'
  AND ('${hiveconf:flg_reprise_entite}' = 0 OR a.type_entite <> 'MAG')
GROUP BY a.type_entite, a.code_entite, a.type_rgrp_produits, a.co_rgrp_produits;

And when I try to launch this, I get this error:

FAILED: SemanticException [Error 10128]: Line 20:62 Not yet supported place for UDAF 'min'

Obviously there is a problem with the min() function. How can I get past this error? Thanks for your help.

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.

--
Jérôme VERDIER
06.72.19.17.31
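Besides moving the predicate to HAVING, the usual workaround for the "Not yet supported place for UDAF" error is to compute the aggregate once in a subquery and join it in, so the outer WHERE clause compares against an ordinary value. A hedged sketch against the tables above (the column list is trimmed, and the +1 arithmetic is kept as in the original script):

```sql
-- Sketch only: MIN(dt_jour) is computed in a subquery, so the outer
-- WHERE clause sees a plain value rather than a UDAF.
SELECT a.type_entite,
       a.code_entite,
       SUM(a.MT_CA_NET_TTC) AS MT_CA_NET_TTC
FROM default.THM_CA_RGRP_PRODUITS_JOUR a
CROSS JOIN (SELECT MIN(dt_jour) AS min_jour
            FROM default.CALENDRIER) m
WHERE a.dt_jour >= m.min_jour
  AND a.dt_jour < m.min_jour + 1
GROUP BY a.type_entite, a.code_entite;
```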
Re: Problem with min() function in HiveQL
awright Jerome. look closely at the error message. you can figure this one out.

On Fri, Aug 30, 2013 at 1:17 AM, Jérôme Verdier verdier.jerom...@gmail.com wrote: [...]
Re: Problem with min() function in HiveQL
Thanks Stephen. Yes, I realise it was such a stupid question... maybe I wasn't really awake this morning ;-) Now it's working well. Thanks everyone.

2013/8/30 Stephen Sprague sprag...@gmail.com [...]
Re: Problem with min() function in HiveQL
Hi, is it possible to have multiple conditions in a HAVING clause? I get an error when I try this. Thanks a lot.

2013/8/30 Jérôme Verdier verdier.jerom...@gmail.com [...]
Mappers per job per user
Hi All - I was hoping to gather some insight into how the Hadoop (and/or Hive) job scheduler distributes mappers per user. I am running into an issue where Hadoop (and/or Hive) is evenly distributing mappers per user instead of per job. For example:

-We have 1000 mapper capacity
-10 jobs are running in total under User A
-Each job is using 100 mappers
-User B starts a job
-User B's job is allocated 250 mappers
-User A's jobs decrease to 75 mappers each instead of 100

What could be causing this allocation to occur by user and job, instead of just by job? I reviewed the Hive/Hadoop documentation and wasn't able to find any references to distributing jobs by user. All of the jobs are being executed within the Hive shell, or using the hive command.

Thanks, Brad
Pseudo column for the entire Line/Row ?
I am writing a UDF that will perform validation on the input row, and it will require access to every column in the row (or, alternatively, simply to the unparsed/pre-processed line). Is there any way to achieve this? Or will it simply be necessary to declare an overloaded evaluate() method with a signature comprising every field? Suggestions of alternative tactics are also welcome. thanks stephenb
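One alternative tactic, sketched here with hypothetical names (the UDF validate_row and the columns are made up): pack the row into a single struct with named_struct(), so one evaluate() signature works for any table, and a GenericUDF can walk the struct's fields through its ObjectInspector instead of needing an overload per column count:

```sql
-- Hypothetical call: the whole row is passed as one struct argument.
SELECT validate_row(named_struct('id', id,
                                 'name', name,
                                 'amount', amount))
FROM my_table;
```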
Re: Mappers per job per user
and you can find that by looking at your mapred-site.xml. look for this property:

<name>mapred.jobtracker.taskScheduler</name>

you should be using the default FIFO, or

<value>org.apache.hadoop.mapred.FairScheduler</value>
or
<value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>

Depending on which scheduler you are using, please read through the following links to understand how the scheduler works (allocation, preemption, etc.):

http://hadoop.apache.org/docs/stable/capacity_scheduler.html
http://hadoop.apache.org/docs/stable/fair_scheduler.html

-- Forwarded message --
From: Ravi Kiran maghamraviki...@gmail.com
Date: Fri, Aug 30, 2013 at 10:30 AM
Subject: Re: Mappers per job per user
To: user@hive.apache.org

Hi Brad, I believe you have configured a Capacity Scheduler for scheduling the jobs rather than the default FIFO scheduler. Regards, Ravi Magham

On Fri, Aug 30, 2013 at 10:07 PM, Brad Ruderman bruder...@radiumone.com wrote: [...]
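For reference, a plausible mapred-site.xml fragment for the non-default schedulers mentioned above (property names as documented for Hadoop 1.x; adjust for your distribution):

```xml
<!-- mapred-site.xml: pick ONE scheduler; FIFO is used if this is unset. -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
  <!-- or: org.apache.hadoop.mapred.CapacityTaskScheduler -->
</property>
```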
Re: Mappers per job per user
Hi Kumar and Ravi - Thanks for your quick responses. Although I don't believe we changed anything from the defaults, it is possible because we are using a distribution. I will check. Thanks! Brad

On Fri, Aug 30, 2013 at 1:46 PM, kumar y ykk1...@gmail.com wrote: [...]
Re: Problem with min() function in HiveQL
there should be no limitation AFAIK. example + error message is always helpful - even if you might regret it. :)

On Fri, Aug 30, 2013 at 8:16 AM, Jérôme Verdier verdier.jerom...@gmail.com wrote: [...]
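To illustrate the point above, a minimal sketch (table and column names are hypothetical) combining several aggregate predicates in a single HAVING clause:

```sql
-- Hypothetical table: sales(region STRING, amount DOUBLE).
SELECT region,
       SUM(amount) AS total_amount
FROM sales
GROUP BY region
HAVING SUM(amount) > 1000   -- first aggregate condition
   AND COUNT(*) >= 10       -- second condition, joined with AND
   AND MIN(amount) > 0;     -- a third works the same way
```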
Re: how to write hive query to solve this problem?
so this is not particular to Hive, is it? You could post this on a DB2, Oracle, or even Stack Overflow board, I'd imagine.

On Fri, Aug 30, 2013 at 4:34 PM, qiaoresearcher qiaoresearc...@gmail.com wrote:

I have three tables:

Table 1: records when and who visited a gas station. This contains all the users of interest; call this set of users A.

date       | user name | visited gas station?
2013-09-01 | tom       | yes
2013-09-02 | tom       | yes
2013-09-01 | hanks     | yes
2013-09-03 | tomy      | yes
...

Table 2: records when and who visited Bestbuy. Users of set A appear here, but not all users of A will appear in this table; table 2 also has users that do not belong to set A.

date       | user name | visited Bestbuy?
2013-09-01 | tom       | yes
2013-09-02 | jacob     | yes
2013-09-01 | hanks     | yes
2013-09-03 | michael   | yes
...

Table 3: records when and who arrives at one of three destinations: CA, NY and DC. The users in table 3 have the same relationship to set A as the users in table 2.

date       | user name | destination
2013-09-01 | tom       | CA
2013-09-02 | tom       | NY
2013-09-01 | hanks     | DC
2013-09-03 | tomy      | CA
...

Now we want to know, within a 90-day period, the following numbers:

(1) for any given day, for the users in table 1, how many of them have a path like: visited the gas station first, after that date went to Bestbuy, and finally arrived in CA
(2) for any given day, how many visited the gas station first, did not visit Bestbuy, and finally arrived in CA
(3) for any given day, how many visited the gas station first, went to Bestbuy, and finally arrived in NY
(4) for any given day, how many visited the gas station first, did not visit Bestbuy, and finally arrived in NY
(5) for any given day, how many visited the gas station first, went to Bestbuy, and finally arrived in DC
(6) for any given day, how many visited the gas station first, did not visit Bestbuy, and finally arrived in DC

It is possible that on a given day a user visits Bestbuy many times in table 2; this is simply regarded as the status 'visited Bestbuy', and a user is counted once per given day. From the day a user visits the gas station, within the next 90 days, that user has to arrive at exactly one of the three final destinations in table 3 - no multiple arrival destinations. For example: user tom may visit the gas station on 9/1, go to Bestbuy on 9/5, and finally go to CA on 9/30; then tom counts as 1 on 9/1 for path (1). User hanks may visit the gas station on 9/1, go to Bestbuy on 9/5 and on 9/8, and finally go to CA on 9/30; then hanks also counts as 1 on 9/1 for path (1). User ruby may visit the gas station on 9/1; if table 2 has no record of ruby visiting Bestbuy up to 11/30, and ruby arrives in DC in table 3 before 11/30, then ruby contributes 1 to path (6) for the day 9/1.

How do I write a Hive query to get those numbers for the six paths?

A sample output would be:

9/1: 100 for path (1), 90 for path (2), ..., etc.
9/2: 60 for path (1), 80 for path (2), ..., etc.
9/3: ...

Any suggestions or suggested references/readings will be deeply appreciated. Thanks! Qiao
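For what it's worth, one Hive-only sketch. All table and column names here are assumptions based on the description - gas_visits(dt, user_name), bestbuy_visits(dt, user_name), arrivals(dt, user_name, dest) - and this is untested: resolve each gas-station day to the user's single arrival within 90 days, derive a per-user 0/1 Bestbuy flag, then count.

```sql
-- Hedged sketch, not a tested solution. The inner query produces one row
-- per (gas day, user, destination) with a flag for an intervening Bestbuy
-- visit; the outer query counts users per day/destination/flag, which
-- yields the six path totals.
SELECT gas_day, dest, went_bestbuy, COUNT(*) AS user_count
FROM (
  SELECT g.dt AS gas_day,
         g.user_name,
         a.dest,
         MAX(CASE WHEN b.dt > g.dt AND b.dt < a.dt THEN 1 ELSE 0 END) AS went_bestbuy
  FROM gas_visits g
  JOIN arrivals a
    ON a.user_name = g.user_name
  LEFT OUTER JOIN bestbuy_visits b
    ON b.user_name = g.user_name
  WHERE a.dt > g.dt
    AND datediff(a.dt, g.dt) <= 90
  GROUP BY g.dt, g.user_name, a.dest
) per_user
GROUP BY gas_day, dest, went_bestbuy;
```

The LEFT OUTER JOIN keeps users with no Bestbuy record at all: for them the CASE never matches, so went_bestbuy is 0 and they land in the "did not visit Bestbuy" bucket.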
Hive Statistics information
Hi guys, I have configured Hive to use MySQL for all statistics:

hive.stats.atomic=false
hive.stats.autogather=true
hive.stats.collect.rawdatasize=true
hive.stats.dbclass=jdbc:mysql
hive.stats.dbconnectionstring=jdbc:mysql://v-so1.nextagqa.com/hive_vso1_tempstatsstore?user=hive_user_vso1password=hive_user_vso1
hive.stats.jdbc.timeout=30
hive.stats.jdbcdriver=com.mysql.jdbc.Driver
hive.stats.retries.max=0
hive.stats.retries.wait=3000

However, the Hive statistics tables in MySQL don't seem to have any data. Where does Hive store the statistics information?

sanjay

CONFIDENTIALITY NOTICE
==
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.
Re: Hive Statistics information
Hi Sanjay, what do the logs say when you fire an ANALYZE TABLE ... statement on a table? One minor correction to the db connection string would be to use &amp; for the query parameters:

hive.stats.dbconnectionstring=jdbc:mysql://v-so1.nextagqa.com/hive_vso1_tempstatsstore?user=hive_user_vso1&amp;password=hive_user_vso1

I hope the database hive_vso1_tempstatsstore exists in your MySQL?

Regards, Ravi Magham

On Sat, Aug 31, 2013 at 6:15 AM, Sanjay Subramanian sanjay.subraman...@wizecommerce.com wrote: [...]
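If autogather isn't kicking in, a direct way to exercise the stats path (my_table is a placeholder name) is an explicit ANALYZE, watching the temp stats database while the job runs:

```sql
-- Trigger statistics collection explicitly for a table.
ANALYZE TABLE my_table COMPUTE STATISTICS;

-- Partitioned tables gather stats per partition spec.
ANALYZE TABLE my_table PARTITION(ds='2013-08-31') COMPUTE STATISTICS;
```

Note that in this era of Hive the JDBC store is only a temporary publishing area used during the job; the aggregated numbers (numRows, rawDataSize, etc.) end up as table/partition properties in the metastore, visible via DESCRIBE EXTENDED - which may be why the MySQL stats tables look empty afterwards.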