Re: Need urgent help in hive query
Sure, will check on that, Robin. Thanks!

On Sat, Jun 29, 2013 at 1:11 AM, Robin Morris wrote:
> It probably won't make a difference to performance — the subqueries will
> be executed in separate map-reduce jobs, and the output put on disk anyway.
>
> First, get it to work; second, optimize.
>
> Robin
>
> From: dyuti a
> Reply-To: "user@hive.apache.org"
> Date: Friday, June 28, 2013 12:05 PM
> To: "user@hive.apache.org"
> Subject: Re: Need urgent help in hive query
>
> Hi Robin,
> Thanks for your reply. Hope this will help us, but I am still looking at it
> performance-wise too; if there is no other option, then I will have to go with this.
>
> Dti
>
> On Sat, Jun 29, 2013 at 12:03 AM, Robin Morris wrote:
>
>> Split up the query – put the results of the subquery into a table, run the
>> final query on that, then drop the temporary table.
>>
>> Robin
>>
>> From: dyuti a
>> Reply-To: "user@hive.apache.org"
>> Date: Friday, June 28, 2013 10:54 AM
>> To: "user@hive.apache.org", Michael Malak <michaelma...@yahoo.com>
>> Subject: Re: Fwd: Need urgent help in hive query
>>
>> Hi Michael,
>> Thanks for your help. Is there any other possible option apart from this?
>>
>> On Fri, Jun 28, 2013 at 10:33 PM, Michael Malak wrote:
>>
>>> Just copy and paste the whole long expressions to their second
>>> occurrences.
>>>
>>> --
>>> *From:* dyuti a
>>> *To:* user@hive.apache.org
>>> *Sent:* Friday, June 28, 2013 10:58 AM
>>> *Subject:* Fwd: Need urgent help in hive query
>>>
>>> Hi Experts,
>>> I'm trying the SQL query below in Hive, which does not support referencing
>>> a column alias in subsequent columns, as shown in the query. Is there any
>>> other way to rewrite it without using aliases? Any help is really appreciated.
>>> INSERT INTO CAUD
>>>   ( pst_cc
>>>   , pst_no
>>>   , pst_ber
>>>   , pst_tkn
>>>   , pst_dtm
>>>   , pst_nbr
>>>   , pst_cde
>>>   , pst_dte
>>>   )
>>> SELECT der.cc
>>>      , der.no
>>>      , der.ber
>>>      , der.tkn
>>>      , der.dtm
>>>      , der.nbr
>>>      , der.cde
>>>      , der.dte
>>> FROM (SELECT udp.cc
>>>            , udp.no
>>>            , udp.ber
>>>            , udp.tkn
>>>            , CASE WHEN udp.SYSTEM_DTE>1600 AND udp.SYSTEM_DTE<1 THEN udp.SYSTEM_DTE
>>>                   WHEN udp.DTE_OUT>1600 AND udp.DTE_OUT<1 THEN udp.DTE_OUT
>>>                   WHEN udp.DTE_IN>1600 AND udp.DTE_IN<1 THEN udp.DTE_IN
>>>                   ELSE '1231'
>>>              END AS DTE_OUT
>>>            , CASE WHEN udp.TME_OUT <> 0 THEN udp.TME_OUT
>>>                   WHEN udp.TME_IN <> 0 THEN udp.TME_IN
>>>                   ELSE 0
>>>              END AS TME_OUT
>>>            , TRIM(CAST(TME_OUT AS CHAR(6))) AS TME_OUT1
>>>            , CAST(CAST(SUBSTR(TRIM(DTE_OUT),1,8) AS CHAR(8)) AS DATE FORMAT 'yyyymmdd') AS DTE_OUT_O
>>>            , CASE WHEN TME_OUT>99999 THEN CAST(TME_OUT1 AS CHAR(6))
>>>                   WHEN TME_OUT>9999 AND TME_OUT<=99999 THEN CAST('0'||TME_OUT1 AS CHAR(6))
>>>                   WHEN TME_OUT>999 AND TME_OUT<=9999 THEN CAST('00'||TME_OUT1 AS CHAR(6))
>>>                   WHEN TME_OUT>99 AND TME_OUT<=999 THEN CAST('000'||TME_OUT1 AS CHAR(6))
>>>                   WHEN TME_OUT>9 AND TME_OUT<=99 THEN CAST('0000'||TME_OUT1 AS CHAR(6))
>>>                   WHEN TME_OUT>0 AND TME_OUT<=9 THEN CAST('00000'||TME_OUT1 AS CHAR(6))
>>>                   WHEN TME_OUT=0 THEN '000000'
>>>              END AS TME_OUT2
>>>            , SUBSTR(TRIM(TME_OUT2),1,2)||':'||SUBSTR(TRIM(TME_OUT2),3,2)||':'||SUBSTR(TRIM(TME_OUT2),5,2) AS TME_OUT_O
>>>            , CAST(DTE_OUT_O||' '||TME_OUT_O AS TIMESTAMP FORMAT 'YYYYMMDD:HH:MI:SS') AS DTM
>>>            , udp.nbr AS nbr
>>>       FROM STS_GNCAUDP udp
>>>            INNER JOIN LOAD_LOG LZ_LL ON udp.LOG_KEY=LZ_LL.LOG_KEY
>>>            INNER JOIN ESA_LOAD_LOG ESA_LL ON LZ_LL.ESA_LOAD_LOG_KEY=ESA_LL.LOG_KEY
>>>                   AND ESA_LL.PBLSH_IND='$PBLSH_IND'
>>>                   AND ESA_LL.LOAD_END_DTM='$HIGH_DATE_TIME'
>>>                   AND ESA_LL.SOR_CD='$CLM_SOR_CD'
>>>                   AND ESA_LL.SUBJ_AREA_NM='$SUBJ_AREA_NM'
>>>                   AND ESA_LL.WORK_FLOW_NM='$WORK_FLOW_NM'
>>>       QUALIFY ROW_NUMBER() OVER (PARTITION BY udp.cc, udp.pst_no, udp.cde, udp.nbr, udp.dte, udp.LOG_KEY
>>>                                  ORDER BY DTM DESC)=1) AS der
>>> ;
>>>
>>> Thanks in advance!
>>> Dti
Re: Need urgent help in hive query
It probably won't make a difference to performance — the subqueries will be
executed in separate map-reduce jobs, and the output put on disk anyway.

First, get it to work; second, optimize.

Robin

From: dyuti a <hadoop.hiv...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, June 28, 2013 12:05 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Need urgent help in hive query

Hi Robin,
Thanks for your reply. Hope this will help us, but I am still looking at it
performance-wise too; if there is no other option, then I will have to go with this.

Dti

On Sat, Jun 29, 2013 at 12:03 AM, Robin Morris <r...@baynote.com> wrote:

Split up the query – put the results of the subquery into a table, run the
final query on that, then drop the temporary table.

Robin

From: dyuti a <hadoop.hiv...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, June 28, 2013 10:54 AM
To: "user@hive.apache.org" <user@hive.apache.org>, Michael Malak <michaelma...@yahoo.com>
Subject: Re: Fwd: Need urgent help in hive query

Hi Michael,
Thanks for your help. Is there any other possible option apart from this?

On Fri, Jun 28, 2013 at 10:33 PM, Michael Malak <michaelma...@yahoo.com> wrote:

Just copy and paste the whole long expressions to their second occurrences.

From: dyuti a <hadoop.hiv...@gmail.com>
To: user@hive.apache.org
Sent: Friday, June 28, 2013 10:58 AM
Subject: Fwd: Need urgent help in hive query

Hi Experts,
I'm trying an SQL query in Hive, which does not support referencing a column
alias in subsequent columns. Is there any other way to rewrite it without
using aliases? Any help is really appreciated.
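Robin's "split up the query" suggestion can be sketched in HiveQL roughly as follows. The staging table name `tmp_der` and the shortened column list are illustrative only, not from the original query:

```sql
-- Stage 1: materialize the inner ("der") subquery into a staging table.
-- tmp_der is a hypothetical name; the real query selects many more columns.
CREATE TABLE tmp_der AS
SELECT udp.cc, udp.ber, udp.tkn, udp.nbr
FROM STS_GNCAUDP udp
INNER JOIN LOAD_LOG LZ_LL ON udp.LOG_KEY = LZ_LL.LOG_KEY;

-- Stage 2: run the final query against the staged result.
INSERT INTO TABLE CAUD
SELECT cc, ber, tkn, nbr FROM tmp_der;

-- Stage 3: clean up the temporary table.
DROP TABLE tmp_der;
```

As Robin notes, this costs little extra: Hive would have spilled the subquery's output to disk between map-reduce jobs anyway.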
Re: Need urgent help in hive query
Hi Robin,
Thanks for your reply. Hope this will help us, but I am still looking at it
performance-wise too; if there is no other option, then I will have to go with this.

Dti

On Sat, Jun 29, 2013 at 12:03 AM, Robin Morris wrote:
> Split up the query – put the results of the subquery into a table, run the
> final query on that, then drop the temporary table.
>
> Robin
>
> From: dyuti a
> Reply-To: "user@hive.apache.org"
> Date: Friday, June 28, 2013 10:54 AM
> To: "user@hive.apache.org", Michael Malak <michaelma...@yahoo.com>
> Subject: Re: Fwd: Need urgent help in hive query
>
> Hi Michael,
> Thanks for your help. Is there any other possible option apart from this?
>
> On Fri, Jun 28, 2013 at 10:33 PM, Michael Malak wrote:
>
>> Just copy and paste the whole long expressions to their second
>> occurrences.
>>
>> --
>> *From:* dyuti a
>> *To:* user@hive.apache.org
>> *Sent:* Friday, June 28, 2013 10:58 AM
>> *Subject:* Fwd: Need urgent help in hive query
>>
>> Hi Experts,
>> I'm trying an SQL query in Hive, which does not support referencing a
>> column alias in subsequent columns. Is there any other way to rewrite it
>> without using aliases? Any help is really appreciated.
>>
>> Thanks in advance!
>> Dti
Re: Need urgent help in hive query
Split up the query – put the results of the subquery into a table, run the
final query on that, then drop the temporary table.

Robin

From: dyuti a <hadoop.hiv...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, June 28, 2013 10:54 AM
To: "user@hive.apache.org" <user@hive.apache.org>, Michael Malak <michaelma...@yahoo.com>
Subject: Re: Fwd: Need urgent help in hive query

Hi Michael,
Thanks for your help. Is there any other possible option apart from this?

On Fri, Jun 28, 2013 at 10:33 PM, Michael Malak <michaelma...@yahoo.com> wrote:

Just copy and paste the whole long expressions to their second occurrences.

From: dyuti a <hadoop.hiv...@gmail.com>
To: user@hive.apache.org
Sent: Friday, June 28, 2013 10:58 AM
Subject: Fwd: Need urgent help in hive query

Hi Experts,
I'm trying an SQL query in Hive, which does not support referencing a column
alias in subsequent columns. Is there any other way to rewrite it without
using aliases? Any help is really appreciated.
Re: Fwd: Need urgent help in hive query
Hi Michael,
Thanks for your help. Is there any other possible option apart from this?

On Fri, Jun 28, 2013 at 10:33 PM, Michael Malak wrote:

> Just copy and paste the whole long expressions to their second occurrences.
>
> --
> *From:* dyuti a
> *To:* user@hive.apache.org
> *Sent:* Friday, June 28, 2013 10:58 AM
> *Subject:* Fwd: Need urgent help in hive query
>
> Hi Experts,
> I'm trying an SQL query in Hive, which does not support referencing a column
> alias in subsequent columns. Is there any other way to rewrite it without
> using aliases? Any help is really appreciated.
>
> Thanks in advance!
> Dti
Re: Fwd: Need urgent help in hive query
Just copy and paste the whole long expressions to their second occurrences.

From: dyuti a
To: user@hive.apache.org
Sent: Friday, June 28, 2013 10:58 AM
Subject: Fwd: Need urgent help in hive query

Hi Experts,
I'm trying an SQL query in Hive, which does not support referencing a column
alias in subsequent columns. Is there any other way to rewrite it without
using aliases? Any help is really appreciated.

Thanks in advance!
Dti
Fwd: Need urgent help in hive query
Hi Experts,
I'm trying the SQL query below in Hive, which does not support referencing a
column alias in subsequent columns, as shown in the query. Is there any other
way to rewrite it without using aliases? Any help is really appreciated.

INSERT INTO CAUD
  ( pst_cc
  , pst_no
  , pst_ber
  , pst_tkn
  , pst_dtm
  , pst_nbr
  , pst_cde
  , pst_dte
  )
SELECT der.cc
     , der.no
     , der.ber
     , der.tkn
     , der.dtm
     , der.nbr
     , der.cde
     , der.dte
FROM (SELECT udp.cc
           , udp.no
           , udp.ber
           , udp.tkn
           , CASE WHEN udp.SYSTEM_DTE>1600 AND udp.SYSTEM_DTE<1 THEN udp.SYSTEM_DTE
                  WHEN udp.DTE_OUT>1600 AND udp.DTE_OUT<1 THEN udp.DTE_OUT
                  WHEN udp.DTE_IN>1600 AND udp.DTE_IN<1 THEN udp.DTE_IN
                  ELSE '1231'
             END AS DTE_OUT
           , CASE WHEN udp.TME_OUT <> 0 THEN udp.TME_OUT
                  WHEN udp.TME_IN <> 0 THEN udp.TME_IN
                  ELSE 0
             END AS TME_OUT
           , TRIM(CAST(TME_OUT AS CHAR(6))) AS TME_OUT1
           , CAST(CAST(SUBSTR(TRIM(DTE_OUT),1,8) AS CHAR(8)) AS DATE FORMAT 'yyyymmdd') AS DTE_OUT_O
           , CASE WHEN TME_OUT>99999 THEN CAST(TME_OUT1 AS CHAR(6))
                  WHEN TME_OUT>9999 AND TME_OUT<=99999 THEN CAST('0'||TME_OUT1 AS CHAR(6))
                  WHEN TME_OUT>999 AND TME_OUT<=9999 THEN CAST('00'||TME_OUT1 AS CHAR(6))
                  WHEN TME_OUT>99 AND TME_OUT<=999 THEN CAST('000'||TME_OUT1 AS CHAR(6))
                  WHEN TME_OUT>9 AND TME_OUT<=99 THEN CAST('0000'||TME_OUT1 AS CHAR(6))
                  WHEN TME_OUT>0 AND TME_OUT<=9 THEN CAST('00000'||TME_OUT1 AS CHAR(6))
                  WHEN TME_OUT=0 THEN '000000'
             END AS TME_OUT2
           , SUBSTR(TRIM(TME_OUT2),1,2)||':'||SUBSTR(TRIM(TME_OUT2),3,2)||':'||SUBSTR(TRIM(TME_OUT2),5,2) AS TME_OUT_O
           , CAST(DTE_OUT_O||' '||TME_OUT_O AS TIMESTAMP FORMAT 'YYYYMMDD:HH:MI:SS') AS DTM
           , udp.nbr AS nbr
      FROM STS_GNCAUDP udp
           INNER JOIN LOAD_LOG LZ_LL ON udp.LOG_KEY=LZ_LL.LOG_KEY
           INNER JOIN ESA_LOAD_LOG ESA_LL ON LZ_LL.ESA_LOAD_LOG_KEY=ESA_LL.LOG_KEY
                  AND ESA_LL.PBLSH_IND='$PBLSH_IND'
                  AND ESA_LL.LOAD_END_DTM='$HIGH_DATE_TIME'
                  AND ESA_LL.SOR_CD='$CLM_SOR_CD'
                  AND ESA_LL.SUBJ_AREA_NM='$SUBJ_AREA_NM'
                  AND ESA_LL.WORK_FLOW_NM='$WORK_FLOW_NM'
      QUALIFY ROW_NUMBER() OVER (PARTITION BY udp.cc, udp.pst_no, udp.cde, udp.nbr, udp.dte, udp.LOG_KEY
                                 ORDER BY DTM DESC)=1) AS der
;

Thanks in advance!
Dti
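Beyond the alias problem, note that `QUALIFY` and `CAST(... AS DATE FORMAT ...)` in the query above are Teradata constructs that Hive does not support. The `QUALIFY ROW_NUMBER() ... = 1` filter can be expressed in Hive by computing the row number in an inner query and filtering in an outer one, which also shows how layering subqueries lets each level reference the previous level's aliases. A minimal sketch with a simplified column list (assuming `DTM` is already computed in the inner select):

```sql
SELECT t.cc, t.nbr, t.DTM
FROM (
  SELECT udp.cc, udp.nbr, udp.DTM,
         ROW_NUMBER() OVER (PARTITION BY udp.cc, udp.nbr
                            ORDER BY udp.DTM DESC) AS rn
  FROM STS_GNCAUDP udp
) t
-- keep only the newest row per partition, like QUALIFY ... = 1
WHERE t.rn = 1;
```

The same layering trick covers the alias issue: each derived alias (`DTE_OUT`, `TME_OUT`, `TME_OUT1`, ...) can be defined in one subquery level and referenced freely in the next.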
Re: Help in hive query
Hey John, I did the same using a join of the two tables, but your suggestion helped me a lot in many ways. It works great! Thanks a ton!

Regards,
dti

On Tue, Oct 30, 2012 at 7:27 PM, John Meagher wrote:
> The WHERE part in the approvals can be moved up to be an IF in the SELECT...
>
> SELECT client_id, receive_dd, receive_hh,
>        receive_hh+1,
>        COUNT(1) AS transaction_count,
>        SUM( IF( response=00, 1, 0 ) ) AS approval_count,
>        SUM( IF( response=00, 1, 0 ) ) / COUNT(1) * 100 AS percent
> FROM sale_test
> GROUP BY client_id, receive_dd, receive_hh, receive_hh+1;
>
> On Tue, Oct 30, 2012 at 1:51 AM, dyuti a wrote:
>> Hi All,
>>
>> I want to compute (no. of approvals in an hour / no. of transactions in that hour) * 100.
>>
>> // COUNT(1) AS cnt gives total transactions in an hour
>> SELECT client_id, receive_dd, receive_hh, receive_hh+1, COUNT(1) AS cnt FROM
>> sale_test GROUP BY client_id, receive_dd, receive_hh, receive_hh+1;
>>
>> GETREGREOS 23 16 17 5969
>> GETREGREOS 23 21 22 2602
>> GETREGREOS 24 3 4 114
>>
>> // Approved transactions where response=00
>> SELECT client_id, receive_dd, receive_hh, receive_hh+1, COUNT(1) AS cnt FROM
>> sale_test WHERE response=00 GROUP BY
>> client_id, receive_dd, receive_hh, receive_hh+1;
>>
>> GETREGREOS 23 16 17 5775
>> GETREGREOS 23 21 22 2515
>> GETREGREOS 24 3 4 103
>>
>> I want to compute 100*(5775/5969), 100*(2515/2602), 100*(103/114), and the
>> same for all other clients for each hour, i.e. (no. of approvals in an
>> hour / no. of transactions in that hour) * 100.
>>
>> Please help me out with how to achieve this in Hive.
>>
>> Thanks & Regards,
>> dti
Re: Help in hive query
The WHERE part in the approvals can be moved up to be an IF in the SELECT...

SELECT client_id, receive_dd, receive_hh,
       receive_hh+1,
       COUNT(1) AS transaction_count,
       SUM( IF( response=00, 1, 0 ) ) AS approval_count,
       SUM( IF( response=00, 1, 0 ) ) / COUNT(1) * 100 AS percent
FROM sale_test
GROUP BY client_id, receive_dd, receive_hh, receive_hh+1;

On Tue, Oct 30, 2012 at 1:51 AM, dyuti a wrote:
> Hi All,
>
> I want to compute (no. of approvals in an hour / no. of transactions in that hour) * 100.
>
> // COUNT(1) AS cnt gives total transactions in an hour
> SELECT client_id, receive_dd, receive_hh, receive_hh+1, COUNT(1) AS cnt FROM
> sale_test GROUP BY client_id, receive_dd, receive_hh, receive_hh+1;
>
> GETREGREOS 23 16 17 5969
> GETREGREOS 23 21 22 2602
> GETREGREOS 24 3 4 114
>
> // Approved transactions where response=00
> SELECT client_id, receive_dd, receive_hh, receive_hh+1, COUNT(1) AS cnt FROM
> sale_test WHERE response=00 GROUP BY
> client_id, receive_dd, receive_hh, receive_hh+1;
>
> GETREGREOS 23 16 17 5775
> GETREGREOS 23 21 22 2515
> GETREGREOS 24 3 4 103
>
> I want to compute 100*(5775/5969), 100*(2515/2602), 100*(103/114), and the
> same for all other clients for each hour, i.e. (no. of approvals in an
> hour / no. of transactions in that hour) * 100.
>
> Please help me out with how to achieve this in Hive.
>
> Thanks & Regards,
> dti
RE: NEED HELP in Hive Query
Thanks, John :-), I got it now in Pig also :-).

A = load '/File/00_0' using PigStorage('\u0001') as (name, date, url, hit:INT);
B = group A by (name, date, url);
C = foreach B generate flatten(group), SUM(A.hit) as hit;
D = distinct C;
Dump D;

Thanks & Regards
Yogesh Kumar Dhari

From: j...@omernik.com
Date: Sun, 14 Oct 2012 12:29:23 -0500
Subject: Re: NEED HELP in Hive Query
To: user@hive.apache.org

select NAME, DATE, URL, SUM(HITCOUNT) as HITCOUNT
from yourtable
group by NAME, DATE, URL

That's the HIVE answer. Not sure the PIG answer.

On Sun, Oct 14, 2012 at 9:54 AM, yogesh dhari wrote:

Hi all,

I have this file. I want to perform this operation in HIVE & PIG:

NAME            DATE        URL                                                HITCOUNT
timesascent.in  2008-08-27  http://timesascent.in/index.aspx?page=tparchives   15
timesascent.in  2008-08-27  http://timesascent.in/index.aspx?page=article&sectid=1&contentid=200812182008121814134447219270b26   20
timesascent.in  2008-08-27  http://timesascent.in/                             37
timesascent.in  2008-08-27  http://timesascent.in/section/39/Job%20Wise        14
timesascent.in  2008-08-27  http://timesascent.in/article/7/2011062120110621171709769aacc537/Work-environment--Employee-productivity.html   20
timesascent.in  2008-08-27  http://timesascent.in/                             17
timesascent.in  2008-08-27  http://timesascent.in/section/2/Interviews         15
timesascent.in  2008-08-27  http://timesascent.in/                             17
timesascent.in  2008-08-27  http://timesascent.in/                             27
timesascent.in  2008-08-27  http://timesascent.in/                             37
timesascent.in  2008-08-27  http://timesascent.in/                             27
timesascent.in  2008-08-27  http://www.timesascent.in/                         16
timesascent.in  2008-08-27  http://timesascent.in/section/2/Interviews         14
timesascent.in  2008-08-27  http://timesascent.in/                             14
timesascent.in  2008-08-27  http://timesascent.in/                             22

I want to add all HITCOUNT for the same NAME, DATE & URL,

like

timesascent.in  2008-08-27  http://timesascent.in/  (addition of all hitcount under the same name, date, url (37+17+17+27+))

Please suggest a method to perform this query.

Thanks & Regards
Yogesh Kumar
Re: NEED HELP in Hive Query
select NAME, DATE, URL, SUM(HITCOUNT) as HITCOUNT
from yourtable
group by NAME, DATE, URL

That's the HIVE answer. Not sure the PIG answer.

On Sun, Oct 14, 2012 at 9:54 AM, yogesh dhari wrote:
> Hi all,
>
> I have this file. I want to perform this operation in *HIVE & PIG*:
>
> NAME            DATE        URL                                                HITCOUNT
> timesascent.in  2008-08-27  http://timesascent.in/index.aspx?page=tparchives   15
> timesascent.in  2008-08-27  http://timesascent.in/index.aspx?page=article&sectid=1&contentid=200812182008121814134447219270b26   20
> timesascent.in  2008-08-27  http://timesascent.in/                             37
> timesascent.in  2008-08-27  http://timesascent.in/section/39/Job%20Wise        14
> timesascent.in  2008-08-27  http://timesascent.in/article/7/2011062120110621171709769aacc537/Work-environment--Employee-productivity.html   20
> timesascent.in  2008-08-27  http://timesascent.in/                             17
> timesascent.in  2008-08-27  http://timesascent.in/section/2/Interviews         15
> timesascent.in  2008-08-27  http://timesascent.in/                             17
> timesascent.in  2008-08-27  http://timesascent.in/                             27
> timesascent.in  2008-08-27  http://timesascent.in/                             37
> timesascent.in  2008-08-27  http://timesascent.in/                             27
> timesascent.in  2008-08-27  http://www.timesascent.in/                         16
> timesascent.in  2008-08-27  http://timesascent.in/section/2/Interviews         14
> timesascent.in  2008-08-27  http://timesascent.in/                             14
> timesascent.in  2008-08-27  http://timesascent.in/                             22
>
> I want to *add all HITCOUNT for the same NAME, DATE & URL*,
>
> like
>
> timesascent.in  2008-08-27  http://timesascent.in/  (addition of all hitcount under the same name, date, url (37+17+17+27+))
>
> Please suggest a method to perform this query.
>
> Thanks & Regards
> Yogesh Kumar
NEED HELP in Hive Query
Hi all,

I have this file. I want to perform this operation in HIVE & PIG:

NAME            DATE        URL                                                HITCOUNT
timesascent.in  2008-08-27  http://timesascent.in/index.aspx?page=tparchives   15
timesascent.in  2008-08-27  http://timesascent.in/index.aspx?page=article&sectid=1&contentid=200812182008121814134447219270b26   20
timesascent.in  2008-08-27  http://timesascent.in/                             37
timesascent.in  2008-08-27  http://timesascent.in/section/39/Job%20Wise        14
timesascent.in  2008-08-27  http://timesascent.in/article/7/2011062120110621171709769aacc537/Work-environment--Employee-productivity.html   20
timesascent.in  2008-08-27  http://timesascent.in/                             17
timesascent.in  2008-08-27  http://timesascent.in/section/2/Interviews         15
timesascent.in  2008-08-27  http://timesascent.in/                             17
timesascent.in  2008-08-27  http://timesascent.in/                             27
timesascent.in  2008-08-27  http://timesascent.in/                             37
timesascent.in  2008-08-27  http://timesascent.in/                             27
timesascent.in  2008-08-27  http://www.timesascent.in/                         16
timesascent.in  2008-08-27  http://timesascent.in/section/2/Interviews         14
timesascent.in  2008-08-27  http://timesascent.in/                             14
timesascent.in  2008-08-27  http://timesascent.in/                             22

I want to add all HITCOUNT for the same NAME, DATE & URL,

like

timesascent.in  2008-08-27  http://timesascent.in/  (addition of all hitcount under the same name, date, url (37+17+17+27+))

Please suggest a method to perform this query.

Thanks & Regards
Yogesh Kumar
Re: Help in hive query
Thanks Jan. It worked!

Regards,
Manu

On Wed, Oct 10, 2012 at 12:00 PM, Jan Dolinár wrote:
> Hi Manu,
>
> I believe the last "group by q2.auth_count" is wrong, because it
> causes computing the average only across lines with the same value of
> q2.auth_count, which is of course equal to its value.
>
> Best regards,
> J. Dolinar
>
> On Wed, Oct 10, 2012 at 8:19 AM, Manu A wrote:
>> Hi All,
>> The result of the query below is 194965.0 0.0, but 194965 is the result of
>> the inner query's count(q1.response). It looks like the outer query
>> [select avg(q2.auth_count), stddev_pop(q2.auth_count)] didn't work at all.
>>
>> // Query
>> select avg(q2.auth_count), stddev_pop(q2.auth_count)
>> from (
>>     select q1.TEXT_CCYY, count(q1.response) as auth_count
>>     from (
>>         select * from Sale1 where TEXT_DD=7 AND TEXT_HH=15 AND response=00) q1
>>     group by q1.TEXT_CCYY, q1.response) q2
>> group by q2.auth_count;
>>
>> Please help me: is there anything I have to change in the query?
>>
>> Thanks & Regards,
>> Manu
Re: Help in hive query
Hi Manu,

I believe the last "group by q2.auth_count" is wrong, because it
causes computing the average only across lines with the same value of
q2.auth_count, which is of course equal to its value.

Best regards,
J. Dolinar

On Wed, Oct 10, 2012 at 8:19 AM, Manu A wrote:
> Hi All,
> The result of the query below is 194965.0 0.0, but 194965 is the result of
> the inner query's count(q1.response). It looks like the outer query
> [select avg(q2.auth_count), stddev_pop(q2.auth_count)] didn't work at all.
>
> // Query
> select avg(q2.auth_count), stddev_pop(q2.auth_count)
> from (
>     select q1.TEXT_CCYY, count(q1.response) as auth_count
>     from (
>         select * from Sale1 where TEXT_DD=7 AND TEXT_HH=15 AND response=00) q1
>     group by q1.TEXT_CCYY, q1.response) q2
> group by q2.auth_count;
>
> Please help me: is there anything I have to change in the query?
>
> Thanks & Regards,
> Manu
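With Jan's fix applied, the trailing GROUP BY simply disappears, so the outer avg/stddev aggregate over all the per-TEXT_CCYY counts instead of over one count at a time. A sketch of the corrected query:

```sql
select avg(q2.auth_count), stddev_pop(q2.auth_count)
from (
    select q1.TEXT_CCYY, count(q1.response) as auth_count
    from (
        select * from Sale1
        where TEXT_DD=7 AND TEXT_HH=15 AND response=00) q1
    group by q1.TEXT_CCYY, q1.response) q2;
-- no trailing "group by q2.auth_count": with it, each group contains only
-- rows sharing one auth_count value, so avg() just echoes that value back
```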
Help in hive query
Hi All,
The result of the query below is 194965.0 0.0, but 194965 is the result of
the inner query's count(q1.response). It looks like the outer query
[select avg(q2.auth_count), stddev_pop(q2.auth_count)] didn't work at all.

// Query
select avg(q2.auth_count), stddev_pop(q2.auth_count)
from (
    select q1.TEXT_CCYY, count(q1.response) as auth_count
    from (
        select * from Sale1 where TEXT_DD=7 AND TEXT_HH=15 AND response=00) q1
    group by q1.TEXT_CCYY, q1.response) q2
group by q2.auth_count;

Please help me: is there anything I have to change in the query?

Thanks & Regards,
Manu