Re: Using SPLIT with DOT(.) delimiter demonstrate funny behavior within a VIEW
Hi Vivek Thanks A few more experiments The \\ , \\\ and all work correctly if used as following split(reverse(split(reverse(floc),'/')[0]),'[\\.]')[0] (need those square brackets) split(reverse(split(reverse(floc),'/')[0]),'\\\.')[0] (does not need square brackets) split(reverse(split(reverse(floc),'/')[0]),'[.]')[0] (need those square brackets) From: Vivek Veeramani To: user@hive.apache.org; Sanjay Subramanian Sent: Monday, August 24, 2015 1:57 PM Subject: Re: Using SPLIT with DOT(.) delimiter demonstrate funny behavior within a VIEW Hi Sanjay, Try replacing the 4 backslashes with just 2. Usually works with 2 backslashes. Replace this split(reverse(split(reverse(floc),'/')[0]),'.')[0] as resid with split(reverse(split(reverse(floc),'/')[0]),'\\.')[0] as resid Please have a look and let us know if that helps. Best , Vivek Veeramani cell : + 1-415 996 7853 On Mon, Aug 24, 2015 at 12:15 PM, Sanjay Subramanian wrote: Hi guys I am using Hive version = 0.13.1-cdh5.3.3 HIVE TABLE = qnap_resume_file_location---DROP TABLE IF EXISTS qnap_resume_file_location;CREATE EXTERNAL TABLE qnap_resume_file_location ( floc STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/data/myfirm/file_location'; SAMPLE DATA = qnap_resume_file_location-"select * from resume.qnap_resume_file_location limit 10"/mnt/myfirm/200901/14008_1_1004942032_0_1005347483.html /mnt/myfirm/200901/14008_1_1004944724_0_1005334758.html/mnt/myfirm/200901/14000__900719077_10_901309048.doc/mnt/myfirm/200901/14012_6_1004858088_0_1005227990.doc/mnt/myfirm/200901/14008_1_1004811416_0_1005258541.html/mnt/myfirm/200901/14008_1_501597002_6_1005224039.html/mnt/myfirm/200901/14009_5_1004872908_0_1005244348.doc/mnt/myfirm/200901/14001_7_14165_1_66775.docx/mnt/myfirm/200901/14009_5_1004935267_0_1005322091.txt/mnt/myfirm/200901/14012_6_1004904422_0_1005283729.doc HIVE VIEW = qnap_resume_resid_file_loc_map_vw-DROP VIEW IF EXISTS qnap_resume_resid_file_loc_map_vw;CREATE VIEW IF NOT EXISTS qnap_resume_resid_file_loc_map_vw ( resid , mm , floc )ASSELECT split(reverse(split(reverse(floc),'/')[0]),'.')[0] as resid, reverse(split(reverse(floc),'/')[1]) as mm, flocFROM resume.qnap_resume_file_location; QUERY ON VIEW"select * from qnap_resume_resid_file_loc_map_vw limit 10"EXPECTED RESULTS 14012_6_1006686583_0_1102955123 201205 /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx14009_5_1008440384_0_1102887209 201205 /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc14013_7_700200576_8_1102752594 201205 /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc14015_8_1008470815_0_1102954525 201205 /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc14009_5_1008373102_0_1102709973 201205 /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc14011_1_1007981566_0_1102730546 201205 /mnt/myfirm/201205/14011_1_1007981566_0_1102730546.txt14015_8_1008436709_0_1102867682 201205 /mnt/myfirm/201205/14015_8_1008436709_0_1102867682.docx14001_7_1006347358_0_1007662042 201205 /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx14009_5_1008382590_0_1102732450 201205 /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc14000_6_1008189721_0_1102253740 201205 /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx ACTUAL RESULTS (see the first column the DOT and extension are still there) 14012_6_1006686583_0_1102955123 .docx 201205 /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx14009_5_1008440384_0_1102887209 .doc 201205 /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc14013_7_700200576_8_1102752594 .doc 201205 /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc14015_8_1008470815_0_1102954525.doc 201205 /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc14009_5_1008373102_0_1102709973 .doc 201205 /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc14011_1_1007981566_0_1102730546 .txt 201205 /mnt/myfirm/201205/14011_1_1007981566_0_1102730546.txt14015_8_1008436709_0_1102867682 .docx 201205 /mnt/myfirm/201205/14015_8_1008436709_0_1102867682.docx14001_7_1006347358_0_1007662042 .docx 201205 /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx14009_5_1008382590_0_1102732450 .doc 201205 /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc14000_6_1008189721_0_1102253740 .docx 201205 /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx EXECUTING THE SELECT HIVE QUERY INSIDE THE VIEW INDEPENDENTLY (gives u correct results - but when placed inside a view then it does not seem to work)SELECT split(reverse(split(reverse(floc),'/')[0]),'.')[0] as resid, reverse(split(reverse(floc),'
Re: Using SPLIT with DOT(.) delimiter demonstrate funny behavior within a VIEW
Hi Sanjay, Try replacing the 4 backslashes with just 2. Usually works with 2 backslashes. Replace this *split(reverse(split(reverse(**floc**),'/')[0]),'.')[0]* as resid with *split(reverse(split(reverse(floc**),'/')[0]),'\\.')[0]* as resid Please have a look and let us know if that helps. Best , Vivek Veeramani cell : + 1-415 996 7853 On Mon, Aug 24, 2015 at 12:15 PM, Sanjay Subramanian < sanjaysubraman...@yahoo.com> wrote: > Hi guys > > I am using > > Hive version = 0.13.1-cdh5.3.3 > > > > *HIVE TABLE = qnap_resume_file_location* > *---* > DROP TABLE IF EXISTS > qnap_resume_file_location > ; > CREATE EXTERNAL TABLE qnap_resume_file_location ( > floc STRING > ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/data/ > myfirm/file_location' > ; > > *SAMPLE DATA = qnap_resume_file_location* > *-* > "select * from resume.qnap_resume_file_location limit 10" > /mnt/myfirm/200901/14008_1_1004942032_0_1005347483.html > /mnt/myfirm/200901/14008_1_1004944724_0_1005334758.html > /mnt/myfirm/200901/14000__900719077_10_901309048.doc > /mnt/myfirm/200901/14012_6_1004858088_0_1005227990.doc > /mnt/myfirm/200901/14008_1_1004811416_0_1005258541.html > /mnt/myfirm/200901/14008_1_501597002_6_1005224039.html > /mnt/myfirm/200901/14009_5_1004872908_0_1005244348.doc > /mnt/myfirm/200901/14001_7_14165_1_66775.docx > /mnt/myfirm/200901/14009_5_1004935267_0_1005322091.txt > /mnt/myfirm/200901/14012_6_1004904422_0_1005283729.doc > > > *HIVE VIEW = qnap_resume_resid_file_loc_map_vw* > *-* > DROP VIEW IF EXISTS > qnap_resume_resid_file_loc_map_vw > ; > CREATE VIEW IF NOT EXISTS qnap_resume_resid_file_loc_map_vw ( > resid , > mm , > floc > ) > AS > SELECT > split(reverse(split(reverse(floc),'/')[0]),'.')[0] as resid, > reverse(split(reverse(floc),'/')[1]) as mm, > floc > FROM > resume.qnap_resume_file_location > ; > > > *QUERY ON VIEW* > "select * from qnap_resume_resid_file_loc_map_vw limit 10" > *EXPECTED RESULTS* > *14012_6_1006686583_0_1102955123* 201205 > /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx > 14009_5_1008440384_0_1102887209 201205 > /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc > 14013_7_700200576_8_1102752594 201205 > /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc > 14015_8_1008470815_0_1102954525 201205 > /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc > 14009_5_1008373102_0_1102709973 201205 > /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc > 14011_1_1007981566_0_1102730546 201205 > /mnt/myfirm/201205/14011_1_1007981566_0_1102730546.txt > 14015_8_1008436709_0_1102867682 201205 > /mnt/myfirm/201205/14015_8_1008436709_0_1102867682.docx > 14001_7_1006347358_0_1007662042 201205 > /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx > 14009_5_1008382590_0_1102732450 201205 > /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc > 14000_6_1008189721_0_1102253740 201205 > /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx > > *ACTUAL RESULTS* (see the first column the DOT and extension are still > there) > *14012_6_1006686583_0_1102955123** .docx* 201205 > /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx > 14009_5_1008440384_0_1102887209 .doc 201205 > /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc > 14013_7_700200576_8_1102752594 .doc 201205 > /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc > 14015_8_1008470815_0_1102954525.doc 201205 > /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc > 14009_5_1008373102_0_1102709973 .doc 201205 > /mnt/myfirm/201205/14009_5_1008373102_0_1102709973.doc > 14011_1_1007981566_0_1102730546 .txt 201205 > /mnt/myfirm/201205/14011_1_1007981566_0_1102730546.txt > 14015_8_1008436709_0_1102867682 .docx 201205 > /mnt/myfirm/201205/14015_8_1008436709_0_1102867682.docx > 14001_7_1006347358_0_1007662042 .docx 201205 > /mnt/myfirm/201205/14001_7_1006347358_0_1007662042.docx > 14009_5_1008382590_0_1102732450 .doc 201205 > /mnt/myfirm/201205/14009_5_1008382590_0_1102732450.doc > 14000_6_1008189721_0_1102253740 .docx 201205 > /mnt/myfirm/201205/14000_6_1008189721_0_1102253740.docx > > *EXECUTING THE SELECT HIVE QUERY INSIDE THE VIEW INDEPENDENTLY (gives u > correct results - but when placed inside a view then it does not seem to > work)* > SELECT > split(reverse(split(reverse(floc),'/')[0]),'.')[0] as resid, > reverse(split(reverse(floc),'/')[1]) as mm, > floc > FROM > resume.qnap_resume_file_location > > 14012_6_1006686583_0_1102955123 201205 > /mnt/myfirm/201205/14012_6_1006686583_0_1102955123.docx > 14009_5_1008440384_0_1102887209 201205 > /mnt/myfirm/201205/14009_5_1008440384_0_1102887209.doc > 14013_7_700200576_8_1102752594 201205 > /mnt/myfirm/201205/14013_7_700200576_8_1102752594.doc > 14015_8_1008470815_0_1102954525 201205 > /mnt/myfirm/201205/14015_8_1008470815_0_1102954525.doc > 14009_5_1008373102_0_1102709973 20