RE: LINES TERMINATED BY only supports newline '\n' right now
I’ve checked “sentences” source code. It turns out it is using BreakIterator.getSentenceInstance to break the text to sentences. Apparently ‘\n’ is not considered as a sentence separator nor ‘.’, but ‘?’ and ‘!’ does. Dudu hive> select id,name,sentences(regexp_replace (lyrics,'\n','?')) from songs; 1 All For Leyna [["She","stood","on","the","tracks"],["Waving","her","arms"],["Leading","me","to","that","third","rail","shock"],["Quick","as","a","wink"],["She","changed","her","mind"]] 2 Goodnight Saigon [["We","met","as","soul","mates"],["On","Parris","Island"],["We","left","as","inmates"],["From","an","asylum"],["And","we","were","sharp"],["As","sharp","as","knives"],["And","we","were","so","gung","ho"],["To","lay","down","our","lives"]] From: Markovitz, Dudu [mailto:dmarkov...@paypal.com] Sent: Thursday, June 09, 2016 10:58 AM To: user@hive.apache.org Subject: RE: LINES TERMINATED BY only supports newline '\n' right now Partial success after few more trials and errors – 1. “insert into … values (),(),…,()” doesn’t work right in any case “insert into … values (); insert into … values ();…;insert into … values();” works only with textinputformat.record.delimiter changed. Insert into … select … union all select … works fine (no need to touch textinputformat.record.delimiter) 2. No bugs around aggregative functions 3. “sentences” still doesn’t work as expected. We can see that “split” works correctly. hive> select id,name,split(lyrics,'\n') from songs; 1 All For Leyna ["She stood on the tracks","Waving her arms","Leading me to that third rail shock","Quick as a wink","She changed her mind"] 2 Goodnight Saigon["We met as soul mates","On Parris Island","We left as inmates","From an asylum","And we were sharp","As sharp as knives","And we were so gung ho","To lay down our lives"] hive> select id,name,sentences(lyrics) from songs; 1 All For Leyna [["She","stood","on","the","tracks","Waving","her","arms","Leading","me","to","that","third","rail","shock","Quick","as","a","wink","She","changed","her","mind"]] 2 Goodnight Saigon [["We","met","as","soul","mates","On","Parris","Island","We","left","as","inmates","From","an","asylum","And","we","were","sharp","As","sharp","as","knives","And","we","were","so","gung","ho","To","lay","down","our","lives"]] From: Markovitz, Dudu [mailto:dmarkov...@paypal.com] Sent: Thursday, June 09, 2016 10:23 AM To: user@hive.apache.org<mailto:user@hive.apache.org> Subject: RE: LINES TERMINATED BY only supports newline '\n' right now Same issues. Dudu From: abhishek [mailto:ec.abhis...@gmail.com] Sent: Thursday, June 09, 2016 9:23 AM To: user@hive.apache.org<mailto:user@hive.apache.org> Subject: Re: LINES TERMINATED BY only supports newline '\n' right now Did you try defining the table with hive In built SerDe. 'Stored as ORC' This should resolve your issue. Plz try and let me know if it works. Abhi Sent from my iPhone On Jun 3, 2016, at 3:33 AM, Markovitz, Dudu mailto:dmarkov...@paypal.com>> wrote: Here is an example, but first – some warnings: · You should set textinputformat.record.delimiter not only for the populating of the table but also for querying it · There seems to be many issues around this area – o When I tried to insert multiple values in a single statement (“insert into table … values
RE: LINES TERMINATED BY only supports newline '\n' right now
Partial success after few more trials and errors – 1. “insert into … values (),(),…,()” doesn’t work right in any case “insert into … values (); insert into … values ();…;insert into … values();” works only with textinputformat.record.delimiter changed. Insert into … select … union all select … works fine (no need to touch textinputformat.record.delimiter) 2. No bugs around aggregative functions 3. “sentences” still doesn’t work as expected. We can see that “split” works correctly. hive> select id,name,split(lyrics,'\n') from songs; 1 All For Leyna ["She stood on the tracks","Waving her arms","Leading me to that third rail shock","Quick as a wink","She changed her mind"] 2 Goodnight Saigon["We met as soul mates","On Parris Island","We left as inmates","From an asylum","And we were sharp","As sharp as knives","And we were so gung ho","To lay down our lives"] hive> select id,name,sentences(lyrics) from songs; 1 All For Leyna [["She","stood","on","the","tracks","Waving","her","arms","Leading","me","to","that","third","rail","shock","Quick","as","a","wink","She","changed","her","mind"]] 2 Goodnight Saigon [["We","met","as","soul","mates","On","Parris","Island","We","left","as","inmates","From","an","asylum","And","we","were","sharp","As","sharp","as","knives","And","we","were","so","gung","ho","To","lay","down","our","lives"]] From: Markovitz, Dudu [mailto:dmarkov...@paypal.com] Sent: Thursday, June 09, 2016 10:23 AM To: user@hive.apache.org Subject: RE: LINES TERMINATED BY only supports newline '\n' right now Same issues. Dudu From: abhishek [mailto:ec.abhis...@gmail.com] Sent: Thursday, June 09, 2016 9:23 AM To: user@hive.apache.org<mailto:user@hive.apache.org> Subject: Re: LINES TERMINATED BY only supports newline '\n' right now Did you try defining the table with hive In built SerDe. 'Stored as ORC' This should resolve your issue. Plz try and let me know if it works. Abhi Sent from my iPhone On Jun 3, 2016, at 3:33 AM, Markovitz, Dudu mailto:dmarkov...@paypal.com>> wrote: Here is an example, but first – some warnings: · You should set textinputformat.record.delimiter not only for the populating of the table but also for querying it · There seems to be many issues around this area – o When I tried to insert multiple values in a single statement (“insert into table … values (…),(…),(…)”) only the first set of values was inserted correctly o 2 new lines were added to the end of each text (‘lyrics’) although there should be none. o Aggregative queries seems to return null for the last column. Sometimes. o The function ‘sentences’ does not work as expected. It treated the whole text as a single line. Dudu Example hive> create table songs (id int,name string,lyrics string) ; hive> set textinputformat.record.delimiter='\0' ; hive> insert into table songs values ( 1 ,'All For Leyna' ,'She stood on the tracks Waving her arms Leading me to that third rail shock Quick as a wink She changed her mind' ) ; hive> insert into table songs values ( 2 ,'Goodnight Saigon' ,'We met as soul mates On Parris Island We left as inmates From an asylum And we were sharp As sharp as knives And we were so gung ho To lay down our lives' ) ; hive> select id,name,length(lyrics) from songs; 1 All For Leyna 114 2 Goodnight Saigon155 hive> select id,name,hex(lyrics) from songs; 1 All For Leyna 5368652073746F6F64206F6E2074686520747261636B730A576176696E67206865722061726D730A4C656164696E67206D6520746F2074686174207468697264207261696C2073686F636B0A517569636B20617320612077696E6B0A536865206368616E67656420686572206D696E640A0A 2 Goodnight Saigon 5765206D657420617320736F756C206D617465730A4F6E205061727269732049736C616E640A5765206C65667420617320696E6D617465730A46726F6D20616E206173796C756D0A416E6420776520776572652073686172700A4173207368617270206173206B6E697665730A416E64207765207765726520736F2067756E6720686F0A546F206C617920646F776E206F7572206C697665730A0A hive&g
RE: LINES TERMINATED BY only supports newline '\n' right now
Same issues. Dudu From: abhishek [mailto:ec.abhis...@gmail.com] Sent: Thursday, June 09, 2016 9:23 AM To: user@hive.apache.org Subject: Re: LINES TERMINATED BY only supports newline '\n' right now Did you try defining the table with hive In built SerDe. 'Stored as ORC' This should resolve your issue. Plz try and let me know if it works. Abhi Sent from my iPhone On Jun 3, 2016, at 3:33 AM, Markovitz, Dudu mailto:dmarkov...@paypal.com>> wrote: Here is an example, but first – some warnings: · You should set textinputformat.record.delimiter not only for the populating of the table but also for querying it · There seems to be many issues around this area – o When I tried to insert multiple values in a single statement (“insert into table … values (…),(…),(…)”) only the first set of values was inserted correctly o 2 new lines were added to the end of each text (‘lyrics’) although there should be none. o Aggregative queries seems to return null for the last column. Sometimes. o The function ‘sentences’ does not work as expected. It treated the whole text as a single line. Dudu Example hive> create table songs (id int,name string,lyrics string) ; hive> set textinputformat.record.delimiter='\0' ; hive> insert into table songs values ( 1 ,'All For Leyna' ,'She stood on the tracks Waving her arms Leading me to that third rail shock Quick as a wink She changed her mind' ) ; hive> insert into table songs values ( 2 ,'Goodnight Saigon' ,'We met as soul mates On Parris Island We left as inmates From an asylum And we were sharp As sharp as knives And we were so gung ho To lay down our lives' ) ; hive> select id,name,length(lyrics) from songs; 1 All For Leyna 114 2 Goodnight Saigon155 hive> select id,name,hex(lyrics) from songs; 1 All For Leyna 5368652073746F6F64206F6E2074686520747261636B730A576176696E67206865722061726D730A4C656164696E67206D6520746F2074686174207468697264207261696C2073686F636B0A517569636B20617320612077696E6B0A536865206368616E67656420686572206D696E640A0A 2 Goodnight Saigon 5765206D657420617320736F756C206D617465730A4F6E205061727269732049736C616E640A5765206C65667420617320696E6D617465730A46726F6D20616E206173796C756D0A416E6420776520776572652073686172700A4173207368617270206173206B6E697665730A416E64207765207765726520736F2067756E6720686F0A546F206C617920646F776E206F7572206C697665730A0A hive> select id,name,regexp_replace(lyrics,'\n','<<>>') from songs; 1 All For Leyna She stood on the tracks<<>>Waving her arms<<>>Leading me to that third rail shock<<>>Quick as a wink<<>>She changed her mind<<>><<>> 2 Goodnight SaigonWe met as soul mates<<>>On Parris Island<<>>We left as inmates<<>>From an asylum<<>>And we were sharp<<>>As sharp as knives<<>>And we were so gung ho<<>>To lay down our lives<<>><<>> hive> select id,name,split(lyrics,'\n') from songs; 1 All For Leyna ["She stood on the tracks","Waving her arms","Leading me to that third rail shock","Quick as a wink","She changed her mind","",""] 2 Goodnight Saigon["We met as soul mates","On Parris Island","We left as inmates","From an asylum","And we were sharp","As sharp as knives","And we were so gung ho","To lay down our lives","",""] hive> select id,name,sentences(lyrics) from songs; 1 All For Leyna [["She","stood","on","the","tracks","Waving","her","arms","Leading","me","to","that","third","rail","shock","Quick","as","a","wink","She","changed","her","mind"]] 2 Goodnight Saigon [["We","met","as","soul","mates","On","Parris","Island","We","left","as","inmates","From","an","asylum","And","we","were","sharp","As","sharp","as","knives","And","we","were","so","gung","ho","To","lay","down","our","lives"]] hiv
Re: LINES TERMINATED BY only supports newline '\n' right now
Did you try defining the table with hive In built SerDe. 'Stored as ORC' This should resolve your issue. Plz try and let me know if it works. Abhi Sent from my iPhone > On Jun 3, 2016, at 3:33 AM, Markovitz, Dudu wrote: > > Here is an example, but first – some warnings: > > · You should set textinputformat.record.delimiter not only for the > populating of the table but also for querying it > > · There seems to be many issues around this area – > o When I tried to insert multiple values in a single statement (“insert > into table … values (…),(…),(…)”) only the first set of values was inserted > correctly > o 2 new lines were added to the end of each text (‘lyrics’) although there > should be none. > o Aggregative queries seems to return null for the last column. Sometimes. > o The function ‘sentences’ does not work as expected. It treated the whole > text as a single line. > > Dudu > > > Example > > hive> create table songs (id int,name string,lyrics string) > ; > > hive> set textinputformat.record.delimiter='\0' > ; > > hive> insert into table songs values > ( > 1 > ,'All For Leyna' > ,'She stood on the tracks > Waving her arms > Leading me to that third rail shock > Quick as a wink > She changed her mind' > ) > ; > > hive> insert into table songs values > ( > 2 > ,'Goodnight Saigon' > ,'We met as soul mates > On Parris Island > We left as inmates > From an asylum > And we were sharp > As sharp as knives > And we were so gung ho > To lay down our lives' > ) > ; > > hive> select id,name,length(lyrics) from songs; > > 1 All For Leyna 114 > 2 Goodnight Saigon155 > > > hive> select id,name,hex(lyrics) from songs; > > 1 All For Leyna > 5368652073746F6F64206F6E2074686520747261636B730A576176696E67206865722061726D730A4C656164696E67206D6520746F2074686174207468697264207261696C2073686F636B0A517569636B20617320612077696E6B0A536865206368616E67656420686572206D696E640A0A > 2 Goodnight Saigon > 5765206D657420617320736F756C206D617465730A4F6E205061727269732049736C616E640A5765206C65667420617320696E6D617465730A46726F6D20616E206173796C756D0A416E6420776520776572652073686172700A4173207368617270206173206B6E697665730A416E64207765207765726520736F2067756E6720686F0A546F206C617920646F776E206F7572206C697665730A0A > > hive> select id,name,regexp_replace(lyrics,'\n','<<>>') from songs; > > 1 All For Leyna She stood on the tracks<<>>Waving > her arms<<>>Leading me to that third rail shock<<>>Quick as > a wink<<>>She changed her mind<<>><<>> > 2 Goodnight SaigonWe met as soul > mates<<>>On Parris Island<<>>We left as > inmates<<>>From an asylum<<>>And we were > sharp<<>>As sharp as knives<<>>And we were so gung > ho<<>>To lay down our lives<<>><<>> > > hive> select id,name,split(lyrics,'\n') from songs; > > 1 All For Leyna ["She stood on the tracks","Waving her > arms","Leading me to that third rail shock","Quick as a wink","She changed > her mind","",""] > 2 Goodnight Saigon["We met as soul mates","On Parris > Island","We left as inmates","From an asylum","And we were sharp","As sharp > as knives","And we were so gung ho","To lay down our lives","",""] > > hive> select id,name,sentences(lyrics) from songs; > > 1 All For Leyna > [["She","stood","on","the","tracks","Waving","her","arms","Leading","me","to","that","third","rail","shock","Quick","as","a","wink","She","changed","her","mind"]] > 2 Goodnight Saigon > [["We","met","as","soul","mates","On","Parris","Island","We","left","as","inmates","From","an","asylum","And","we","were","sharp","As","sharp","as","knives","And","we","were","so","gung","ho","To","lay","down","our","lives"]] > > hive> select count (*) from songs; > > NULL > > hive> select count (*),123,456,789 from songs; > > 2 123 456 NULL > > hive> select count (*),'A','B','C' from songs; > > 2 A B C > > > From: Radha krishna [mailto:grkmc...@gmail.com] > Sent: Thursday, June 02, 2016 12:42 PM > To: user@hive.apache.org > Subject: LINES TERMINATED BY only supports newline '\n' right now > > For some of the columns '\n' character is there as part of value, i want to > create a hive table for this data i tried by creating the hive table with US > as the line separator but it showing the below message > > Ex: > CREATE EXTERNAL TABLE IF NOT EXISTS emp (name String,id int) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '28' > LINES TERMINATED BY '31' > LOCATION 'emp.txt file path'; > > FAILED: SemanticException 4:20 LINES TERMINATED BY only supports newline '\n' > right now. Error encountered near token ''31'' > > how can we create hive table with out removing the \n character as part of > column value ( requirement
RE: LINES TERMINATED BY only supports newline '\n' right now
Here is an example, but first – some warnings: · You should set textinputformat.record.delimiter not only for the populating of the table but also for querying it · There seems to be many issues around this area – o When I tried to insert multiple values in a single statement (“insert into table … values (…),(…),(…)”) only the first set of values was inserted correctly o 2 new lines were added to the end of each text (‘lyrics’) although there should be none. o Aggregative queries seems to return null for the last column. Sometimes. o The function ‘sentences’ does not work as expected. It treated the whole text as a single line. Dudu Example hive> create table songs (id int,name string,lyrics string) ; hive> set textinputformat.record.delimiter='\0' ; hive> insert into table songs values ( 1 ,'All For Leyna' ,'She stood on the tracks Waving her arms Leading me to that third rail shock Quick as a wink She changed her mind' ) ; hive> insert into table songs values ( 2 ,'Goodnight Saigon' ,'We met as soul mates On Parris Island We left as inmates From an asylum And we were sharp As sharp as knives And we were so gung ho To lay down our lives' ) ; hive> select id,name,length(lyrics) from songs; 1 All For Leyna 114 2 Goodnight Saigon155 hive> select id,name,hex(lyrics) from songs; 1 All For Leyna 5368652073746F6F64206F6E2074686520747261636B730A576176696E67206865722061726D730A4C656164696E67206D6520746F2074686174207468697264207261696C2073686F636B0A517569636B20617320612077696E6B0A536865206368616E67656420686572206D696E640A0A 2 Goodnight Saigon 5765206D657420617320736F756C206D617465730A4F6E205061727269732049736C616E640A5765206C65667420617320696E6D617465730A46726F6D20616E206173796C756D0A416E6420776520776572652073686172700A4173207368617270206173206B6E697665730A416E64207765207765726520736F2067756E6720686F0A546F206C617920646F776E206F7572206C697665730A0A hive> select id,name,regexp_replace(lyrics,'\n','<<>>') from songs; 1 All For Leyna She stood on the tracks<<>>Waving her arms<<>>Leading me to that third rail shock<<>>Quick as a wink<<>>She changed her mind<<>><<>> 2 Goodnight SaigonWe met as soul mates<<>>On Parris Island<<>>We left as inmates<<>>From an asylum<<>>And we were sharp<<>>As sharp as knives<<>>And we were so gung ho<<>>To lay down our lives<<>><<>> hive> select id,name,split(lyrics,'\n') from songs; 1 All For Leyna ["She stood on the tracks","Waving her arms","Leading me to that third rail shock","Quick as a wink","She changed her mind","",""] 2 Goodnight Saigon["We met as soul mates","On Parris Island","We left as inmates","From an asylum","And we were sharp","As sharp as knives","And we were so gung ho","To lay down our lives","",""] hive> select id,name,sentences(lyrics) from songs; 1 All For Leyna [["She","stood","on","the","tracks","Waving","her","arms","Leading","me","to","that","third","rail","shock","Quick","as","a","wink","She","changed","her","mind"]] 2 Goodnight Saigon [["We","met","as","soul","mates","On","Parris","Island","We","left","as","inmates","From","an","asylum","And","we","were","sharp","As","sharp","as","knives","And","we","were","so","gung","ho","To","lay","down","our","lives"]] hive> select count (*) from songs; NULL hive> select count (*),123,456,789 from songs; 2 123 456 NULL hive> select count (*),'A','B','C' from songs; 2 A B C From: Radha krishna [mailto:grkmc...@gmail.com] Sent: Thursday, June 02, 2016 12:42 PM To: user@hive.apache.org Subject: LINES TERMINATED BY only supports newline '\n' right now For some of the columns '\n' character is there as part of value, i want to create a hive table for this data i tried by creating the hive table with US as the line separator but it showing the below message Ex: CREATE EXTERNAL TABLE IF NOT EXISTS emp (name String,id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '28' LINES TERMINATED BY '31' LOCATION 'emp.txt file path'; FAILED: SemanticException 4:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''31'' how can we create hive table with out removing the \n character as part of column value ( requirement data need to maintain as it is) can any one implemented hive tables with line separated other than \n Thanks & Regards Radha krishna
RE: LINES TERMINATED BY only supports newline '\n' right now
One quick way to solve this is setting textinputformat.record.delimiter=”US” before reading from that text table. But it has some limitations: 1,you cannot read multiple text tables with different line delimiters in the same query. 2,you cannot write to that text table with your own line delimiter. Also ,there is patch at https://issues.apache.org/jira/browse/HIVE-5999 Allow other characters for LINES TERMINATED BY . Which may meet your requirements. From: Peter Vary [mailto:pv...@cloudera.com] Sent: Friday, June 03, 2016 1:29 AM To: user@hive.apache.org Subject: Re: LINES TERMINATED BY only supports newline '\n' right now Hi, According to the documentation you should write and set your own Inputformat when creating the table. Mike Sukmanowsky solved a similar problem here, this might help you: http://stackoverflow.com/questions/7692994/custom-inputformat-with-hive Regards, Peter For some of the columns '\n' character is there as part of value, i want to create a hive table for this data i tried by creating the hive table with US as the line separator but it showing the below message Ex: CREATE EXTERNAL TABLE IF NOT EXISTS emp (name String,id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '28' LINES TERMINATED BY '31' LOCATION 'emp.txt file path'; FAILED: SemanticException 4:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''31'' how can we create hive table with out removing the \n character as part of column value ( requirement data need to maintain as it is) can any one implemented hive tables with line separated other than \n Thanks & Regards Radha krishna
Re: LINES TERMINATED BY only supports newline '\n' right now
Hi, According to the documentation you should write and set your own Inputformat when creating the table. Mike Sukmanowsky solved a similar problem here, this might help you: http://stackoverflow.com/questions/7692994/custom-inputformat-with-hive Regards, Peter For some of the columns '\n' character is there as part of value, i want to create a hive table for this data i tried by creating the hive table with US as the line separator but it showing the below message Ex: CREATE EXTERNAL TABLE IF NOT EXISTS emp (name String,id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '28' LINES TERMINATED BY '31' LOCATION 'emp.txt file path'; FAILED: SemanticException 4:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''31'' how can we create hive table with out removing the \n character as part of column value ( requirement data need to maintain as it is) can any one implemented hive tables with line separated other than \n Thanks & Regards Radha krishna