RE: LINES TERMINATED BY only supports newline '\n' right now

2016-06-09 Thread Markovitz, Dudu
I’ve checked “sentences” source code.
It turns out it is using BreakIterator.getSentenceInstance to break the text to 
sentences.
Apparently ‘\n’ is not considered as a sentence separator nor ‘.’, but ‘?’ and 
‘!’ does.

Dudu


hive> select id,name,sentences(regexp_replace (lyrics,'\n','?')) from songs;

1  All For Leyna
[["She","stood","on","the","tracks"],["Waving","her","arms"],["Leading","me","to","that","third","rail","shock"],["Quick","as","a","wink"],["She","changed","her","mind"]]
2  Goodnight Saigon
[["We","met","as","soul","mates"],["On","Parris","Island"],["We","left","as","inmates"],["From","an","asylum"],["And","we","were","sharp"],["As","sharp","as","knives"],["And","we","were","so","gung","ho"],["To","lay","down","our","lives"]]

From: Markovitz, Dudu [mailto:dmarkov...@paypal.com]
Sent: Thursday, June 09, 2016 10:58 AM
To: user@hive.apache.org
Subject: RE: LINES TERMINATED BY only supports newline '\n' right now

Partial success after few more trials and errors –

1.
“insert into … values (),(),…,()” doesn’t work right in any case
“insert into … values (); insert into … values ();…;insert into … values();” 
works only with textinputformat.record.delimiter changed.
Insert into … select … union all select … works fine (no need to touch 
textinputformat.record.delimiter)

2.
No bugs around aggregative functions

3.
“sentences” still doesn’t work as expected.
We can see that “split” works correctly.

hive> select id,name,split(lyrics,'\n') from songs;

1  All For Leyna  ["She stood on the tracks","Waving her 
arms","Leading me to that third rail shock","Quick as a wink","She changed her 
mind"]
2  Goodnight Saigon["We met as soul mates","On Parris 
Island","We left as inmates","From an asylum","And we were sharp","As sharp as 
knives","And we were so gung ho","To lay down our lives"]

hive> select id,name,sentences(lyrics) from songs;

1  All For Leyna
[["She","stood","on","the","tracks","Waving","her","arms","Leading","me","to","that","third","rail","shock","Quick","as","a","wink","She","changed","her","mind"]]
2  Goodnight Saigon
[["We","met","as","soul","mates","On","Parris","Island","We","left","as","inmates","From","an","asylum","And","we","were","sharp","As","sharp","as","knives","And","we","were","so","gung","ho","To","lay","down","our","lives"]]





From: Markovitz, Dudu [mailto:dmarkov...@paypal.com]
Sent: Thursday, June 09, 2016 10:23 AM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: RE: LINES TERMINATED BY only supports newline '\n' right now

Same issues.

Dudu

From: abhishek [mailto:ec.abhis...@gmail.com]
Sent: Thursday, June 09, 2016 9:23 AM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: LINES TERMINATED BY only supports newline '\n' right now


Did you try defining the table with hive In built SerDe. 'Stored as ORC'
This should resolve your issue. Plz try and let me know if it works.

Abhi
Sent from my iPhone

On Jun 3, 2016, at 3:33 AM, Markovitz, Dudu 
mailto:dmarkov...@paypal.com>> wrote:
Here is an example, but first – some warnings:


· You should set textinputformat.record.delimiter not only for the 
populating of the table but also for querying it



· There seems to be many issues around this area –

o   When I tried to insert multiple values in a single statement (“insert into 
table … values 

RE: LINES TERMINATED BY only supports newline '\n' right now

2016-06-09 Thread Markovitz, Dudu
Partial success after few more trials and errors –

1.
“insert into … values (),(),…,()” doesn’t work right in any case
“insert into … values (); insert into … values ();…;insert into … values();” 
works only with textinputformat.record.delimiter changed.
Insert into … select … union all select … works fine (no need to touch 
textinputformat.record.delimiter)

2.
No bugs around aggregative functions

3.
“sentences” still doesn’t work as expected.
We can see that “split” works correctly.

hive> select id,name,split(lyrics,'\n') from songs;

1  All For Leyna  ["She stood on the tracks","Waving her 
arms","Leading me to that third rail shock","Quick as a wink","She changed her 
mind"]
2  Goodnight Saigon["We met as soul mates","On Parris 
Island","We left as inmates","From an asylum","And we were sharp","As sharp as 
knives","And we were so gung ho","To lay down our lives"]

hive> select id,name,sentences(lyrics) from songs;

1  All For Leyna
[["She","stood","on","the","tracks","Waving","her","arms","Leading","me","to","that","third","rail","shock","Quick","as","a","wink","She","changed","her","mind"]]
2  Goodnight Saigon    
[["We","met","as","soul","mates","On","Parris","Island","We","left","as","inmates","From","an","asylum","And","we","were","sharp","As","sharp","as","knives","And","we","were","so","gung","ho","To","lay","down","our","lives"]]





From: Markovitz, Dudu [mailto:dmarkov...@paypal.com]
Sent: Thursday, June 09, 2016 10:23 AM
To: user@hive.apache.org
Subject: RE: LINES TERMINATED BY only supports newline '\n' right now

Same issues.

Dudu

From: abhishek [mailto:ec.abhis...@gmail.com]
Sent: Thursday, June 09, 2016 9:23 AM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: LINES TERMINATED BY only supports newline '\n' right now


Did you try defining the table with hive In built SerDe. 'Stored as ORC'
This should resolve your issue. Plz try and let me know if it works.

Abhi
Sent from my iPhone

On Jun 3, 2016, at 3:33 AM, Markovitz, Dudu 
mailto:dmarkov...@paypal.com>> wrote:
Here is an example, but first – some warnings:


· You should set textinputformat.record.delimiter not only for the 
populating of the table but also for querying it



· There seems to be many issues around this area –

o   When I tried to insert multiple values in a single statement (“insert into 
table … values (…),(…),(…)”) only the first set of values was inserted correctly

o   2 new lines were added to the end of each text (‘lyrics’) although there 
should be none.

o   Aggregative queries seems to return null for the last column. Sometimes.

o   The function ‘sentences’ does not work as expected. It treated the whole 
text as a single line.



Dudu





Example


hive> create table songs (id int,name string,lyrics string)
;

hive> set textinputformat.record.delimiter='\0'
;

hive> insert into table songs values
(
1
,'All For Leyna'
,'She stood on the tracks
Waving her arms
Leading me to that third rail shock
Quick as a wink
She changed her mind'
)
;

hive> insert into table songs values
(
2
,'Goodnight Saigon'
,'We met as soul mates
On Parris Island
We left as inmates
From an asylum
And we were sharp
As sharp as knives
And we were so gung ho
To lay down our lives'
)
;

hive> select id,name,length(lyrics) from songs;

1  All For Leyna  114
2  Goodnight Saigon155


hive> select id,name,hex(lyrics) from songs;

1  All For Leyna
5368652073746F6F64206F6E2074686520747261636B730A576176696E67206865722061726D730A4C656164696E67206D6520746F2074686174207468697264207261696C2073686F636B0A517569636B20617320612077696E6B0A536865206368616E67656420686572206D696E640A0A
2  Goodnight Saigon
5765206D657420617320736F756C206D617465730A4F6E205061727269732049736C616E640A5765206C65667420617320696E6D617465730A46726F6D20616E206173796C756D0A416E6420776520776572652073686172700A4173207368617270206173206B6E697665730A416E64207765207765726520736F2067756E6720686F0A546F206C617920646F776E206F7572206C697665730A0A

hive&g

RE: LINES TERMINATED BY only supports newline '\n' right now

2016-06-09 Thread Markovitz, Dudu
Same issues.

Dudu

From: abhishek [mailto:ec.abhis...@gmail.com]
Sent: Thursday, June 09, 2016 9:23 AM
To: user@hive.apache.org
Subject: Re: LINES TERMINATED BY only supports newline '\n' right now


Did you try defining the table with hive In built SerDe. 'Stored as ORC'
This should resolve your issue. Plz try and let me know if it works.

Abhi
Sent from my iPhone

On Jun 3, 2016, at 3:33 AM, Markovitz, Dudu 
mailto:dmarkov...@paypal.com>> wrote:
Here is an example, but first – some warnings:


· You should set textinputformat.record.delimiter not only for the 
populating of the table but also for querying it



· There seems to be many issues around this area –

o   When I tried to insert multiple values in a single statement (“insert into 
table … values (…),(…),(…)”) only the first set of values was inserted correctly

o   2 new lines were added to the end of each text (‘lyrics’) although there 
should be none.

o   Aggregative queries seems to return null for the last column. Sometimes.

o   The function ‘sentences’ does not work as expected. It treated the whole 
text as a single line.



Dudu





Example


hive> create table songs (id int,name string,lyrics string)
;

hive> set textinputformat.record.delimiter='\0'
;

hive> insert into table songs values
(
1
,'All For Leyna'
,'She stood on the tracks
Waving her arms
Leading me to that third rail shock
Quick as a wink
She changed her mind'
)
;

hive> insert into table songs values
(
2
,'Goodnight Saigon'
,'We met as soul mates
On Parris Island
We left as inmates
From an asylum
And we were sharp
As sharp as knives
And we were so gung ho
To lay down our lives'
)
;

hive> select id,name,length(lyrics) from songs;

1  All For Leyna  114
2  Goodnight Saigon155


hive> select id,name,hex(lyrics) from songs;

1  All For Leyna
5368652073746F6F64206F6E2074686520747261636B730A576176696E67206865722061726D730A4C656164696E67206D6520746F2074686174207468697264207261696C2073686F636B0A517569636B20617320612077696E6B0A536865206368616E67656420686572206D696E640A0A
2  Goodnight Saigon
5765206D657420617320736F756C206D617465730A4F6E205061727269732049736C616E640A5765206C65667420617320696E6D617465730A46726F6D20616E206173796C756D0A416E6420776520776572652073686172700A4173207368617270206173206B6E697665730A416E64207765207765726520736F2067756E6720686F0A546F206C617920646F776E206F7572206C697665730A0A

hive> select id,name,regexp_replace(lyrics,'\n','<<>>') from songs;

1  All For Leyna  She stood on the tracks<<>>Waving 
her arms<<>>Leading me to that third rail shock<<>>Quick as a 
wink<<>>She changed her mind<<>><<>>
2  Goodnight SaigonWe met as soul mates<<>>On 
Parris Island<<>>We left as inmates<<>>From an 
asylum<<>>And we were sharp<<>>As sharp as 
knives<<>>And we were so gung ho<<>>To lay down our 
lives<<>><<>>

hive> select id,name,split(lyrics,'\n') from songs;

1  All For Leyna  ["She stood on the tracks","Waving her 
arms","Leading me to that third rail shock","Quick as a wink","She changed her 
mind","",""]
2  Goodnight Saigon["We met as soul mates","On Parris 
Island","We left as inmates","From an asylum","And we were sharp","As sharp as 
knives","And we were so gung ho","To lay down our lives","",""]

hive> select id,name,sentences(lyrics) from songs;

1  All For Leyna
[["She","stood","on","the","tracks","Waving","her","arms","Leading","me","to","that","third","rail","shock","Quick","as","a","wink","She","changed","her","mind"]]
2  Goodnight Saigon
[["We","met","as","soul","mates","On","Parris","Island","We","left","as","inmates","From","an","asylum","And","we","were","sharp","As","sharp","as","knives","And","we","were","so","gung","ho","To","lay","down","our","lives"]]

hiv

Re: LINES TERMINATED BY only supports newline '\n' right now

2016-06-08 Thread abhishek

Did you try defining the table with hive In built SerDe. 'Stored as ORC'
This should resolve your issue. Plz try and let me know if it works. 

Abhi 
Sent from my iPhone

> On Jun 3, 2016, at 3:33 AM, Markovitz, Dudu  wrote:
> 
> Here is an example, but first – some warnings:
>  
> · You should set textinputformat.record.delimiter not only for the 
> populating of the table but also for querying it
>  
> · There seems to be many issues around this area –
> o   When I tried to insert multiple values in a single statement (“insert 
> into table … values (…),(…),(…)”) only the first set of values was inserted 
> correctly
> o   2 new lines were added to the end of each text (‘lyrics’) although there 
> should be none.
> o   Aggregative queries seems to return null for the last column. Sometimes.
> o   The function ‘sentences’ does not work as expected. It treated the whole 
> text as a single line.
>  
> Dudu
>  
>  
> Example
>  
> hive> create table songs (id int,name string,lyrics string)
> ;
>  
> hive> set textinputformat.record.delimiter='\0'
> ;
>  
> hive> insert into table songs values
> (
> 1
> ,'All For Leyna'
> ,'She stood on the tracks
> Waving her arms
> Leading me to that third rail shock
> Quick as a wink
> She changed her mind'
> )
> ;
>  
> hive> insert into table songs values
> (
> 2
> ,'Goodnight Saigon'
> ,'We met as soul mates
> On Parris Island
> We left as inmates
> From an asylum
> And we were sharp
> As sharp as knives
> And we were so gung ho
> To lay down our lives'
> )
> ;
>  
> hive> select id,name,length(lyrics) from songs;
>  
> 1  All For Leyna  114
> 2  Goodnight Saigon155
>  
>  
> hive> select id,name,hex(lyrics) from songs;
>  
> 1  All For Leyna
> 5368652073746F6F64206F6E2074686520747261636B730A576176696E67206865722061726D730A4C656164696E67206D6520746F2074686174207468697264207261696C2073686F636B0A517569636B20617320612077696E6B0A536865206368616E67656420686572206D696E640A0A
> 2  Goodnight Saigon
> 5765206D657420617320736F756C206D617465730A4F6E205061727269732049736C616E640A5765206C65667420617320696E6D617465730A46726F6D20616E206173796C756D0A416E6420776520776572652073686172700A4173207368617270206173206B6E697665730A416E64207765207765726520736F2067756E6720686F0A546F206C617920646F776E206F7572206C697665730A0A
>  
> hive> select id,name,regexp_replace(lyrics,'\n','<<>>') from songs;
>  
> 1  All For Leyna  She stood on the tracks<<>>Waving 
> her arms<<>>Leading me to that third rail shock<<>>Quick as 
> a wink<<>>She changed her mind<<>><<>>
> 2  Goodnight SaigonWe met as soul 
> mates<<>>On Parris Island<<>>We left as 
> inmates<<>>From an asylum<<>>And we were 
> sharp<<>>As sharp as knives<<>>And we were so gung 
> ho<<>>To lay down our lives<<>><<>>
>  
> hive> select id,name,split(lyrics,'\n') from songs;
>  
> 1  All For Leyna  ["She stood on the tracks","Waving her 
> arms","Leading me to that third rail shock","Quick as a wink","She changed 
> her mind","",""]
> 2  Goodnight Saigon["We met as soul mates","On Parris 
> Island","We left as inmates","From an asylum","And we were sharp","As sharp 
> as knives","And  we were so gung ho","To lay down our lives","",""]
>  
> hive> select id,name,sentences(lyrics) from songs;
>  
> 1  All For Leyna
> [["She","stood","on","the","tracks","Waving","her","arms","Leading","me","to","that","third","rail","shock","Quick","as","a","wink","She","changed","her","mind"]]
> 2  Goodnight Saigon
> [["We","met","as","soul","mates","On","Parris","Island","We","left","as","inmates","From","an","asylum","And","we","were","sharp","As","sharp","as","knives","And","we","were","so","gung","ho","To","lay","down","our","lives"]]
>  
> hive> select count (*) from songs;
>  
> NULL
>  
> hive> select count (*),123,456,789 from songs;
>  
> 2  123 456 NULL
>  
> hive> select count (*),'A','B','C' from songs;
>  
> 2  A B C
>  
>  
> From: Radha krishna [mailto:grkmc...@gmail.com] 
> Sent: Thursday, June 02, 2016 12:42 PM
> To: user@hive.apache.org
> Subject: LINES TERMINATED BY only supports newline '\n' right now
>  
> For some of the columns '\n' character is there as part of value, i want to 
> create a hive table for this data i tried by creating the hive table with US 
> as the line separator but it showing the below message 
>  
> Ex:
> CREATE EXTERNAL TABLE IF NOT EXISTS emp (name String,id int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '28'
> LINES TERMINATED BY '31'
> LOCATION 'emp.txt file path';
> 
> FAILED: SemanticException 4:20 LINES TERMINATED BY only supports newline '\n' 
> right now. Error encountered near token ''31''
>  
> how can we create hive table with out removing the \n character as part of 
> column value ( requirement 

RE: LINES TERMINATED BY only supports newline '\n' right now

2016-06-03 Thread Markovitz, Dudu
Here is an example, but first – some warnings:


· You should set textinputformat.record.delimiter not only for the 
populating of the table but also for querying it



· There seems to be many issues around this area –

o   When I tried to insert multiple values in a single statement (“insert into 
table … values (…),(…),(…)”) only the first set of values was inserted correctly

o   2 new lines were added to the end of each text (‘lyrics’) although there 
should be none.

o   Aggregative queries seems to return null for the last column. Sometimes.

o   The function ‘sentences’ does not work as expected. It treated the whole 
text as a single line.



Dudu





Example


hive> create table songs (id int,name string,lyrics string)
;

hive> set textinputformat.record.delimiter='\0'
;

hive> insert into table songs values
(
1
,'All For Leyna'
,'She stood on the tracks
Waving her arms
Leading me to that third rail shock
Quick as a wink
She changed her mind'
)
;

hive> insert into table songs values
(
2
,'Goodnight Saigon'
,'We met as soul mates
On Parris Island
We left as inmates
From an asylum
And we were sharp
As sharp as knives
And we were so gung ho
To lay down our lives'
)
;

hive> select id,name,length(lyrics) from songs;

1  All For Leyna  114
2  Goodnight Saigon155


hive> select id,name,hex(lyrics) from songs;

1  All For Leyna
5368652073746F6F64206F6E2074686520747261636B730A576176696E67206865722061726D730A4C656164696E67206D6520746F2074686174207468697264207261696C2073686F636B0A517569636B20617320612077696E6B0A536865206368616E67656420686572206D696E640A0A
2  Goodnight Saigon
5765206D657420617320736F756C206D617465730A4F6E205061727269732049736C616E640A5765206C65667420617320696E6D617465730A46726F6D20616E206173796C756D0A416E6420776520776572652073686172700A4173207368617270206173206B6E697665730A416E64207765207765726520736F2067756E6720686F0A546F206C617920646F776E206F7572206C697665730A0A

hive> select id,name,regexp_replace(lyrics,'\n','<<>>') from songs;

1  All For Leyna  She stood on the tracks<<>>Waving 
her arms<<>>Leading me to that third rail shock<<>>Quick as a 
wink<<>>She changed her mind<<>><<>>
2  Goodnight SaigonWe met as soul mates<<>>On 
Parris Island<<>>We left as inmates<<>>From an 
asylum<<>>And we were sharp<<>>As sharp as 
knives<<>>And we were so gung ho<<>>To lay down our 
lives<<>><<>>

hive> select id,name,split(lyrics,'\n') from songs;

1  All For Leyna  ["She stood on the tracks","Waving her 
arms","Leading me to that third rail shock","Quick as a wink","She changed her 
mind","",""]
2  Goodnight Saigon["We met as soul mates","On Parris 
Island","We left as inmates","From an asylum","And we were sharp","As sharp as 
knives","And we were so gung ho","To lay down our lives","",""]

hive> select id,name,sentences(lyrics) from songs;

1  All For Leyna
[["She","stood","on","the","tracks","Waving","her","arms","Leading","me","to","that","third","rail","shock","Quick","as","a","wink","She","changed","her","mind"]]
2  Goodnight Saigon
[["We","met","as","soul","mates","On","Parris","Island","We","left","as","inmates","From","an","asylum","And","we","were","sharp","As","sharp","as","knives","And","we","were","so","gung","ho","To","lay","down","our","lives"]]

hive> select count (*) from songs;

NULL

hive> select count (*),123,456,789 from songs;

2  123 456 NULL

hive> select count (*),'A','B','C' from songs;

2  A B C


From: Radha krishna [mailto:grkmc...@gmail.com]
Sent: Thursday, June 02, 2016 12:42 PM
To: user@hive.apache.org
Subject: LINES TERMINATED BY only supports newline '\n' right now

For some of the columns '\n' character is there as part of value, i want to 
create a hive table for this data i tried by creating the hive table with US as 
the line separator but it showing the below message

Ex:
CREATE EXTERNAL TABLE IF NOT EXISTS emp (name String,id int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '28'
LINES TERMINATED BY '31'
LOCATION 'emp.txt file path';

FAILED: SemanticException 4:20 LINES TERMINATED BY only supports newline '\n' 
right now. Error encountered near token ''31''

how can we create hive table with out removing the \n character as part of 
column value ( requirement data need to maintain as it is)

can any one implemented hive tables with line separated other than \n

Thanks & Regards
   Radha krishna



RE: LINES TERMINATED BY only supports newline '\n' right now

2016-06-02 Thread Loudongfeng
One quick way to solve this is setting textinputformat.record.delimiter=”US”  
before reading from that text table.
But it has some limitations:
1,you cannot read multiple text tables with different line  delimiters in the 
same query.
2,you cannot write to that text table with your own line  delimiter.

Also ,there is patch at https://issues.apache.org/jira/browse/HIVE-5999  Allow 
other characters for LINES TERMINATED BY .
Which may meet your requirements.

From: Peter Vary [mailto:pv...@cloudera.com]
Sent: Friday, June 03, 2016 1:29 AM
To: user@hive.apache.org
Subject: Re: LINES TERMINATED BY only supports newline '\n' right now


Hi,

According to the documentation you should write and set your own Inputformat 
when creating the table.

Mike Sukmanowsky solved a similar problem here, this might help you:
http://stackoverflow.com/questions/7692994/custom-inputformat-with-hive

Regards,
Peter
For some of the columns '\n' character is there as part of value, i want to 
create a hive table for this data i tried by creating the hive table with US as 
the line separator but it showing the below message

Ex:
CREATE EXTERNAL TABLE IF NOT EXISTS emp (name String,id int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '28'
LINES TERMINATED BY '31'
LOCATION 'emp.txt file path';

FAILED: SemanticException 4:20 LINES TERMINATED BY only supports newline '\n' 
right now. Error encountered near token ''31''

how can we create hive table with out removing the \n character as part of 
column value ( requirement data need to maintain as it is)

can any one implemented hive tables with line separated other than \n

Thanks & Regards
   Radha krishna



Re: LINES TERMINATED BY only supports newline '\n' right now

2016-06-02 Thread Peter Vary
Hi,

According to the documentation you should write and set your own
Inputformat when creating the table.

Mike Sukmanowsky solved a similar problem here, this might help you:
http://stackoverflow.com/questions/7692994/custom-inputformat-with-hive

Regards,
Peter
For some of the columns '\n' character is there as part of value, i want to
create a hive table for this data i tried by creating the hive table with
US as the line separator but it showing the below message

Ex:
CREATE EXTERNAL TABLE IF NOT EXISTS emp (name String,id int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '28'
LINES TERMINATED BY '31'
LOCATION 'emp.txt file path';

FAILED: SemanticException 4:20 LINES TERMINATED BY only supports newline
'\n' right now. Error encountered near token ''31''

how can we create hive table with out removing the \n character as part of
column value ( requirement data need to maintain as it is)

can any one implemented hive tables with line separated other than \n

Thanks & Regards
   Radha krishna