RE: Get 100 items in Comma Separated strings from Hive Column.

2016-06-09 Thread Markovitz, Dudu
+ bug fix
This version differentiates between empty strings and strings with a single token (both have no commas):

hive> select RowID,
             length(regexp_replace(stringColumn,'[^,]','')) + if(length(stringColumn)=0,0,1) as count
      from t;
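A quick sanity check of the fix on literal values (examples mine, not from the original message):

hive> select length(regexp_replace('','[^,]',''))    + if(length('')=0,0,1);     -- empty string -> 0
hive> select length(regexp_replace('abc','[^,]','')) + if(length('abc')=0,0,1);  -- one token    -> 1
hive> select length(regexp_replace('a,b','[^,]','')) + if(length('a,b')=0,0,1);  -- two tokens   -> 2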



RE: Get 100 items in Comma Separated strings from Hive Column.

2016-06-09 Thread Markovitz, Dudu
+ Improvement

The “count” can be done in a cleaner way (the previous way also works with a simple ‘replace’):

hive> select RowID, length(regexp_replace(stringColumn,'[^,]','')) + 1 as count from t;

1  2
2  5
3  24
4  17
5  8
6  11
7  26
8  18
9  9
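For completeness, the ‘replace’-based variant mentioned above would be (a sketch; assumes a Hive build that ships a plain replace() UDF):

hive> select RowID, length(stringColumn) - length(replace(stringColumn,',','')) + 1 as count from t;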




Re: Where are jars stored for permanent functions

2016-06-09 Thread dhruv kapatel
In order to add it permanently, the recommended ways are as follows:

   1. Add it in hive-site.xml:

      <property>
        <name>hive.aux.jars.path</name>
        <value>file://localpath/yourjar.jar</value>
      </property>

   2. Copy the JAR file to the ${HIVE_HOME}/auxlib/ folder (create
      it if it does not exist).

   Source: Apache Hive Essentials book
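Either way, the Hive service (or CLI session) must be restarted before the jar is picked up. A quick check that a function backed by the jar resolves afterwards (function and class names here are hypothetical):

hive> create function my_udf as 'com.example.udf.MyUdf';  -- hypothetical class inside yourjar.jar
hive> describe function my_udf;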




-- 
With Regards:
Kapatel Dhruv V
mobile: +919909214243

Re: Where are jars stored for permanent functions

2016-06-09 Thread Jason Dere
Hive doesn't currently handle storing of the JARs. Doing ADD JAR only adds the 
jars to the current session, but won't help for other sessions.

The permanent functions syntax allows you to specify JAR paths when you create 
the function. These should be on HDFS or other non-local path.


create function ST_GeomFromWKT as 'com.esri.hadoop.hive.ST_GeomFromWKT'
using jar 'hdfs:/path/to/spatial-sdk-hive-1.1.jar',
      jar 'hdfs:/path/to/esri-geometry-api-1.2.1.jar';
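A sketch of the end-to-end flow this implies, including the staging step (paths are placeholders, exactly as above):

-- bash: stage the jars on a non-local path first
hdfs dfs -put spatial-sdk-hive-1.1.jar esri-geometry-api-1.2.1.jar /path/to/

-- hive: after the CREATE FUNCTION above, verify from any new session, or undo
describe function ST_GeomFromWKT;
drop function ST_GeomFromWKT;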



From: Marcin Tustin 
Sent: Wednesday, June 08, 2016 1:10 PM
To: user@hive.apache.org
Subject: Where are jars stored for permanent functions

Hi All,

I just added local jars to my hive session, created permanent functions, and 
find that they are available across sessions and machines. This is of course 
excellent, but I'm wondering where those jars are being stored? What setting or 
what default directory would I find them in.

My session was:


add jars /mnt/storage/spatial-sdk-hive-1.1.jar 
/mnt/storage/esri-geometry-api-1.2.1.jar;

create function ST_GeomFromWKT as 'com.esri.hadoop.hive.ST_GeomFromWKT';


Then that function was available via the thriftserver.


Thanks,

Marcin



RE: Get 100 items in Comma Separated strings from Hive Column.

2016-06-09 Thread Markovitz, Dudu

--  bash

mkdir t

cat>t/data.txt
1|44,85
2|56,37,83,68,43
3|33,48,42,18,23,80,31,86,48,42,37,52,99,55,93,1,63,67,32,75,44,57,70,2
4|77,26,95,53,11,99,74,82,7,55,75,6,32,87,75,99,80
5|48,78,39,62,16,44,43,63
6|35,97,99,19,22,50,29,84,82,25,77
7|80,43,82,94,81,58,70,8,70,6,62,100,60,84,55,24,100,75,84,15,53,5,19,45,61,73
8|66,44,66,4,80,72,81,63,51,24,51,77,87,85,10,36,43,2
9|39,64,29,14,9,42,66,56,33

hdfs dfs -put t /tmp


--  hive


create external table t
(
    RowID        int
   ,stringColumn string
)
row format delimited
fields terminated by '|'
location '/tmp/t'
;

select RowID,
       regexp_extract(stringColumn,'(,?[^,]*){0,10}',0) as string10,
       length(stringColumn) - length(regexp_replace(stringColumn,',','')) + 1 as count
from t;

1  44,85                          2
2  56,37,83,68,43                 5
3  33,48,42,18,23,80,31,86,48,42  24
4  77,26,95,53,11,99,74,82,7,55   17
5  48,78,39,62,16,44,43,63        8
6  35,97,99,19,22,50,29,84,82,25  11
7  80,43,82,94,81,58,70,8,70,6    26
8  66,44,66,4,80,72,81,63,51,24   18
9  39,64,29,14,9,42,66,56,33      9


Extracting the first 100 tokens (10 in my example) can be done with regexp_extract or regexp_replace:

hive> select regexp_extract 
('1,2,3,4,5,6,7,8,9,10,11,12,13,14,15','(,?[^,]*){0,10}',0);

1,2,3,4,5,6,7,8,9,10

hive> select regexp_replace 
('1,2,3,4,5,6,7,8,9,10,11,12,13,14,15','((,?[^,]*){0,10}).*','$1');

1,2,3,4,5,6,7,8,9,10
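For the original 100-item requirement, the same pattern presumably just scales the quantifier (a sketch combining the pieces above; untested at that size):

hive> select RowID,
             regexp_extract(stringColumn,'(,?[^,]*){0,100}',0) as string100,
             length(stringColumn) - length(regexp_replace(stringColumn,',','')) + 1 as count
      from t;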




Re: Hive Table Creation failure on Postgres

2016-06-09 Thread Mich Talebzadeh
Well, I know that the script works fine for Oracle (both base and transactional).

OK, this is what this table looks like in Oracle. That column is 256 bytes.

[image: Inline images 2]


HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com





Hive Table Creation failure on Postgres

2016-06-09 Thread Siddhi Mehta
Hello Everyone,

We are using postgres for hive persistent store.

We are making use of the schematool to create hive schema and our hive
configs have table and column validation enabled.

While trying to create a simple hive table we ran into the following error.

Error: Error while processing statement: FAILED: Execution Error, return
code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:javax.jdo.JDODataStoreException: Wrong precision for
column "*COLUMNS_V2"."COMMENT*" : was 4000 (according to the JDBC driver)
but should be 256 (based on the MetaData definition for field
org.apache.hadoop.hive.metastore.model.MFieldSchema.comment).

Looks like the Hive Metastore validation expects it to be 256, but when I
looked at the metastore script for Postgres, it creates the column with
precision 4000.

Interestingly, the mysql scripts for the same Hive version create
the column with precision 255.

Is there a config to tell the Hive MetaStore validation layer what the
appropriate column precision should be, based on the underlying persistent
store used? Or is the known workaround to turn off validation when using
Postgres as the persistent store?

Thanks,
Siddhi
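For reference, one way to inspect and align the column directly in the Postgres metastore database (a sketch; assumes direct DB access, with table and column names taken from the error message above):

-- check what precision the Postgres script actually created
SELECT character_maximum_length
FROM   information_schema.columns
WHERE  table_name = 'COLUMNS_V2' AND column_name = 'COMMENT';

-- shrink it to what the MetaData definition expects
ALTER TABLE "COLUMNS_V2" ALTER COLUMN "COMMENT" TYPE varchar(256);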


Get 100 items in Comma Separated strings from Hive Column.

2016-06-09 Thread Mahender Sarangam
Hi,

We have a Hive table which has a single column with more than 1000 comma-separated string items. Is there a way to retrieve only 100 string items from that column? We also need to capture the number of comma-separated string items. We are looking for something like "substring_index" functionality, but since we are using Hive 1.2 we couldn't find a "substring_index" UDF. Is there a way to achieve the same functionality with "regexp_extract"? I also see there is a UDF available, though I am not sure whether it helps us achieve the same functionality:
https://github.com/brndnmtthws/facebook-hive-udfs/blob/master/src/main/java/com/facebook/hive/udf/UDFRegexpExtractAll.java

Scenario: Table1 (Source Table)

RowID  stringColumn
1      1,2,3,4...1
2      2,4,5,8,4
3      10,11,98,100

Now I would like to show the result structure like below:

RowID  100String    count
1      1,2,3...100  1
2      2,4,5,8,4    5


SMB join VS SMB Map Join?

2016-06-09 Thread Rohit Kumar Prusty
Hi All,
I have a basic question on Hive joins.

What is the core difference between an SMB join and an SMB map join?

Any pictorial representation would be a real help.
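For context, these are the settings that control the two behaviors being compared (standard Hive configs; a sketch, not a full answer):

-- bucket map join over sorted, bucketed tables (SMB map join)
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;

-- let the optimizer convert an eligible join to a sort-merge join automatically
set hive.auto.convert.sortmerge.join = true;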

Regards
Rohit Kumar Prusty
+91-9884070075



Re: Delete hive partition while executing query.

2016-06-09 Thread Igor Kuzmenko
I've opened jira issue 

On Wed, Jun 8, 2016 at 9:01 PM, Eugene Koifman wrote:

> This looks like proof of a bug. The read locks 179730 and 179731 should
> have been blocked by 179729.
> As Alan said, this won’t prevent the exception you are getting, but it needs
> to be fixed to prevent a partition from disappearing while queries 3 and 4
> are in progress.
>
> Could you file a Jira please?
>
> thanks,
> Eugene
>
> From: Igor Kuzmenko 
> Reply-To: "user@hive.apache.org" 
> Date: Wednesday, June 8, 2016 at 7:30 AM
> To: "user@hive.apache.org" 
> Subject: Re: Delete hive partition while executing query.
>
> Hi, thanks for reply, Alan.
> Here's one more test.
>
> Wed Jun 08 16:36:02 MSK 2016 Start thread 1
> Wed Jun 08 16:36:05 MSK 2016 Start thread 2
> Wed Jun 08 16:36:08 MSK 2016 Start thread 3
> Wed Jun 08 16:36:11 MSK 2016 Start thread 4
> Wed Jun 08 16:36:17 MSK 2016 Finish thread 1
> Wed Jun 08 16:36:17 MSK 2016 Thread 1 result: '344186'
> Wed Jun 08 16:36:17 MSK 2016 Thread 1 completed in 14443 ms
>
> Wed Jun 08 16:36:19 MSK 2016 Finished 2
> Wed Jun 08 16:36:19 MSK 2016 Thread 2 completed in 13967 ms
>
> Wed Jun 08 16:36:20 MSK 2016 Finish thread 3
> Wed Jun 08 16:36:20 MSK 2016 Thread 3 result: '344186'
> Wed Jun 08 16:36:20 MSK 2016 Thread 3 completed in 11737 ms
>
> java.sql.SQLException: Error while processing statement: FAILED: Execution
> Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
> Vertex failed, vertexName=Map 1, vertexId=vertex_1461923723503_0931_1_00,
> diagnostics=[Vertex vertex_1461923723503_0931_1_00 [Map 1] killed/failed
> due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: mobile_connections
> initializer failed, vertex=vertex_1461923723503_0931_1_00 [Map 1],
> java.lang.RuntimeException: serious problem
> at
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1059)
>
>
>
> HiveMetaStore.log
> 
> HiveServer2.log
> 
>
> I didn't find anything intresing in metastore log, but HiveServer2 log
> contains this:
>
> Line 1023: 2016-06-08 16:36:04,456 INFO  [HiveServer2-Background-Pool:
> Thread-42]: lockmgr.DbLockManager (DbLockManager.java:lock(98)) -
> Requesting:
> queryId=hive_20160608163602_542056d9-c524-4df4-af18-6aa5e906284f
> LockRequest(component:[LockComponent(type:SHARED_READ, level:TABLE,
> dbname:default, tablename:mobile_connections),
> LockComponent(type:SHARED_READ, level:PARTITION, dbname:default,
> tablename:mobile_connections,
> partitionname:dt=20151123/msisdn_last_digit=3)], txnid:0, user:hdfs,
> hostname:mercury)
> Line 1043: 2016-06-08 16:36:04,546 INFO  [HiveServer2-Background-Pool:
> Thread-42]: lockmgr.DbLockManager (DbLockManager.java:lock(101)) - Response
> to queryId=hive_20160608163602_542056d9-c524-4df4-af18-6aa5e906284f
> LockResponse(lockid:179728, state:ACQUIRED)
> Line 1349: 2016-06-08 16:36:05,214 INFO  [HiveServer2-Background-Pool:
> Thread-50]: lockmgr.DbLockManager (DbLockManager.java:lock(98)) -
> Requesting:
> queryId=hive_20160608163604_832abbff-6199-497e-b969-fd8ac1465abc
> LockRequest(component:[LockComponent(type:EXCLUSIVE, level:PARTITION,
> dbname:default, tablename:mobile_connections,
> partitionname:dt=20151123/msisdn_last_digit=3)], txnid:0, user:hdfs,
> hostname:mercury)
> Line 1390: 2016-06-08 16:36:05,270 INFO  [HiveServer2-Background-Pool:
> Thread-50]: lockmgr.DbLockManager (DbLockManager.java:lock(101)) - Response
> to queryId=hive_20160608163604_832abbff-6199-497e-b969-fd8ac1465abc
> LockResponse(lockid:179729, state:WAITING)
> Line 2346: 2016-06-08 16:36:08,028 INFO  [HiveServer2-Background-Pool:
> Thread-68]: lockmgr.DbLockManager (DbLockManager.java:lock(98)) -
> Requesting:
> queryId=hive_20160608163607_7b18da12-6f86-41c9-b4b1-be45252c18c2
> LockRequest(component:[LockComponent(type:SHARED_READ, level:TABLE,
> dbname:default, tablename:mobile_connections),
> LockComponent(type:SHARED_READ, level:PARTITION, dbname:default,
> tablename:mobile_connections,
> partitionname:dt=20151123/msisdn_last_digit=3)], txnid:0, user:hdfs,
> hostname:mercury)
> Line 2370: 2016-06-08 16:36:08,069 INFO  [HiveServer2-Background-Pool:
> Thread-68]: lockmgr.DbLockManager (DbLockManager.java:lock(101)) - Response
> to queryId=hive_20160608163607_7b18da12-6f86-41c9-b4b1-be45252c18c2
> LockResponse(lockid:179730, state:ACQUIRED)
> Line 3561: 2016-06-08 16:36:11,000 INFO  [HiveServer2-Background-Pool:
> Thread-91]: lockmgr.DbLockManager (DbLockManager.java:lock(98)) -
> Requesting:
> queryId=hive_20160608163610_b78a201b-ae6d-4040-9115-f92118d5b629
> LockRequest(component:[LockComponent(type:SHARED_READ, level:TABLE,
> dbname:default, tablename:mobile_connections),
> LockComponent(type:SHARED_READ, level:PARTITION, dbname:default,
> tablename:mobile_connections,
> partitionname:dt=20151123/msisdn_last_digit=3)], txnid:0, user:hdfs,
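A hedged way to watch these lock states while the queries run (standard Hive syntax; table and partition names from the logs above):

hive> show locks mobile_connections partition (dt='20151123', msisdn_last_digit='3') extended;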

RE: LINES TERMINATED BY only supports newline '\n' right now

2016-06-09 Thread Markovitz, Dudu
I’ve checked the “sentences” source code.
It turns out it uses BreakIterator.getSentenceInstance to break the text into
sentences.
Apparently neither ‘\n’ nor ‘.’ is considered a sentence separator, but ‘?’ and
‘!’ are.

Dudu


hive> select id,name,sentences(regexp_replace (lyrics,'\n','?')) from songs;

1  All For Leyna     [["She","stood","on","the","tracks"],["Waving","her","arms"],["Leading","me","to","that","third","rail","shock"],["Quick","as","a","wink"],["She","changed","her","mind"]]
2  Goodnight Saigon  [["We","met","as","soul","mates"],["On","Parris","Island"],["We","left","as","inmates"],["From","an","asylum"],["And","we","were","sharp"],["As","sharp","as","knives"],["And","we","were","so","gung","ho"],["To","lay","down","our","lives"]]
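A minimal literal check of that separator behavior (example mine; the expected grouping follows from the description above):

hive> select sentences('Hello there! How are you');
-- expected: [["Hello","there"],["How","are","you"]]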


Re: Using Hive table for twitter data

2016-06-09 Thread Gopal Vijayaraghavan

> Any reason why that table in Hive cannot read data in?

No idea how you're loading data with flume, but it isn't doing it right.

>> PARTITIONED BY (datehour INT)

...

>> -rw-r--r--   2 hduser supergroup 433868 2016-06-09 09:52
>>/twitter_data/FlumeData.1465462333430

No ideas on how to get that to create partitions either.

Cheers,
Gopal




Re: Using Hive table for twitter data

2016-06-09 Thread Mich Talebzadeh
thanks Gopal

that link

404 - OOPS!
Looks like you wandered too far from the herd!

LOL

Any reason why that table in Hive cannot read data in?

cheers

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com





Re: Using Hive table for twitter data

2016-06-09 Thread Gopal Vijayaraghavan

> Has anyone done recent load of twitter data into Hive table.

Not anytime recently, but the twitter corpus was heavily used to demo Hive.

Here's the original post on auto-learning schemas from an arbitrary
collection of JSON docs (like a MongoDB dump).

http://hortonworks.com/blog/discovering-hive-schema-in-collections-of-json-documents/


Cheers,
Gopal




Using Hive table for twitter data

2016-06-09 Thread Mich Talebzadeh
Hi,

I am just exploring this.

Has anyone done a recent load of twitter data into a Hive table?

I have used a few approaches. This is the one I tried:

ADD JAR /home/hduser/jars/hive-serdes-1.0-SNAPSHOT.jar;
--SET hive.support.sql11.reserved.keywords=false;
use test;
drop table if exists tweets;
CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN,
  retweeted_status STRUCT<
    text:STRING,
    user1:STRUCT<screen_name:STRING,name:STRING>,
    retweet_count:INT>,
  entities STRUCT<
    urls:ARRAY<STRUCT<expanded_url:STRING>>,
    user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    hashtags:ARRAY<STRUCT<text:STRING>>>,
  text STRING,
  user1 STRUCT<
    screen_name:STRING,
    name:STRING,
    friends_count:INT,
    followers_count:INT,
    statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:INT,
    time_zone:STRING>,
  in_reply_to_screen_name STRING
)
PARTITIONED BY (datehour INT)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/twitter_data'
;

It creates OK but no data is there.

I use Flume to populate that external directory

hdfs dfs -ls /twitter_data
-rw-r--r--   2 hduser supergroup 433868 2016-06-09 09:52
/twitter_data/FlumeData.1465462333430
-rw-r--r--   2 hduser supergroup 438933 2016-06-09 09:53
/twitter_data/FlumeData.1465462365382
-rw-r--r--   2 hduser supergroup 559724 2016-06-09 09:53
/twitter_data/FlumeData.1465462403606
-rw-r--r--   2 hduser supergroup 455594 2016-06-09 09:54
/twitter_data/FlumeData.1465462435124
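Worth noting for a table declared with PARTITIONED BY: files sitting directly under the LOCATION root belong to no partition, so something like this would be needed before any rows show up (the datehour value and subdirectory layout are hypothetical):

hive> alter table tweets add partition (datehour=2016060909) location '/twitter_data/2016060909';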

Thanks


Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


RE: Need Your Inputs For Below Scenario

2016-06-09 Thread Markovitz, Dudu
Explode + joins


--  bash


mkdir t1
mkdir t2

cat>t1/data.txt
A	B1	B2		B4	B5	B6

cat>t2/data.txt
B1	D1
B2	D2
B3	D3
B4	D4
B5	D5
B6	D6

hdfs dfs -put t1 t2 /tmp


--  hive


create external table t1
(
Column1 string
   ,Column2 string
   ,Column3 string
   ,Column4 string
   ,Column5 string
   ,Column6 string
   ,Column7 string
)
row format delimited
fields terminated by '\t'
location '/tmp/t1'
;

create external table t2
(
Column1 string
   ,Column2 string
)
row format delimited
fields terminated by '\t'
location '/tmp/t2'
;

Theoretically I would have written the query like this -

select  t1.Column1
       ,t1_unpivot.val
       ,t2.Column2

from    t1

lateral view explode(array(Column2,Column3,Column4,Column5,Column6,Column7)) t1_unpivot as val

join    t2

on      t2.Column1 = t1_unpivot.val
;

Unfortunately, this syntax is not supported:

FAILED: SemanticException [Error 10085]: Line 7:32 JOIN with a LATERAL VIEW is
not supported 'val'


As a work-around I'm nesting the "lateral view":

select  t1.Column1
       ,t1.val
       ,t2.Column2

from   (select  t1.Column1
               ,t1_unpivot.val
        from    t1
        lateral view explode(array(Column2,Column3,Column4,Column5,Column6,Column7)) t1_unpivot as val
       ) as t1

join    t2

on      t2.Column1 = t1.val
;

A  B1  D1
A  B2  D2
A  B4  D4
A  B5  D5
A  B6  D6
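Note that the NULL handling falls out of the inner join: the exploded array still contains a NULL element for Column4, but the condition "t2.Column1 = t1.val" never matches NULL, so that row is dropped. A literal check of the explode side (example mine):

hive> select explode(array('B1',null,'B2'));  -- three rows, the middle one NULL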

From: Lunagariya, Dhaval [mailto:dhaval.lunagar...@citi.com]
Sent: Wednesday, June 08, 2016 6:25 PM
To: 'user@hive.apache.org' 
Cc: 'er.dcpa...@gmail.com' 
Subject: RE: Need Your Inputs For Below Scenario

Here Table2 is a very large table, containing lakhs (hundreds of thousands) of rows.

From: Lunagariya, Dhaval [CCC-OT]
Sent: Wednesday, June 08, 2016 5:52 PM
To: user@hive.apache.org
Subject: Need Your Inputs For Below Scenario

Hey folks,

Need your help.

Input Table1:

Column1  Column2  Column3  Column4   Column5  Column6  Column7
A        B1       B2       B3(NULL)  B4       B5       B6

Input Table2:

Column1  Column2
B1       D1
B2       D2
B3       D3
B4       D4
B5       D5
B6       D6

Output:

Column1  Column2  Column3
A        B1       D1
A        B2       D2
A        B4       D4
A        B5       D5
A        B6       D6


Here B3 is skipped because B3 is NULL.

What is the most efficient way to get the above result using Hive?



Regards,
Dhaval



RE: LINES TERMINATED BY only supports newline '\n' right now

2016-06-09 Thread Markovitz, Dudu
Partial success after a few more trials and errors –

1.
“insert into … values (),(),…,()” doesn’t work right in any case.
“insert into … values (); insert into … values (); …; insert into … values ();” works only with textinputformat.record.delimiter changed.
“insert into … select … union all select …” works fine (no need to touch textinputformat.record.delimiter); see the sketch after the outputs below.

2.
No bugs around aggregative functions

3.
“sentences” still doesn’t work as expected.
We can see that “split” works correctly.

hive> select id,name,split(lyrics,'\n') from songs;

1  All For Leyna     ["She stood on the tracks","Waving her arms","Leading me to that third rail shock","Quick as a wink","She changed her mind"]
2  Goodnight Saigon  ["We met as soul mates","On Parris Island","We left as inmates","From an asylum","And we were sharp","As sharp as knives","And we were so gung ho","To lay down our lives"]

hive> select id,name,sentences(lyrics) from songs;

1  All For Leyna     [["She","stood","on","the","tracks","Waving","her","arms","Leading","me","to","that","third","rail","shock","Quick","as","a","wink","She","changed","her","mind"]]
2  Goodnight Saigon  [["We","met","as","soul","mates","On","Parris","Island","We","left","as","inmates","From","an","asylum","And","we","were","sharp","As","sharp","as","knives","And","we","were","so","gung","ho","To","lay","down","our","lives"]]
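The union-all form from item 1, spelled out (a sketch with placeholder values, wrapped in a subquery to stay friendly to older parsers):

hive> insert into table songs
      select * from (
        select 1 as id, 'song A' as name, 'line1\nline2' as lyrics
        union all
        select 2, 'song B', 'line3\nline4'
      ) s;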






RE: LINES TERMINATED BY only supports newline '\n' right now

2016-06-09 Thread Markovitz, Dudu
Same issues.

Dudu

From: abhishek [mailto:ec.abhis...@gmail.com]
Sent: Thursday, June 09, 2016 9:23 AM
To: user@hive.apache.org
Subject: Re: LINES TERMINATED BY only supports newline '\n' right now


Did you try defining the table with Hive's built-in SerDe ('stored as ORC')? This should resolve your issue. Please try it and let me know if it works.

Abhi
Sent from my iPhone

On Jun 3, 2016, at 3:33 AM, Markovitz, Dudu <dmarkov...@paypal.com> wrote:
Here is an example, but first – some warnings:

- You should set textinputformat.record.delimiter not only for populating the table but also for querying it.

- There seem to be many issues around this area:

  o When I tried to insert multiple values in a single statement (“insert into table … values (…),(…),(…)”), only the first set of values was inserted correctly.

  o 2 new lines were added to the end of each text (‘lyrics’) although there should be none.

  o Aggregative queries seem to return null for the last column. Sometimes.

  o The function ‘sentences’ does not work as expected. It treated the whole text as a single line.

Dudu





Example


hive> create table songs (id int,name string,lyrics string)
;

hive> set textinputformat.record.delimiter='\0'
;

hive> insert into table songs values
(
1
,'All For Leyna'
,'She stood on the tracks
Waving her arms
Leading me to that third rail shock
Quick as a wink
She changed her mind'
)
;

hive> insert into table songs values
(
2
,'Goodnight Saigon'
,'We met as soul mates
On Parris Island
We left as inmates
From an asylum
And we were sharp
As sharp as knives
And we were so gung ho
To lay down our lives'
)
;

hive> select id,name,length(lyrics) from songs;

1  All For Leyna     114
2  Goodnight Saigon  155


hive> select id,name,hex(lyrics) from songs;

1  All For Leyna
5368652073746F6F64206F6E2074686520747261636B730A576176696E67206865722061726D730A4C656164696E67206D6520746F2074686174207468697264207261696C2073686F636B0A517569636B20617320612077696E6B0A536865206368616E67656420686572206D696E640A0A
2  Goodnight Saigon
5765206D657420617320736F756C206D617465730A4F6E205061727269732049736C616E640A5765206C65667420617320696E6D617465730A46726F6D20616E206173796C756D0A416E6420776520776572652073686172700A4173207368617270206173206B6E697665730A416E64207765207765726520736F2067756E6720686F0A546F206C617920646F776E206F7572206C697665730A0A

hive> select id,name,regexp_replace(lyrics,'\n','<<>>') from songs;

1  All For Leyna     She stood on the tracks<<>>Waving her arms<<>>Leading me to that third rail shock<<>>Quick as a wink<<>>She changed her mind<<>><<>>
2  Goodnight Saigon  We met as soul mates<<>>On Parris Island<<>>We left as inmates<<>>From an asylum<<>>And we were sharp<<>>As sharp as knives<<>>And we were so gung ho<<>>To lay down our lives<<>><<>>

hive> select id,name,split(lyrics,'\n') from songs;

1  All For Leyna     ["She stood on the tracks","Waving her arms","Leading me to that third rail shock","Quick as a wink","She changed her mind","",""]
2  Goodnight Saigon  ["We met as soul mates","On Parris Island","We left as inmates","From an asylum","And we were sharp","As sharp as knives","And we were so gung ho","To lay down our lives","",""]

hive> select id,name,sentences(lyrics) from songs;

1  All For Leyna     [["She","stood","on","the","tracks","Waving","her","arms","Leading","me","to","that","third","rail","shock","Quick","as","a","wink","She","changed","her","mind"]]
2  Goodnight Saigon  [["We","met","as","soul","mates","On","Parris","Island","We","left","as","inmates","From","an","asylum","And","we","were","sharp","As","sharp","as","knives","And","we","were","so","gung","ho","To","lay","down","our","lives"]]

hive> select count (*) from songs;

NULL

hive> select count (*),123,456,789 from songs;

2  123 456 NULL

hive> select count (*),'A','B','C' from songs;

2  A B C


From: Radha krishna [mailto:grkmc...@gmail.com]
Sent: Thursday, June 02, 2016 12:42 PM
To: user@hive.apache.org
Subject: LINES TERMINATED BY only supports newline '\n' right now

For some of the columns, the '\n' character is present as part of the value. I want to create a Hive table for this data. I tried creating the table with US (unit separator) as the line separator, but it shows the message below.

Ex:
CREATE EXTERNAL TABLE IF NOT EXISTS emp (name String,id int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '28'
LINES TERMINATED BY '31'
LOCATION 'emp.txt file path';

FAILED: SemanticException 4:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''31''

How can we create the Hive table without removing the '\n' characters that are part of column values? (The requirement: the data needs to maintain a
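One hedged pre-processing workaround for the above (bash sketch; \037 is the octal escape for the US record separator, and it assumes some byte unused in the data, here \002, can temporarily hold the embedded newlines):

tr '\n' '\002' < emp.raw | tr '\037' '\n' > emp.txt   # embedded newlines survive as \002
# after loading, restore them per column, e.g. regexp_replace(name,'\002','\n')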