Re: Can hive bear high throughput streaming data ingest?

2020-03-20 Thread wangl...@geekplus.com.cn

Hi Prasanth,

I tried to run your test example but got errors and submitted an issue:
  https://github.com/prasanthj/culvert/issues/1
I am using Hive 3.1.1.

Thanks,
Lei




wangl...@geekplus.com.cn
 
From: Prasanth Jayachandran
Sent: 2020-03-20 15:41
To: user@hive.apache.org
Subject: Re: Can hive bear high throughput streaming data ingest?
Use a higher transaction batch size? Begin transaction opens a file; commit 
transaction writes an intermediate footer, but the file is kept open until the 
entire batch completes. So a bigger batch size with less frequent commits can 
avoid creating too many small files in HDFS. Here is a test application for 
Hive streaming v2, https://github.com/prasanthj/culvert/blob/v2/README.md, that 
ingested ~1.5 million rows/sec into HDFS with 64 threads and a 100K-row commit 
interval: https://github.com/prasanthj/culvert/blob/v2/report.txt
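For reference, a minimal sketch of such a write loop against the Streaming V2 
API (builder pattern as documented in the Streaming Data Ingest V2 wiki linked 
below; the metastore URI, table name, and batch/commit sizes here are 
illustrative assumptions, and the target table must be a transactional ORC 
table):

  import java.nio.charset.StandardCharsets;
  import org.apache.hadoop.hive.conf.HiveConf;
  import org.apache.hive.streaming.HiveStreamingConnection;
  import org.apache.hive.streaming.StreamingConnection;
  import org.apache.hive.streaming.StrictDelimitedInputWriter;

  public class StreamingSketch {
    public static void main(String[] args) throws Exception {
      HiveConf conf = new HiveConf();
      // Assumed metastore URI; point this at your own metastore.
      conf.set("hive.metastore.uris", "thrift://metastore-host:9083");

      // Parses delimited byte[] records into the table's columns.
      StrictDelimitedInputWriter writer = StrictDelimitedInputWriter.newBuilder()
          .withFieldDelimiter(',')
          .build();

      StreamingConnection conn = HiveStreamingConnection.newBuilder()
          .withDatabase("default")
          .withTable("app_log")            // assumed transactional ORC table
          .withAgentInfo("example-agent")  // free-form client identifier
          .withTransactionBatchSize(1000)  // file stays open across 1000 txns
          .withRecordWriter(writer)
          .withHiveConf(conf)
          .connect();

      // Commit infrequently (e.g. every 100K rows, as in the culvert test);
      // the file is only finalized once the whole transaction batch is used.
      for (int txn = 0; txn < 1000; txn++) {
        conn.beginTransaction();
        for (int row = 0; row < 100_000; row++) {
          conn.write(("1,log line " + row).getBytes(StandardCharsets.UTF_8));
        }
        conn.commitTransaction();
      }
      conn.close();
    }
  }

With these assumed numbers, one ORC file accumulates roughly 100M rows (1000 
transactions of 100K rows each) before it is closed, instead of one small file 
per commit.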

Thanks
Prasanth


From: wangl...@geekplus.com.cn 
Sent: Friday, March 20, 2020 12:30:07 AM
To: user 
Subject: Can hive bear high throughput streaming data ingest? 
 
https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2

I want to stream my app logs to Hive using Flume on the edge app server.
Since HDFS is not friendly to frequent writes, I am afraid this approach cannot 
sustain high throughput.

Any suggestions on this?

Thanks,
Lei



wangl...@geekplus.com.cn 



Can hive bear high throughput streaming data ingest?

2020-03-20 Thread wangl...@geekplus.com.cn
https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2

I want to stream my app logs to Hive using Flume on the edge app server.
Since HDFS is not friendly to frequent writes, I am afraid this approach cannot 
sustain high throughput.

Any suggestions on this?

Thanks,
Lei



wangl...@geekplus.com.cn 



Re: orc table could not execute insert statement

2020-03-18 Thread wangl...@geekplus.com.cn

It seems there's something wrong with my Hive metastore.
I was not even able to drop the table through the Hive client.
I dropped the table and database through the Flink SQL client, then created the 
database and table again under the Hive client, and it works now.
But I don't know why.

Thanks,
Lei 




wangl...@geekplus.com.cn 
 
From: wangl...@geekplus.com.cn
Sent: 2020-03-19 11:12
To: user
Subject: orc table could not execute insert statement

CREATE TABLE `robot_tr` (`robot_id` int, `robot_time` bigint,
`linear_velocity` double, `track_side_error` int)
PARTITIONED BY (warehouseid STRING) STORED AS ORC;

Then:

INSERT INTO robot_tr (robot_id, robot_time, linear_velocity,
track_side_error, warehouseid) VALUES (1, 2, 3.4, 4, 'one');

It hangs there with no response.

Any insight on this?

Thanks,
Lei 



wangl...@geekplus.com.cn 



orc table could not execute insert statement

2020-03-18 Thread wangl...@geekplus.com.cn

CREATE TABLE `robot_tr` (`robot_id` int, `robot_time` bigint,
`linear_velocity` double, `track_side_error` int)
PARTITIONED BY (warehouseid STRING) STORED AS ORC;

Then:

INSERT INTO robot_tr (robot_id, robot_time, linear_velocity,
track_side_error, warehouseid) VALUES (1, 2, 3.4, 4, 'one');

It hangs there with no response.

Any insight on this?

Thanks,
Lei 



wangl...@geekplus.com.cn 



partition column changed to null even specified

2020-03-17 Thread wangl...@geekplus.com.cn

First create a table partitioned by warehouseid:

CREATE TABLE `robot` (`robot_id` int, `robot_time` bigint,
`linear_velocity` double, `track_side_error` int)
PARTITIONED BY (warehouseid STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

Then load a file into this table:

LOAD DATA INPATH '/user/root/wanglei/one' INTO TABLE wanglei_test.robot
PARTITION (warehouseid = 'one');

A MapReduce job is submitted, but after the job completes, it shows:

Loading data to table wanglei_test.robot partition (warehouseid=null)

But I specified warehouseid as 'one'.

When I select from the table, the warehouseid column is __HIVE_DEFAULT_PARTITION__.

Thanks,
Lei



wangl...@geekplus.com.cn