Re: Auto Refresh Hive Table Metadata

2018-08-10 Thread Gopal Vijayaraghavan


> By the way, if you want near-real-time tables with Hive, maybe you should 
> have a look at this project from Uber: https://uber.github.io/hudi/
> I don't know how mature it is yet, but I think it aims at solving that kind 
> of challenge.

Depending on your hive setup, you don't need a different backend to do 
near-real-time tables.

https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest

Prasanth has a benchmark for Hive 3.x, which is limited by HDFS bandwidth at 
the moment with 64 threads.

https://github.com/prasanthj/culvert

$ ./culvert -u thrift://localhost:9183 -db testing -table culvert -p 64 -n 10
Total rows committed: 9210
Throughput: 1535000 rows/second

Cheers,
Gopal




Re: Auto Refresh Hive Table Metadata

2018-08-10 Thread Furcy Pin
Hi Chintan,

Yes, this sounds weird...

"REFRESH TABLES" is the kind of statement required by SQL engines such as
Impala, Presto or Spark-SQL that cache metadata from the Metastore, but
vanilla Hive usually doesn't cache it and queries the metastore every time
(unless some new feature was added recently, in which case it is probably
possible to disable it with some option).
In other words, as long as you add new files to your existing partitions,
they should be automatically readable from Hive.
If you add new partitions, that's a different story, of course.
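
To illustrate the new-partition case: Hive's MSCK REPAIR TABLE statement
registers partition directories that exist in storage but are missing from
the metastore. Below is a minimal sketch, assuming the partitions follow the
usual key=value directory layout; the function name, database name and table
names are hypothetical and only serve the example.

```python
# Sketch: build the Hive statements that register newly added partition
# directories for a batch of tables. All names here are hypothetical.

def repair_statements(tables, database="default"):
    """Return a USE statement followed by one MSCK REPAIR TABLE per table."""
    statements = [f"USE {database};"]
    for table in tables:
        # MSCK REPAIR TABLE adds partitions found in storage to the metastore.
        statements.append(f"MSCK REPAIR TABLE {table};")
    return statements


if __name__ == "__main__":
    # Hypothetical tables written by the connector.
    for stmt in repair_statements(["events", "clicks"], database="kafka_sink"):
        print(stmt)
```

The printed statements could be saved to a file and run on a schedule, for
example with beeline -f. This only matters when new partitions appear; new
files inside partitions that already exist need no such step in plain Hive.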

Are you sure you are using Hive here, and not Spark-SQL or something else?

By the way, if you want near-real-time tables with Hive, maybe you should
have a look at this project from Uber: https://uber.github.io/hudi/
I don't know how mature it is yet, but I think it aims at solving that kind
of challenge.

Regards,

Furcy



Re: Auto Refresh Hive Table Metadata

2018-08-09 Thread Will Du
I have never experienced this kind of issue. Once the sink has loaded the
data into HDFS, the data is available in Hive.

> On Aug 9, 2018, at 10:18, Chintan Patel  wrote:
> 
> Hello Will Du,
> 
> I'm using Kafka connector to create hive database. All the data are stored in 
> s3 bucket and using mysql database for metastore. 
> 
> For example If connector add new records in hive table and If I run query 
> It's not returning latest data and I have to run refresh table {table_name} 
> to clear metastore cache. Now If I have 1000 hive table and I want to update 
> those tables every 5 mins, running refresh query is not good idea I guess. 
> 
> So I was thinking if hive has some type of mechanism to do it in background 
> then it will be good. 
> 
> 
>> On 9 August 2018 at 17:51, Will Du  wrote:
>> any reason to do this?
>> 
>> Sent from my iPhone
>> 
>> > On Aug 9, 2018, at 07:57, Chintan Patel  wrote:
>> > 
>> > Hello,
>> > 
>> > I want to refresh external type hive table metadata on some regular 
>> > interval without using "refresh table {table_name}".
>> > 
>> > Thanks & Regards
>> > 
> 


Re: Auto Refresh Hive Table Metadata

2018-08-09 Thread Chintan Patel
Hello Will Du,

I'm using a Kafka connector to create the Hive database. All the data is
stored in an S3 bucket, and a MySQL database serves as the metastore.

For example, when the connector adds new records to a Hive table and I run a
query, it does not return the latest data; I have to run
refresh table {table_name} to clear the metastore cache. If I have 1000 Hive
tables and want to update them every 5 minutes, running a refresh query for
each one is probably not a good idea.

So I was wondering whether Hive has some mechanism to do this in the
background.


On 9 August 2018 at 17:51, Will Du  wrote:

> any reason to do this?
>
> Sent from my iPhone
>
> > On Aug 9, 2018, at 07:57, Chintan Patel  wrote:
> >
> > Hello,
> >
> > I want to refresh external type hive table metadata on some regular
> interval without using "refresh table {table_name}".
> >
> > Thanks & Regards
> >
>


Re: Auto Refresh Hive Table Metadata

2018-08-09 Thread Will Du
Any reason to do this?

> On Aug 9, 2018, at 07:57, Chintan Patel  wrote:
> 
> Hello,
> 
> I want to refresh external type hive table metadata on some regular interval 
> without using "refresh table {table_name}".
> 
> Thanks & Regards
> 


Auto Refresh Hive Table Metadata

2018-08-09 Thread Chintan Patel
Hello,

I want to refresh the metadata of external Hive tables at a regular
interval without using "refresh table {table_name}".

Thanks & Regards