I have a stream of files on HDFS containing JSON events. I need to convert
them to Parquet in real time, process some fields, and store the result in a
simple Hive table so people can query it. People might even want to query it
with Impala, so it's important that it be a real Hive-metastore-backed table.
How can I do that?
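
One way to wire this up, as a minimal Scala sketch: it assumes Spark 2.1+, a
fixed event schema, and hypothetical paths /data/events (input),
/warehouse/events (output), and /chk/events (checkpoints).

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val spark = SparkSession.builder()
  .appName("json-to-parquet")
  .enableHiveSupport()
  .getOrCreate()

// Streaming file sources need an explicit schema; these fields are
// placeholders for whatever the real events contain.
val schema = new StructType()
  .add("event_time", TimestampType)
  .add("user_id", StringType)
  .add("payload", StringType)

// Read new JSON files as they land on HDFS.
val events = spark.readStream
  .schema(schema)
  .json("hdfs:///data/events")

// Example field processing: derive a string partition column.
val processed = events
  .withColumn("dt", date_format(col("event_time"), "yyyy-MM-dd"))

// Write partitioned Parquet; the checkpoint location is required.
val query = processed.writeStream
  .format("parquet")
  .option("path", "hdfs:///warehouse/events")
  .option("checkpointLocation", "hdfs:///chk/events")
  .partitionBy("dt")
  .start()

This produces partitioned Parquet directories, but as Burak notes below, the
file sink records which files are valid in its own _spark_metadata log rather
than in the Hive metastore.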

2017-02-06 14:25 GMT-08:00 Burak Yavuz <brk...@gmail.com>:

> Hi Egor,
>
> Structured Streaming handles all of its metadata itself: which files are
> actually valid, and so on. You may use the "create table" syntax in SQL to
> treat it like a Hive table (sketched after the quoted thread below), but it
> will keep all partitioning information in its own metadata log. Is there a
> specific reason you want to store the information in the Hive Metastore?
>
> Best,
> Burak
>
> On Mon, Feb 6, 2017 at 11:39 AM, Egor Pahomov <pahomov.e...@gmail.com>
> wrote:
>
>> Hi, I'm thinking of using Structured Streaming instead of the old
>> streaming API, but I need to be able to save results to a Hive table. The
>> documentation for the file sink (http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks)
>> says: "Supports writes to partitioned tables." But being able to write to
>> partitioned directories is not enough to write to the table: someone
>> needs to update the Hive metastore. How can I use Structured Streaming
>> and write to a Hive table?
>>
>> --
>>
>>
>> *Sincerely yours,
>> Egor Pakhomov*
>>
>
>
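
For reference, a sketch of the "create table" approach Burak describes
above, assuming the stream writes Parquet to the hypothetical location
hdfs:///warehouse/events with a dt partition column:

// Register an external table over the sink's output directory so that
// Hive (and Impala) can query it.
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS events (
    event_time TIMESTAMP,
    user_id STRING,
    payload STRING
  )
  PARTITIONED BY (dt STRING)
  STORED AS PARQUET
  LOCATION 'hdfs:///warehouse/events'
""")

// New partitions are not registered automatically; refresh them after
// the stream adds data (this can also be run from Hive).
spark.sql("MSCK REPAIR TABLE events")

Impala additionally caches metastore state, so a REFRESH events on the
Impala side is needed before new partitions show up there. One caveat: Hive
and Impala scan the directory directly and ignore the sink's _spark_metadata
log, so files from an in-progress or failed batch may be visible to them
before Spark considers the batch committed.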


-- 


*Sincerely yours,
Egor Pakhomov*
