Got it, thanks!
2017-02-11 0:56 GMT-08:00 Sam Elamin :
Here's a link to the thread
http://apache-spark-developers-list.1001551.n3.nabble.com/Structured-Streaming-Dropping-Duplicates-td20884.html
On Sat, 11 Feb 2017 at 08:47, Sam Elamin wrote:
Hey Egor
You can use a ForeachWriter or you can write a custom sink. I personally
went with a custom sink since I get a dataframe per batch.
You can have a look at
https://github.com/samelamin/spark-bigquery/blob/master/src/main/scala/com/samelamin/spark/bigquery/streaming/BigQuerySink.scala
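A minimal sketch of that custom-sink approach, assuming Spark 2.x's developer APIs (Sink and StreamSinkProvider); the class names, the "table" option, and the insertInto call are illustrative guesses, not what the linked BigQuery sink actually does:

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.sources.StreamSinkProvider
import org.apache.spark.sql.streaming.OutputMode

// addBatch is handed one DataFrame per micro-batch, which is what makes a
// custom sink convenient here.
class HiveTableSink(table: String) extends Sink {
  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    // Append the micro-batch to the target table; depending on the Spark
    // version you may need to materialize the DataFrame first.
    data.write.mode("append").insertInto(table)
  }
}

class HiveSinkProvider extends StreamSinkProvider {
  override def createSink(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink =
    new HiveTableSink(parameters("table"))
}

It could then be wired in by the provider's fully qualified class name, e.g. events.writeStream.format("com.example.HiveSinkProvider").option("table", "events").option("checkpointLocation", "/tmp/cp").start() (again, names and paths are placeholders).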
"Something like that" I've never tried it out myself so I'm only
guessing having a brief look at the API.
Regards,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
Jacek, so I create a cache in the ForeachWriter, write to it in every
"process()" call, and flush it on close? Something like that?
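A rough sketch of that idea, purely as a guess along the same lines (flushBuffer is a hypothetical helper, not a Spark API):

import org.apache.spark.sql.{ForeachWriter, Row}
import scala.collection.mutable.ArrayBuffer

class BufferingWriter extends ForeachWriter[Row] {
  private val buffer = ArrayBuffer.empty[Row]

  // Called once per partition and per batch; version is the batch/epoch id.
  override def open(partitionId: Long, version: Long): Boolean = {
    buffer.clear()
    true
  }

  // Called once per element: accumulate instead of writing row by row.
  override def process(row: Row): Unit = {
    buffer += row
  }

  // Called when the partition is done: flush everything in one go.
  override def close(errorOrNull: Throwable): Unit = {
    if (errorOrNull == null) flushBuffer(buffer.toSeq)
  }

  private def flushBuffer(rows: Seq[Row]): Unit = {
    // Hypothetical: write the buffered rows to the external table in one call.
  }
}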
2017-02-09 12:42 GMT-08:00 Jacek Laskowski :
Hi,
Yes, that's ForeachWriter.
Yes, it works element by element. You're looking for mapPartitions, and
ForeachWriter has a partitionId that you could use to implement a
similar thing.
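For comparison, this is the batch-side mapPartitions pattern being alluded to: each task gets an iterator over a whole partition, so a single bulk write per partition is natural (the example data and the commented write are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("mapPartitions-sketch").getOrCreate()
import spark.implicits._

val ds = spark.range(0, 100).map(_.toString)

// One iterator per partition; ForeachWriter's open/process/close plus the
// partitionId let you approximate the same thing on the streaming side.
val written = ds.mapPartitions { partition =>
  val rows = partition.toSeq
  // writePartition(rows)  // hypothetical bulk write of the whole partition
  Iterator(rows.size)
}
written.show()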
Regards,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0
Jacek, you mean
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.ForeachWriter
? I do not understand how to use it, since it passes every value
separately, not every partition. And adding values to the table one by one
would not work.
2017-02-07 12:10 GMT-08:00 Jacek Laskowski :
Hi,
Have you considered foreach sink?
Jacek
On 6 Feb 2017 8:39 p.m., "Egor Pahomov" wrote:
I presume you may be able to implement a custom sink and use
df.saveAsTable. The problem is that you will have to handle idempotence /
garbage collection yourself, in case your job fails while writing, etc.
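To make the idempotence concern concrete, a naive sketch of guarding addBatch against replayed batches (GuardedHiveSink and its table parameter are made-up names). Note that keeping the marker only in memory, as below, does not survive a driver restart, which is exactly the bookkeeping you would still have to do yourself:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.streaming.Sink

class GuardedHiveSink(table: String) extends Sink {
  // A real implementation would persist this marker (e.g. next to the data or
  // in an external store) so replays after a restart are also skipped.
  @volatile private var lastCommittedBatchId = -1L

  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    if (batchId <= lastCommittedBatchId) {
      return // this batch was already written before a failure; skip it
    }
    data.write.mode("append").saveAsTable(table)
    lastCommittedBatchId = batchId
  }
}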
On Mon, Feb 6, 2017 at 5:53 PM, Egor Pahomov wrote:
I have a stream of files on HDFS with JSON events. I need to convert them to
Parquet in real time, process some fields, and store the result in a simple
Hive table so people can query it. People might even want to query it with
Impala, so it's important that it be a real Hive metastore based table. How
can I do that?
Hi Egor,
Structured Streaming handles all of its metadata itself, which files are
actually valid, etc. You may use the "create table" syntax in SQL to treat
it like a Hive table, but it will handle all partitioning information in
its own metadata log. Is there a specific reason that you want to
Hi, I'm thinking of using Structured Streaming instead of the old streaming,
but I need to be able to save results to a Hive table. Documentation for the
file sink says (
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks):
"Supports writes to partitioned tables."