Re: Store DStreams into Hive using Hive Streaming
I'm also interested in this feature. Did you find any information about how to use Hive Streaming with Spark Streaming?

Thanks,
Krzysiek

2015-07-17 20:16 GMT+02:00 unk1102 <umesh.ka...@gmail.com>:
> Hi, I have a similar use case. Did you find a solution to this problem of
> loading DStreams into Hive using Spark Streaming? Please guide. Thanks.
Re: Store DStreams into Hive using Hive Streaming
Hive is not designed for OLTP workloads such as the data insertions and updates you want to do with Spark Streaming. Hive is mainly for OLAP workloads, where you already have the data and want to run bulk queries over it. Other systems, like HBase and Cassandra, are designed more for OLTP. Please think about your system architecture based on how each of these is designed.

On Mon, Oct 5, 2015 at 3:07 AM, Umesh Kacha <umesh.ka...@gmail.com> wrote:
> Hi, no, I didn't find any solution. I still need that Hive Streaming feature
> from Spark, so please let me know if you find something. An alternative is
> to use Storm for the Hive processing, but I would like to stick with Spark,
> so I'm still searching.
Re: Store DStreams into Hive using Hive Streaming
Hi, no, I didn't find any solution. I still need that Hive Streaming feature from Spark, so please let me know if you find something. An alternative is to use Storm for the Hive processing, but I would like to stick with Spark, so I'm still searching.

On Oct 5, 2015 2:51 PM, "Krzysztof Zarzycki" <k.zarzy...@gmail.com> wrote:
> I'm also interested in this feature. Did you find any information about how
> to use Hive Streaming with Spark Streaming?
>
> Thanks,
> Krzysiek
Re: Store DStreams into Hive using Hive Streaming
Hi, I have a similar use case. Did you find a solution to this problem of loading DStreams into Hive using Spark Streaming? Please guide. Thanks.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Store-DStreams-into-Hive-using-Hive-Streaming-tp18307p23885.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: Store DStreams into Hive using Hive Streaming
If you have found a solution for this, could you please post it?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Store-DStreams-into-Hive-using-Hive-Streaming-tp18307p21877.html
Re: Store DStreams into Hive using Hive Streaming
Hi Ted and Silvio, thanks for your responses.

Hive has a new API for streaming (https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest) that takes care of compaction and doesn't require any downtime for the table. The data is immediately available, and Hive combines files in the background transparently. I was hoping to use this API from within Spark to mitigate the issue with lots of small files...

Here's my equivalent code for Trident (work in progress): https://gist.github.com/lgvier/ee28f1c95ac4f60efc3e

Trident will coordinate the transaction and send all the tuples from each server/partition to your component at once (Stream.partitionPersist). That is very helpful, since Hive expects batches of records instead of one call for each record.

I had a look at foreachRDD, but it seems to be invoked for each record. I'd like to get all of the stream's records on each server/partition at once. For example, if the stream was processed by 3 servers and resulted in 100 records on each server, I'd like to receive 3 calls (one on each server), each with 100 records.

Please let me know if I'm making any sense. I'm fairly new to Spark.

Thank you,
-Geovani

On Thu, Nov 6, 2014 at 9:54 PM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:
> Geovani,
>
> You can use HiveContext to do inserts into a Hive table in a Streaming app
> just as you would in a batch app. A DStream is really a collection of RDDs,
> so you can run the insert from within the foreachRDD. You just have to be
> careful that you're not creating large amounts of small files, so you may
> want to either increase the duration of your Streaming batches or
> repartition right before you insert. You'll just need to do some testing
> based on your ingest volume. You may also want to consider streaming into
> another data store, though.
>
> Thanks,
> Silvio
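The "one call per partition, each carrying that partition's records" behaviour asked for above can be sketched roughly as follows. This is a minimal illustration in plain Python, not a Hive connector: `send_batch` is a hypothetical placeholder for whatever code would hand a list of records to the Hive Streaming API, and `write_partition`, `batches` and the default `batch_size` are names invented for the example. In Spark, `foreachRDD` hands the function one RDD per batch interval, and `RDD.foreachPartition` then calls a function once per partition with an iterator over that partition's records, so the wiring would presumably look like the commented line at the bottom.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(records: Iterable, batch_size: int) -> Iterator[List]:
    """Yield lists of at most batch_size records from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def write_partition(records: Iterable,
                    send_batch: Callable[[List], None],
                    batch_size: int = 1000) -> int:
    """Meant to run once per partition: forwards the partition's records
    to send_batch (hypothetical sink) in batches, and returns the number
    of batches that were sent."""
    sent = 0
    for chunk in batches(records, batch_size):
        send_batch(chunk)
        sent += 1
    return sent


# Local simulation of 3 partitions holding 100 records each.
calls = []
for partition in [range(100), range(100, 200), range(200, 300)]:
    write_partition(iter(partition), calls.append)
print(len(calls), [len(c) for c in calls])

# Assumed wiring inside a Spark Streaming job (not executed here):
# dstream.foreachRDD(
#     lambda rdd: rdd.foreachPartition(
#         lambda it: write_partition(it, send_batch)))
```

With the default `batch_size` of 1000, each 100-record partition in the simulation results in a single call to the sink, which matches the "3 calls, each with 100 records" example. A smaller `batch_size` would split a partition into several calls.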
Store DStreams into Hive using Hive Streaming
Hello,

Is there a built-in way or connector to store DStream results into an existing Hive ORC table using the Hive/HCatalog Streaming API? Otherwise, do you have any suggestions regarding the implementation of such a component?

Thank you,
-Geovani
Re: Store DStreams into Hive using Hive Streaming
Ted, any pointers?

On Thu, Nov 6, 2014 at 4:46 PM, Luiz Geovani Vier <lgv...@gmail.com> wrote:
> Hello,
>
> Is there a built-in way or connector to store DStream results into an
> existing Hive ORC table using the Hive/HCatalog Streaming API? Otherwise,
> do you have any suggestions regarding the implementation of such component?
>
> Thank you,
> -Geovani
Re: Store DStreams into Hive using Hive Streaming
Geovani,

You can use HiveContext to do inserts into a Hive table in a Streaming app just as you would in a batch app. A DStream is really a collection of RDDs, so you can run the insert from within the foreachRDD. You just have to be careful that you're not creating large amounts of small files, so you may want to either increase the duration of your Streaming batches or repartition right before you insert. You'll just need to do some testing based on your ingest volume. You may also want to consider streaming into another data store, though.

Thanks,
Silvio

From: Luiz Geovani Vier <lgv...@gmail.com>
Date: Thursday, November 6, 2014 at 7:46 PM
To: user@spark.apache.org
Subject: Store DStreams into Hive using Hive Streaming

> Hello,
>
> Is there a built-in way or connector to store DStream results into an
> existing Hive ORC table using the Hive/HCatalog Streaming API? Otherwise,
> do you have any suggestions regarding the implementation of such component?
>
> Thank you,
> -Geovani
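To make the small-files concern concrete, here is a rough back-of-the-envelope sketch. It assumes that each insert writes about one file per RDD partition per streaming batch, which is a simplification and not a guarantee about how Hive or Spark lay out files. The function name and the sample numbers are made up for illustration.

```python
def files_per_hour(batch_seconds: float, partitions: int) -> int:
    """Rough number of output files per hour, ASSUMING one file is
    written per partition for every streaming batch."""
    batches_per_hour = 3600 / batch_seconds
    return int(batches_per_hour * partitions)


# Short batches with many partitions.
print(files_per_hour(batch_seconds=2, partitions=8))   # 14400
# Longer batch duration, same partitioning.
print(files_per_hour(batch_seconds=60, partitions=8))  # 480
# Longer batch duration, repartitioned to 1 before the insert.
print(files_per_hour(batch_seconds=60, partitions=1))  # 60
```

Under that assumption, both knobs mentioned above reduce the file count multiplicatively: lengthening the batch duration lowers the number of inserts, and repartitioning (for example with `rdd.repartition(n)` or `rdd.coalesce(n)`) lowers the number of files per insert. The right values still depend on ingest volume and would need testing.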