Re: Store DStreams into Hive using Hive Streaming

2015-10-05 Thread Krzysztof Zarzycki
I'm also interested in this feature. Did you guys find any information
about how to use Hive Streaming with Spark Streaming?

Thanks,
Krzysiek

2015-07-17 20:16 GMT+02:00 unk1102 <umesh.ka...@gmail.com>:

> Hi, I have a similar use case. Did you find a solution to this problem of
> loading DStreams into Hive using Spark Streaming? Please guide. Thanks.


Re: Store DStreams into Hive using Hive Streaming

2015-10-05 Thread Tathagata Das
Hive is not designed for OLTP workloads like the data insertions and updates
you want to do with Spark Streaming. Hive is mainly for OLAP workloads,
where you already have the data and want to run bulk queries on it.
Other systems, like HBase and Cassandra, are designed for OLTP. Please
think about your system architecture based on how each of these systems is
designed.

On Mon, Oct 5, 2015 at 3:07 AM, Umesh Kacha <umesh.ka...@gmail.com> wrote:

> Hi, no, I didn't find any solution. I still need Hive Streaming from Spark,
> so please let me know if you find something. An alternative is to use Storm
> for the Hive processing, but I would like to stick to Spark, so I'm still
> searching.


Re: Store DStreams into Hive using Hive Streaming

2015-10-05 Thread Umesh Kacha
Hi, no, I didn't find any solution. I still need Hive Streaming from Spark,
so please let me know if you find something. An alternative is to use Storm
for the Hive processing, but I would like to stick to Spark, so I'm still
searching.
On Oct 5, 2015 2:51 PM, "Krzysztof Zarzycki" <k.zarzy...@gmail.com> wrote:

> I'm also interested in this feature. Did you guys find any information
> about how to use Hive Streaming with Spark Streaming?
>
> Thanks,
> Krzysiek


Re: Store DStreams into Hive using Hive Streaming

2015-07-17 Thread unk1102
Hi, I have a similar use case. Did you find a solution to this problem of
loading DStreams into Hive using Spark Streaming? Please guide. Thanks.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Store-DStreams-into-Hive-using-Hive-Streaming-tp18307p23885.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Store DStreams into Hive using Hive Streaming

2015-03-02 Thread tarek_abouzeid
Please, if you have found a solution for this, could you post it?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Store-DStreams-into-Hive-using-Hive-Streaming-tp18307p21877.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Store DStreams into Hive using Hive Streaming

2014-11-07 Thread Luiz Geovani Vier
Hi Ted and Silvio, thanks for your responses.

Hive has a new API for streaming
(https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest)
that takes care of compaction and doesn't require any downtime for the
table. The data is immediately available, and Hive combines the files in the
background transparently. I was hoping to use this API from within Spark to
mitigate the issue with lots of small files...
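For anyone following along, the core of that API (per the wiki page above) can be sketched roughly as follows. This is a minimal sketch, not production code: the metastore URI, database, table, partition values, and column names are all hypothetical, and error handling and retries are omitted.

```scala
import scala.collection.JavaConverters._
import org.apache.hive.hcatalog.streaming.{DelimitedInputWriter, HiveEndPoint}

object HiveStreamingSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical metastore URI, database, table, and partition values
    val endPoint = new HiveEndPoint("thrift://metastore-host:9083",
      "default", "events", List("2014", "11").asJava)
    // Open a connection; 'true' asks Hive to create the partition if missing
    val conn = endPoint.newConnection(true)
    // Writer that parses delimited records into the table's columns
    val writer = new DelimitedInputWriter(Array("id", "msg"), ",", endPoint)
    // Fetch a batch of transactions and stream records into the first one
    val txnBatch = conn.fetchTransactionBatch(10, writer)
    txnBatch.beginNextTransaction()
    txnBatch.write("1,hello".getBytes("UTF-8"))
    txnBatch.write("2,world".getBytes("UTF-8"))
    txnBatch.commit() // records become visible to Hive queries here
    txnBatch.close()
    conn.close()
  }
}
```

The table has to be bucketed, stored as ORC, and have transactions enabled for this API to accept writes, as described on the wiki page.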

Here's my equivalent code for Trident (work in progress):
https://gist.github.com/lgvier/ee28f1c95ac4f60efc3e
Trident will coordinate the transaction and send all the tuples from each
server/partition to your component at once (Stream.partitionPersist). That
is very helpful since Hive expects batches of records instead of one call
for each record.
I had a look at foreachRDD, but it seems to be invoked for each record. I'd
like to get all of the stream's records on each server/partition at once.
For example, if the stream was processed by 3 servers and resulted in 100
records on each server, I'd like to receive 3 calls (one on each server),
each with 100 records. Please let me know if I'm making sense; I'm
fairly new to Spark.
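For what it's worth, foreachRDD is invoked once per batch interval (once per RDD), not once per record; combining it with foreachPartition gives one call per partition with an iterator over all of that partition's records, which sounds like the shape you're after. A rough sketch, where the stream and the writeBatch sink are hypothetical stand-ins:

```scala
import org.apache.spark.streaming.dstream.DStream

// Hypothetical: 'stream' is a DStream[String] and 'writeBatch' is whatever
// sink you use (e.g. a Hive Streaming transaction batch). This only
// illustrates the per-partition batching shape.
def sinkPerPartition(stream: DStream[String],
                     writeBatch: Seq[String] => Unit): Unit = {
  stream.foreachRDD { rdd =>
    // One call per partition, on the executor holding that partition,
    // with all of that partition's records for this batch at once:
    // e.g. 3 partitions of 100 records => 3 calls of 100 records each.
    rdd.foreachPartition { records =>
      writeBatch(records.toSeq)
    }
  }
}
```

Any connection objects (such as a Hive Streaming connection) should be created inside foreachPartition, since that closure runs on the executors rather than the driver.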

Thank you,
-Geovani






Store DStreams into Hive using Hive Streaming

2014-11-06 Thread Luiz Geovani Vier
Hello,

Is there a built-in way or connector to store DStream results into an
existing Hive ORC table using the Hive/HCatalog Streaming API?
Otherwise, do you have any suggestions regarding the implementation of such
a component?

Thank you,
-Geovani


Re: Store DStreams into Hive using Hive Streaming

2014-11-06 Thread Tathagata Das
Ted, any pointers?




Re: Store DStreams into Hive using Hive Streaming

2014-11-06 Thread Silvio Fiorito
Geovani,

You can use HiveContext to do inserts into a Hive table in a Streaming app just 
as you would a batch app. A DStream is really a collection of RDDs so you can 
run the insert from within the foreachRDD. You just have to be careful that 
you’re not creating large amounts of small files. So you may want to either 
increase the duration of your Streaming batches or repartition right before you 
insert. You’ll just need to do some testing based on your ingest volume. You 
may also want to consider streaming into another data store though.
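For illustration, that approach might look roughly like the sketch below, using the DataFrame API for brevity. The table name, schema, and repartition count are assumptions, not from the thread, and a real app would reuse a single HiveContext instead of constructing one per batch.

```scala
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.streaming.dstream.DStream

// Hypothetical DStream of (id, message) pairs; the Hive table "events"
// is assumed to already exist with a matching schema.
def insertIntoHive(stream: DStream[(Int, String)]): Unit = {
  stream.foreachRDD { rdd =>
    val hiveContext = new HiveContext(rdd.sparkContext)
    import hiveContext.implicits._
    // Repartition before inserting so each streaming batch does not
    // produce a large number of small files in the table directory.
    rdd.repartition(4)
      .toDF("id", "message")
      .write.insertInto("events")
  }
}
```

The repartition count is a knob to tune against your ingest volume: fewer partitions mean fewer, larger files per batch, at the cost of less write parallelism.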

Thanks,
Silvio

From: Luiz Geovani Vier <lgv...@gmail.com>
Date: Thursday, November 6, 2014 at 7:46 PM
To: user@spark.apache.org
Subject: Store DStreams into Hive using Hive Streaming

Hello,

Is there a built-in way or connector to store DStream results into an existing 
Hive ORC table using the Hive/HCatalog Streaming API?
Otherwise, do you have any suggestions regarding the implementation of such
a component?

Thank you,
-Geovani