Re: Is it good choice to use DAO to store results generated by spark application?

2016-07-20 Thread Yu Wei
This is a startup project. We don't know how much data will be written every day. There will not be much data at the beginning, but it will increase later. We want to use spark streaming to receive data via MQTTUtils. We're now evaluating which components could be used for storing

Re: Is it good choice to use DAO to store results generated by spark application?

2016-07-20 Thread Ted Yu
You can decide which component(s) to use for storing your data. If you haven't used hbase before, it may be better to store data on hdfs and query through Hive or SparkSQL. Maintaining hbase is not a trivial task, especially when the cluster size is large. How much data are you expecting to be

Re: Is it good choice to use DAO to store results generated by spark application?

2016-07-20 Thread Yu Wei
I'm a beginner in big data. I don't know much about hbase/hive. What's the difference between hbase and hive/hdfs for storing data for analytics? Thanks, Jared From: ayan guha Sent: Wednesday, July 20, 2016 9:34:24 PM To:

Re: Is it good choice to use DAO to store results generated by spark application?

2016-07-20 Thread ayan guha
Just as a rain check, saving data to hbase for analytics may not be the best choice. Any specific reason for not using hdfs or hive?

On 20 Jul 2016 20:57, "Rabin Banerjee" wrote:
> Hi Wei,
>
> You can do something like this,
>
> foreachPartition( (part) => {

Re: Is it good choice to use DAO to store results generated by spark application?

2016-07-20 Thread Rabin Banerjee
Hi Wei,

You can do something like this:

foreachPartition( (part) => {
  val conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
  val table = conn.getTable(TableName.valueOf(tablename));
  //part.foreach((inp)=>{println(inp);table.put(inp)}) //This is line by line put
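The per-partition idea in the snippet above (open one connection per partition, then write in batches rather than line by line) can be sketched without a cluster. This is only a sketch under stated assumptions: `ResultDao`, `InMemoryDao`, and `PartitionWriter` are hypothetical names invented for illustration and are not part of HBase or Spark. In a real job the body of `writePartition` would run inside `rdd.foreachPartition`, and an HBase-backed DAO would presumably wrap a `Table` and pass each batch to `Table.put` with a list of `Put` objects.

```scala
import scala.collection.mutable

// Hypothetical DAO interface (illustrative, not an HBase or Spark type).
trait ResultDao extends AutoCloseable {
  def saveBatch(rows: Seq[(String, String)]): Unit
}

// In-memory stand-in so the sketch runs without HBase.
// A real implementation would hold a Connection/Table instead of a buffer.
final class InMemoryDao(store: mutable.Buffer[(String, String)]) extends ResultDao {
  var batches: Int = 0 // number of batched writes issued
  def saveBatch(rows: Seq[(String, String)]): Unit = {
    store ++= rows
    batches += 1
  }
  def close(): Unit = ()
}

object PartitionWriter {
  // Opens one DAO per partition, writes rows in groups of batchSize,
  // and closes the DAO even if a write fails.
  def writePartition(part: Iterator[(String, String)],
                     open: () => ResultDao,
                     batchSize: Int): Unit = {
    val dao = open()
    try part.grouped(batchSize).foreach(batch => dao.saveBatch(batch.toSeq))
    finally dao.close()
  }
}
```

With three rows and a batch size of 2, the DAO receives two batched writes instead of three single puts. The DAO layer itself adds little overhead here; what matters is that the connection is created once per partition, not once per record.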

Re: Is it good choice to use DAO to store results generated by spark application?

2016-07-20 Thread Yu Wei
I need to write all data received from MQTT into hbase for further processing. It is not the final result. I also need to read the data from hbase for analysis. Is it a good choice to use DAO in such a situation? Thx, Jared From: Deepak Sharma

Re: Is it good choice to use DAO to store results generated by spark application?

2016-07-20 Thread Yu Wei
Hi Ted, I also noticed HBASE-13992. I have never used anything like a DAO. As a general rule, which is the better choice when working with spark and hbase: the hbase-spark module, DAO, or the hbase client api? I'm a beginner in big data. Any guidance would be very helpful. Thanks, Jared

Re: Is it good choice to use DAO to store results generated by spark application?

2016-07-19 Thread Deepak Sharma
I am using DAO in a spark application to write the final computation to Cassandra and it performs well. What kinds of issues do you foresee using DAO for hbase? Thanks Deepak On 19 Jul 2016 10:04 pm, "Yu Wei" wrote: > Hi guys, > > > I write spark application and want to store

Re: Is it good choice to use DAO to store results generated by spark application?

2016-07-19 Thread Ted Yu
The hbase-spark module is in the upcoming hbase 2.0 release. Currently it is in the master branch of the hbase git repo. FYI On Tue, Jul 19, 2016 at 8:27 PM, Andrew Ehrlich wrote: > There is a Spark<->HBase library that does this. I used it once in a > prototype (never tried in

Re: Is it good choice to use DAO to store results generated by spark application?

2016-07-19 Thread Andrew Ehrlich
There is a Spark<->HBase library that does this. I used it once in a prototype (never tried in production, though): http://blog.cloudera.com/blog/2015/08/apache-spark-comes-to-apache-hbase-with-hbase-spark-module/

Is it good choice to use DAO to store results generated by spark application?

2016-07-19 Thread Yu Wei
Hi guys, I am writing a spark application and want to store the results it generates in hbase. Do I need to access hbase via the java api directly? Or is it a better choice to use a DAO, as with a traditional RDBMS? I suspect that there is a major performance degradation and other negative