Hi there, For sure, the new release does support Spark SQL, so you can use Spark SQL and Stratio Deep together just out of the box.
About Crossdata: it's not itself related to Spark, but it can use Stratio Deep. It's an interactive SQL engine, like Hive, for example. Regards.

2014-12-12 21:29 GMT+01:00 Niranda Perera <nira...@wso2.com>:

> Hi David,
>
> I have been going through the Deep-Spark examples. It looks very promising.
>
> On a follow-up query, does Deep-Spark / Deep-Cassandra support SQL-like operations on RDDs (like SparkSQL)?
>
> Example (from the Datastax Cassandra connector demos):
>
>     object SQLDemo extends DemoApp {
>
>       val cc = new CassandraSQLContext(sc)
>
>       CassandraConnector(conf).withSessionDo { session =>
>         session.execute("CREATE KEYSPACE IF NOT EXISTS test WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1 }")
>         session.execute("DROP TABLE IF EXISTS test.sql_demo")
>         session.execute("CREATE TABLE test.sql_demo (key INT PRIMARY KEY, grp INT, value DOUBLE)")
>         session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES (1, 1, 1.0)")
>         session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES (2, 1, 2.5)")
>         session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES (3, 1, 10.0)")
>         session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES (4, 2, 4.0)")
>         session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES (5, 2, 2.2)")
>         session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES (6, 2, 2.8)")
>       }
>
>       val rdd = cc.cassandraSql("SELECT grp, max(value) AS mv FROM test.sql_demo GROUP BY grp ORDER BY mv")
>       rdd.collect().foreach(println) // [2, 4.0] [1, 10.0]
>
>       sc.stop()
>     }
>
> I also read about Stratio Crossdata. Does Crossdata serve this purpose?
>
> Rgds
>
> On Tue, Dec 2, 2014 at 11:14 PM, David Morales <dmora...@stratio.com> wrote:
>>
>> Hi!
>>
>> Please, check the develop branch if you want to see a more realistic view of our development path.
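(Editor's note: what the cassandraSql query in the demo above computes can be sketched with plain Scala collections, no Spark or Cassandra required. The six rows are copied from the demo's INSERT statements; this is a conceptual sketch of the query semantics, not part of either connector's API.)

```scala
// The same six (key, grp, value) rows inserted in the demo above.
val rows = Seq((1, 1, 1.0), (2, 1, 2.5), (3, 1, 10.0),
               (4, 2, 4.0), (5, 2, 2.2), (6, 2, 2.8))

// SELECT grp, max(value) AS mv FROM test.sql_demo GROUP BY grp ORDER BY mv
val result = rows
  .groupBy(_._2)                                      // GROUP BY grp
  .map { case (grp, rs) => (grp, rs.map(_._3).max) }  // max(value) per group
  .toSeq
  .sortBy(_._2)                                       // ORDER BY mv

result.foreach(println) // (2,4.0) then (1,10.0), matching the demo's output
```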
Last commit was about two hours ago :) >> >> Stratio Deep is one of our core modules so there is a core team in >> Stratio fully devoted to spark + noSQL integration. In these last months, >> for example, we have added mongoDB, ElasticSearch and Aerospike to Stratio >> Deep, so you can talk to these databases from Spark just like you do with >> HDFS. >> >> Furthermore, we are working on more backends, such as neo4j or couchBase, >> for example. >> >> >> About our benchmarks, you can check out some results in this link: >> http://www.stratio.com/deep-vs-datastax/ >> >> Please, keep in mind that spark integration with a datastore could be >> done in two ways: HCI or native. We are now working on improving native >> integration because it's quite more performant. In this way, we are just >> working on some other tests with even more impressive results. >> >> >> Here you can find a technical overview of all our platform. >> >> >> http://www.slideshare.net/Stratio/stratio-platform-overview-v41 >> >> Regards >> >> 2014-12-02 11:14 GMT+01:00 Niranda Perera <nira...@wso2.com>: >> >>> Hi David, >>> >>> Sorry to re-initiate this thread. But may I know if you have done any >>> benchmarking on Datastax Spark cassandra connector and Stratio Deep-spark >>> cassandra integration? Would love to take a look at it. >>> >>> I recently checked deep-spark github repo and noticed that there is no >>> activity since Oct 29th. May I know what your future plans on this >>> particular project? >>> >>> Cheers >>> >>> On Tue, Aug 26, 2014 at 9:12 PM, David Morales <dmora...@stratio.com> >>> wrote: >>> >>>> Yes, it is already included in our benchmarks. >>>> >>>> It could be a nice idea to share our findings, let me talk about it >>>> here. Meanwhile, you can ask us any question by using my mail or this >>>> thread, we are glad to help you. >>>> >>>> >>>> Best regards. 
>>>> >>>> >>>> 2014-08-24 15:49 GMT+02:00 Niranda Perera <nira...@wso2.com>: >>>> >>>>> Hi David, >>>>> >>>>> Thank you for your detailed reply. >>>>> >>>>> It was great to hear about Stratio-Deep and I must say, it looks very >>>>> interesting. Storage handlers for databases such Cassandra, MongoDB etc >>>>> would be very helpful. We will definitely look up on Stratio-Deep. >>>>> >>>>> I came across with the Datastax Spark-Cassandra connector ( >>>>> https://github.com/datastax/spark-cassandra-connector ). Have you >>>>> done any comparison with your implementation and Datastax's connector? >>>>> >>>>> And, yes, please do share the performance results with us once it's >>>>> ready. >>>>> >>>>> On a different note, is there any way for us to interact with Stratio >>>>> dev community, in the form of dev mail lists etc, so that we could >>>>> mutually >>>>> share our findings? >>>>> >>>>> Best regards >>>>> >>>>> >>>>> >>>>> On Fri, Aug 22, 2014 at 2:07 PM, David Morales <dmora...@stratio.com> >>>>> wrote: >>>>> >>>>>> Hi there, >>>>>> >>>>>> *1. About the size of deployments.* >>>>>> >>>>>> It depends on your use case... specially when you combine spark with >>>>>> a datastore. We use to deploy spark with cassandra or mongodb, instead of >>>>>> using HDFS for example. >>>>>> >>>>>> Spark will be faster if you put the data in memory, so if you need a >>>>>> lot of speed (interactive queries, for example), you should have enough >>>>>> memory. >>>>>> >>>>>> >>>>>> *2. About storage handlers.* >>>>>> >>>>>> We have developed the first tight integration between Cassandra and >>>>>> Spark, called Stratio Deep, announced in the first spark summit. You can >>>>>> check Stratio Deep out here: https://github.com/Stratio/stratio-deep >>>>>> (open, >>>>>> apache2 license). >>>>>> >>>>>> *Deep is a thin integration layer between Apache Spark and several >>>>>> NoSQL datastores. 
We actually support Apache Cassandra and MongoDB, but in the near future we will add support for several other datastores.* >>>>>> >>>>>> Datastax announced its own driver for Spark at the last Spark Summit, but we have been working on our solution for almost a year. >>>>>> >>>>>> Furthermore, we are working to extend this solution in order to work also with other databases... MongoDB integration is completed right now and ElasticSearch will be ready in a few weeks. >>>>>> >>>>>> And that is not all: we have also developed an integration with Cassandra and Lucene for indexing data (open source, Apache 2 license). >>>>>> >>>>>> *Stratio Cassandra is a fork of Apache Cassandra >>>>>> <http://cassandra.apache.org/> where index functionality has been extended >>>>>> to provide near real time search such as ElasticSearch or Solr, >>>>>> including full text search >>>>>> <http://en.wikipedia.org/wiki/Full_text_search> capabilities and free >>>>>> multivariable search. It is achieved through an Apache Lucene >>>>>> <http://lucene.apache.org/> based implementation of Cassandra secondary >>>>>> indexes, where each node of the cluster indexes its own data.* >>>>>> >>>>>> We will publish some benchmarks in two weeks, so I will share our results here if you are interested. >>>>>> >>>>>> If you are more interested in distributed file systems, you should take a look at Tachyon: http://tachyon-project.org/index.html >>>>>> >>>>>> *3. Spark - Hive compatibility* >>>>>> >>>>>> Spark will support anything with the Hadoop InputFormat interface. >>>>>> >>>>>> *4. Performance* >>>>>> >>>>>> We are working a lot with Cassandra and MongoDB and the performance is quite nice. We are finishing right now some benchmarks comparing Hadoop >>>>>> + HDFS vs Spark + HDFS vs Spark + Cassandra (using Stratio Deep and even >>>>>> our fork of Cassandra).
>>>>>> Let me share these results with you when they are ready, ok? >>>>>> >>>>>> Regards. >>>>>> >>>>>> 2014-08-22 7:53 GMT+02:00 Niranda Perera <nira...@wso2.com>: >>>>>> >>>>>>> Hi Srinath, >>>>>>> Yes, I am working on deploying it on a multi-node cluster with the DEBS dataset. I will keep architecture@ posted on the progress. >>>>>>> >>>>>>> Hi David, >>>>>>> Thank you very much for the detailed insight you've provided. >>>>>>> A few quick questions: >>>>>>> 1. Do you have experience in using storage handlers in Spark? >>>>>>> 2. Would a storage handler used in Hive be directly compatible with Spark? >>>>>>> 3. How do you grade the performance of Spark with other databases such as Cassandra, HBase, H2, etc.? >>>>>>> >>>>>>> Thank you very much again for your interest. Look forward to hearing from you. >>>>>>> >>>>>>> Regards >>>>>>> >>>>>>> On Thu, Aug 21, 2014 at 7:02 PM, Srinath Perera <srin...@wso2.com> wrote: >>>>>>> >>>>>>>> Niranda, we need to test Spark in multi-node mode before making a decision. Spark is very fast, I think there is no doubt about that. We need to make sure it is stable. >>>>>>>> >>>>>>>> David, thanks for a detailed email! How big (nodes) is the Spark setup you guys are running?
>>>>>>>> >>>>>>>> --Srinath >>>>>>>> >>>>>>>> On Thu, Aug 21, 2014 at 1:34 PM, David Morales <dmora...@stratio.com> wrote: >>>>>>>> >>>>>>>>> Sorry for disturbing this thread, but I think I can help clarify a few things (we attended the last Spark Summit, where we were also speakers, and we work very closely with Spark). >>>>>>>>> >>>>>>>>> *> Hive/Shark and other benchmarks* >>>>>>>>> >>>>>>>>> You can find a nice comparison and benchmark on this site: https://amplab.cs.berkeley.edu/benchmark/ >>>>>>>>> >>>>>>>>> *> Shark and SparkSQL* >>>>>>>>> >>>>>>>>> SparkSQL is the natural replacement for Shark, but SparkSQL is still young at this moment. If you are looking for Hive compatibility, you have to execute SparkSQL with a specific context. >>>>>>>>> >>>>>>>>> Quoted from the Spark website: >>>>>>>>> >>>>>>>>> *> Note that Spark SQL currently uses a very basic SQL parser. Users that want a more complete dialect of SQL should look at the HiveQL support provided by HiveContext.* >>>>>>>>> >>>>>>>>> So, just note that SparkSQL is a work in progress. If you want SparkSQL you have to run a SQLContext; if you want Hive, you will have a different context (HiveContext)... >>>>>>>>> >>>>>>>>> *> Spark - Hadoop: the future* >>>>>>>>> >>>>>>>>> Most Hadoop distributions are including Spark: Cloudera, Hortonworks, MapR... and contributing to migrating the whole Hadoop ecosystem to Spark. >>>>>>>>> >>>>>>>>> Spark is a bit more than Map/Reduce... as you can read here: >>>>>>>>> http://gigaom.com/2014/06/28/4-reasons-why-spark-could-jolt-hadoop-into-hyperdrive/ >>>>>>>>> >>>>>>>>> *> Spark Streaming / Spark SQL* >>>>>>>>> >>>>>>>>> Spark Streaming is built on Spark and provides stream processing through an abstraction called DStreams (a collection of RDDs in a window of time). >>>>>>>>> >>>>>>>>> There are some efforts to make SparkSQL compatible with Spark Streaming (something similar to Trident for Storm), as you can see here: >>>>>>>>> >>>>>>>>> *StreamSQL (https://github.com/thunderain-project/StreamSQL <https://github.com/thunderain-project/StreamSQL>) is a POC project based on Spark to combine the power of Catalyst and Spark Streaming, to offer people the ability to manipulate SQL on top of DStream as you wanted; this keeps the same semantics as SparkSQL and offers a SchemaDStream on top of DStream. You don't need to do tricky things like extracting RDDs to register as a table. Besides that, other parts are the same as Spark.* >>>>>>>>> >>>>>>>>> So, you can apply SQL to a data stream, but it is very simple at the moment... you can expect a bunch of improvements in this area in the next months (I guess that SparkSQL will work on Spark Streaming streams before the end of this year). >>>>>>>>> >>>>>>>>> *> Spark Streaming / Spark SQL and CEP* >>>>>>>>> >>>>>>>>> There is no relationship at this moment between (your absolutely amazing) Siddhi CEP and Spark. As far as I know, you are working on distributed CEP with Storm and Siddhi.
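(Editor's note: the DStream idea described above — a stream chopped into a sequence of small batches, each processed like a static collection — can be illustrated in plain Scala without Spark. This is a conceptual sketch only, not the Spark Streaming API; the batch size and event values are made up for illustration.)

```scala
// A DStream is, conceptually, a sequence of micro-batches ("RDDs"),
// each covering one batch interval and processed with the same
// collection-style transformation. Plain Scala stand-in below.
val incoming = (1 to 10).toList   // pretend event stream
val batchInterval = 3             // events per micro-batch (illustrative)

val microBatches = incoming.grouped(batchInterval).toList
// List(List(1,2,3), List(4,5,6), List(7,8,9), List(10))

// Apply one transformation uniformly to every micro-batch,
// the way a DStream transformation applies to each underlying RDD.
val perBatchSums = microBatches.map(_.sum)
println(perBatchSums) // List(6, 15, 24, 10)
```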
>>>>>>>>> We are currently working on an interactive CEP built with Kafka + Spark Streaming + Siddhi, with features such as an API, an interactive shell, built-in statistics and auditing, and built-in functions (save2cassandra, save2mongo, save2elasticsearch...). >>>>>>>>> >>>>>>>>> If you are interested we can talk about this project; I think it would be a nice idea! >>>>>>>>> >>>>>>>>> Anyway, I don't think that SparkSQL will evolve into something like a CEP. Patterns and sequences, for example, would be very complex to do with Spark Streaming (at least for now). >>>>>>>>> >>>>>>>>> Thanks. >>>>>>>>> >>>>>>>>> 2014-08-21 6:18 GMT+02:00 Sriskandarajah Suhothayan <s...@wso2.com>: >>>>>>>>> >>>>>>>>>> On Wed, Aug 20, 2014 at 1:36 PM, Niranda Perera <nira...@wso2.com> wrote: >>>>>>>>>> >>>>>>>>>>> @Maninda, >>>>>>>>>>> >>>>>>>>>>> +1 for suggesting Spark SQL. >>>>>>>>>>> >>>>>>>>>>> Quoting Databricks: >>>>>>>>>>> "Spark SQL provides state-of-the-art SQL performance and maintains compatibility with Shark/Hive. In particular, like Shark, Spark SQL supports all existing Hive data formats, user-defined functions (UDF), and the Hive metastore." [1] >>>>>>>>>>> >>>>>>>>>>> But I am not entirely sure if Spark SQL and Siddhi are comparable, because SparkSQL (like Hive) is designed for batch processing, whereas Siddhi does real-time processing. But if there are implementations where Siddhi is run on top of Spark, it would be very interesting.
>>>>>>>>>>> >>>>>>>>>> Yes Siddhi's current way of operation does not support this. But >>>>>>>>>> with partitions and we can achieve this to some extent. >>>>>>>>>> >>>>>>>>>> Suho >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Spark supports either Hadoop1 or 2. But I think we should see, >>>>>>>>>>> what is best, MR1 or YARN+MR2 >>>>>>>>>>> >>>>>>>>>>> [image: Hadoop Architecture] >>>>>>>>>>> [2] >>>>>>>>>>> >>>>>>>>>>> [1] >>>>>>>>>>> http://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html >>>>>>>>>>> [2] http://www.tomsitpro.com/articles/hadoop-2-vs-1,2-718.html >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Aug 20, 2014 at 1:13 PM, Lasantha Fernando < >>>>>>>>>>> lasan...@wso2.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Maninda, >>>>>>>>>>>> >>>>>>>>>>>> On 20 August 2014 12:02, Maninda Edirisooriya <mani...@wso2.com >>>>>>>>>>>> > wrote: >>>>>>>>>>>> >>>>>>>>>>>>> In the case of discontinuity of Shark project, IMO we should >>>>>>>>>>>>> not move to Shark at all. >>>>>>>>>>>>> And it seems better to go with Spark SQL as we are already >>>>>>>>>>>>> using Spark for CEP. But I am not sure the difference between >>>>>>>>>>>>> Spark SQL and >>>>>>>>>>>>> the Siddhi queries on the Spark engine. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Currently, we are doing integration with CEP using Apache >>>>>>>>>>>> Storm, not Spark... :-). Spark Streaming is a possible candidate >>>>>>>>>>>> for >>>>>>>>>>>> integrating with CEP, but we have opted with Storm. I think there >>>>>>>>>>>> has been >>>>>>>>>>>> some independent work on integrating Kafka + Spark Streaming + >>>>>>>>>>>> Siddhi. 
>>>>>>>>>>>> Please refer to thread on arch@ "[Architecture] A few >>>>>>>>>>>> questions about WSO2 CEP/Siddhi" >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> And we have to figure out how Spark SQL is used for historical >>>>>>>>>>>>> data, whether it can execute incremental processing by default >>>>>>>>>>>>> which will >>>>>>>>>>>>> implement all out existing BAM use cases. >>>>>>>>>>>>> On the other hand in Hadoop 2 [1] they are using a completely >>>>>>>>>>>>> different platform for resource allocation known as Yarn. >>>>>>>>>>>>> Sometimes this >>>>>>>>>>>>> may be more suitable for batch jobs. >>>>>>>>>>>>> >>>>>>>>>>>>> [1] https://www.youtube.com/watch?v=RncoVN0l6dc >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Lasantha >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> *Maninda Edirisooriya* >>>>>>>>>>>>> Senior Software Engineer >>>>>>>>>>>>> >>>>>>>>>>>>> *WSO2, Inc. *lean.enterprise.middleware. >>>>>>>>>>>>> >>>>>>>>>>>>> *Blog* : http://maninda.blogspot.com/ >>>>>>>>>>>>> *E-mail* : mani...@wso2.com >>>>>>>>>>>>> *Skype* : @manindae >>>>>>>>>>>>> *Twitter* : @maninda >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Aug 20, 2014 at 11:33 AM, Niranda Perera < >>>>>>>>>>>>> nira...@wso2.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Anjana and Srinath, >>>>>>>>>>>>>> >>>>>>>>>>>>>> After the discussion I had with Anjana, I researched more on >>>>>>>>>>>>>> the continuation of Shark project by Databricks. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Here's what I found out, >>>>>>>>>>>>>> - Shark was built on the Hive codebase and achieved >>>>>>>>>>>>>> performance improvements by swapping out the physical execution >>>>>>>>>>>>>> engine part >>>>>>>>>>>>>> of Hive. While this approach enabled Shark users to speed up >>>>>>>>>>>>>> their Hive >>>>>>>>>>>>>> queries, Shark inherited a large, complicated code base from >>>>>>>>>>>>>> Hive that made >>>>>>>>>>>>>> it hard to optimize and maintain. 
>>>>>>>>>>>>>> Hence, Databricks has announced that they are halting the >>>>>>>>>>>>>> development of Shark from July, 2014. (Shark 0.9 would be the >>>>>>>>>>>>>> last release) >>>>>>>>>>>>>> [1] >>>>>>>>>>>>>> - Shark will be replaced by Spark SQL. It beats Shark in TPC-DS >>>>>>>>>>>>>> performance >>>>>>>>>>>>>> <http://databricks.com/blog/2014/06/02/exciting-performance-improvements-on-the-horizon-for-spark-sql.html> >>>>>>>>>>>>>> by almost an order of magnitude. It also supports all existing >>>>>>>>>>>>>> Hive data >>>>>>>>>>>>>> formats, user-defined functions (UDF), and the Hive metastore. >>>>>>>>>>>>>> [2] >>>>>>>>>>>>>> - Following is the Shark, Spark SQL migration plan >>>>>>>>>>>>>> http://spark-summit.org/wp-content/uploads/2014/07/Future-of-Spark-Patrick-Wendell.pdf >>>>>>>>>>>>>> >>>>>>>>>>>>>> - For the legacy Hive and MapReduce users, they have proposed >>>>>>>>>>>>>> a new 'Hive on Spark Project' [3], [4] >>>>>>>>>>>>>> But, given the performance enhancement, it is quite certain >>>>>>>>>>>>>> that Hive and MR would be replaced by engines build on top of >>>>>>>>>>>>>> Spark (ex: >>>>>>>>>>>>>> Spark SQL) >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> In my opinion there are a few matters to figure out if we are >>>>>>>>>>>>>> migrating from Hive, >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. whether we are changing the query engine only? (Then, we >>>>>>>>>>>>>> can replace Hive by Shark) >>>>>>>>>>>>>> 2. whether we are changing the existing Hadoop/ MapReduce >>>>>>>>>>>>>> framework to Spark? (Then we can replace Hive and Hadoop with >>>>>>>>>>>>>> Spark and >>>>>>>>>>>>>> Spark SQL) >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> In my opinion, considering the longterm impact and the >>>>>>>>>>>>>> availability of support, it is best to migrate the Hive/Hadoop >>>>>>>>>>>>>> to Spark. >>>>>>>>>>>>>> It is open for discussion! 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> In the mean time, I've already tried Spark SQL, and >>>>>>>>>>>>>> Databricks claims on improved performance seems to be true. I >>>>>>>>>>>>>> will work >>>>>>>>>>>>>> more on this. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers >>>>>>>>>>>>>> >>>>>>>>>>>>>> [1] >>>>>>>>>>>>>> http://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html >>>>>>>>>>>>>> [2] >>>>>>>>>>>>>> http://databricks.com/blog/2014/06/02/exciting-performance-improvements-on-the-horizon-for-spark-sql.html >>>>>>>>>>>>>> [3] https://issues.apache.org/jira/browse/HIVE-7292 >>>>>>>>>>>>>> [4] >>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 12:16 PM, Anjana Fernando < >>>>>>>>>>>>>> anj...@wso2.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Srinath, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> No, this has not been tested in multiple nodes. I told >>>>>>>>>>>>>>> Niranda here in my last mail, to test a cluster with the same >>>>>>>>>>>>>>> set of >>>>>>>>>>>>>>> hardware we have, that we are using to test our large data set >>>>>>>>>>>>>>> with Hive. >>>>>>>>>>>>>>> As for the effort to make the change, we still have to figure >>>>>>>>>>>>>>> out the MT >>>>>>>>>>>>>>> aspects of Shark here. Sinthuja was working on making the >>>>>>>>>>>>>>> latest Hive >>>>>>>>>>>>>>> version MT ready, and most probably, we can do the same changes >>>>>>>>>>>>>>> to the Hive >>>>>>>>>>>>>>> version Shark is using. So after we do that, the integration >>>>>>>>>>>>>>> should be >>>>>>>>>>>>>>> seamless. And also, as I mentioned earlier here, we are also >>>>>>>>>>>>>>> going to test >>>>>>>>>>>>>>> this with the APIM Hive script, to check if there are any >>>>>>>>>>>>>>> unforeseen >>>>>>>>>>>>>>> incompatibilities. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> Anjana. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 11:53 AM, Srinath Perera < >>>>>>>>>>>>>>> srin...@wso2.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This look great. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We need to test Spark with multiple nodes? Did we do that. >>>>>>>>>>>>>>>> Please create few VMs in performance could (talk to Lakmal) >>>>>>>>>>>>>>>> and test with >>>>>>>>>>>>>>>> at least 5 nodes. We need to make sure it works OK with >>>>>>>>>>>>>>>> distributed setup >>>>>>>>>>>>>>>> as well. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> What does it take to change to spark? Anjana .. how much >>>>>>>>>>>>>>>> work is it? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> --Srinath >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Aug 13, 2014 at 7:06 PM, Niranda Perera < >>>>>>>>>>>>>>>> nira...@wso2.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thank you Anjana. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes, I am working on it. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> In the mean time, I found this in Hive documentation [1]. >>>>>>>>>>>>>>>>> It talks about Hive on Spark, and compares Hive, Shark and >>>>>>>>>>>>>>>>> Spark SQL at an >>>>>>>>>>>>>>>>> higher architectural level. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Additionally, it is said that the in-memory performance of >>>>>>>>>>>>>>>>> Shark can be improved by introducing Tachyon [2]. I guess we >>>>>>>>>>>>>>>>> can consider >>>>>>>>>>>>>>>>> this later on. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Cheers. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark#HiveonSpark-1.3ComparisonwithSharkandSparkSQL >>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>> http://tachyon-project.org/Running-Tachyon-Locally.html >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Aug 13, 2014 at 3:17 PM, Anjana Fernando < >>>>>>>>>>>>>>>>> anj...@wso2.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Niranda, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Excellent analysis of Hive vs Shark! .. This gives a lot >>>>>>>>>>>>>>>>>> of insight into how both operates in different scenarios. As >>>>>>>>>>>>>>>>>> the next step, >>>>>>>>>>>>>>>>>> we will need to run this in an actual cluster of computers. >>>>>>>>>>>>>>>>>> Since you've >>>>>>>>>>>>>>>>>> used a subset of the dataset of 2014 DEBS challenge, we >>>>>>>>>>>>>>>>>> should use the full >>>>>>>>>>>>>>>>>> data set in a clustered environment and check this. Gokul is >>>>>>>>>>>>>>>>>> already >>>>>>>>>>>>>>>>>> working on the Hive based setup for this, after that is >>>>>>>>>>>>>>>>>> done, you can >>>>>>>>>>>>>>>>>> create a Shark cluster in the same hardware and run the >>>>>>>>>>>>>>>>>> tests there, to get >>>>>>>>>>>>>>>>>> a clear comparison on how these two match up in a cluster. >>>>>>>>>>>>>>>>>> Until the setup >>>>>>>>>>>>>>>>>> is ready, do continue with your next steps on checking the >>>>>>>>>>>>>>>>>> RDD support and >>>>>>>>>>>>>>>>>> Spark SQL use. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> After these are done, we should also do a trial run of >>>>>>>>>>>>>>>>>> our own APIM Hive scripts, migrated to Shark. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>> Anjana. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Mon, Aug 11, 2014 at 12:21 PM, Niranda Perera < >>>>>>>>>>>>>>>>>> nira...@wso2.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I have been evaluating the performance of >>>>>>>>>>>>>>>>>>> Shark (distributed SQL query engine for Hadoop) against >>>>>>>>>>>>>>>>>>> Hive. This is with >>>>>>>>>>>>>>>>>>> the objective of seeing the possibility to move the WSO2 >>>>>>>>>>>>>>>>>>> BAM data >>>>>>>>>>>>>>>>>>> processing (which currently uses Hive) to Shark (and Apache >>>>>>>>>>>>>>>>>>> Spark) for >>>>>>>>>>>>>>>>>>> improved performance. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I am sharing my findings herewith. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> *AMP Lab Shark* >>>>>>>>>>>>>>>>>>> Shark can execute Hive QL queries up to 100 times faster >>>>>>>>>>>>>>>>>>> than Hive without any modification to the existing data or >>>>>>>>>>>>>>>>>>> queries. It >>>>>>>>>>>>>>>>>>> supports Hive's QL, metastore, serialization formats, and >>>>>>>>>>>>>>>>>>> user-defined >>>>>>>>>>>>>>>>>>> functions, providing seamless integration with existing >>>>>>>>>>>>>>>>>>> Hive deployments >>>>>>>>>>>>>>>>>>> and a familiar, more powerful option for new ones. [1] >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> *Apache Spark*Apache Spark is an open-source data >>>>>>>>>>>>>>>>>>> analytics cluster computing framework. It fits into the >>>>>>>>>>>>>>>>>>> Hadoop open-source >>>>>>>>>>>>>>>>>>> community, building on top of the HDFS and promises >>>>>>>>>>>>>>>>>>> performance up to 100 >>>>>>>>>>>>>>>>>>> times faster than Hadoop MapReduce for certain >>>>>>>>>>>>>>>>>>> applications. [2] >>>>>>>>>>>>>>>>>>> Official documentation: [3] >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I carried out the comparison between the following Hive >>>>>>>>>>>>>>>>>>> and Shark releases with input files ranging from 100 to 1 >>>>>>>>>>>>>>>>>>> billion entries. 
>>>>>>>>>>>>>>>>>>> The setups were as follows. QL engine: Apache Hive 0.11 for Hive, and Shark 0.9.1 (latest release, which uses Scala 2.10.3, Spark 0.9.1 and AMPLab's Hive 0.9.0) for Shark. Framework: Hadoop 1.0.4 for Hive, Spark 0.9.1 for Shark. File system: HDFS for both. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Attached herewith is a report which describes the performance comparison between Shark and Hive in detail. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> hive_vs_shark <https://docs.google.com/a/wso2.com/folderview?id=0B1GsnfycTl32QTZqUktKck1Ucjg&usp=drive_web> >>>>>>>>>>>>>>>>>>> hive_vs_shark_report.odt <https://docs.google.com/a/wso2.com/file/d/0B1GsnfycTl32X3J5dTh6Slloa0E/edit?usp=drive_web> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> In summary, the following conclusions can be drawn from the evaluation. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> - Shark is on par with Hive in DDL operations (CREATE, DROP .. TABLE, DATABASE). Both engines show fairly constant performance as the input size increases. >>>>>>>>>>>>>>>>>>> - Shark is on par with Hive in plain DML operations (LOAD, INSERT), but when a DML operation is called in conjunction with a data retrieval operation (ex. INSERT <TBL> SELECT <PROP> FROM <TBL>), Shark significantly outperforms Hive, with a performance factor of 10x+ (ranging from 10x to 80x in some instances). Shark's performance factor decreases as the input size increases, while Hive's performance stays fairly constant. >>>>>>>>>>>>>>>>>>> - Shark clearly outperforms Hive in data retrieval operations (FILTER, ORDER BY, JOIN). Hive's performance is fairly constant across the data retrieval operations, while Shark's performance drops as the input size increases. But in every instance Shark outperformed Hive with a minimum performance factor of 5x+ (ranging from 5x to 80x in some instances). >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Please refer to the 'hive_vs_shark_report'; it has all the information about the queries and timings, presented graphically. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The code repository can also be found at >>>>>>>>>>>>>>>>>>> https://github.com/nirandaperera/hiveToShark/tree/master/hiveVsShark >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Moving forward, I am currently working on the following. >>>>>>>>>>>>>>>>>>> - Apache Spark's resilient distributed dataset (RDD) abstraction (a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel): the use of RDDs and their impact on performance.
>>>>>>>>>>>>>>>>>>> - Spark SQL - Use of this Spark SQL over Shark on >>>>>>>>>>>>>>>>>>> Spark framework >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> [1] https://github.com/amplab/shark/wiki >>>>>>>>>>>>>>>>>>> [2] http://en.wikipedia.org/wiki/Apache_Spark >>>>>>>>>>>>>>>>>>> [3] http://spark.apache.org/docs/latest/ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Would love to have your feedback on this. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best regards >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>> *Niranda Perera* >>>>>>>>>>>>>>>>>>> Software Engineer, WSO2 Inc. >>>>>>>>>>>>>>>>>>> Mobile: +94-71-554-8430 >>>>>>>>>>>>>>>>>>> Twitter: @n1r44 <https://twitter.com/N1R44> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> *Anjana Fernando* >>>>>>>>>>>>>>>>>> Senior Technical Lead >>>>>>>>>>>>>>>>>> WSO2 Inc. | http://wso2.com >>>>>>>>>>>>>>>>>> lean . enterprise . middleware >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> *Niranda Perera* >>>>>>>>>>>>>>>>> Software Engineer, WSO2 Inc. >>>>>>>>>>>>>>>>> Mobile: +94-71-554-8430 >>>>>>>>>>>>>>>>> Twitter: @n1r44 <https://twitter.com/N1R44> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> ============================ >>>>>>>>>>>>>>>> Srinath Perera, Ph.D. >>>>>>>>>>>>>>>> http://people.apache.org/~hemapani/ >>>>>>>>>>>>>>>> http://srinathsview.blogspot.com/ >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> *Anjana Fernando* >>>>>>>>>>>>>>> Senior Technical Lead >>>>>>>>>>>>>>> WSO2 Inc. | http://wso2.com >>>>>>>>>>>>>>> lean . enterprise . middleware >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> *Niranda Perera* >>>>>>>>>>>>>> Software Engineer, WSO2 Inc. 
--

David Morales de Frías :: +34 607 010 411 :: @dmoralesdf <https://twitter.com/dmoralesdf>

<http://www.stratio.com/>
Avenida de Europa, 26. Ática 5. 2ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*
_______________________________________________ Architecture mailing list Architecture@wso2.org https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture