Hi David,

Could you point me to an example where SparkSQL is used in Stratio Deep?

Rgds

On Mon, Dec 15, 2014 at 2:20 PM, David Morales <dmora...@stratio.com> wrote:
>
> Hi there,
>
> For sure, the new release does support Spark SQL, so you can use Spark SQL
> and Stratio Deep together just out of the box.
>
> About Crossdata: it is not itself tied to Spark, but it can use Spark and Deep
> underneath. It's an interactive SQL engine, like Hive, for example.
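>
> Roughly, the idea is that Deep exposes Cassandra data as ordinary Spark RDDs,
> so once you have a Deep RDD you can register it with a Spark SQL SQLContext and
> query it like any other table. A minimal sketch follows; note that the
> Deep-specific names (DeepSparkContext, cassandraCfg, cassandraRDD) are
> illustrative assumptions, not the exact Deep API, so please check the repo for
> the real signatures:

```scala
// Sketch only: the Deep-specific names below (DeepSparkContext, cassandraRDD,
// the config object) are assumptions and may not match the actual Deep API.
import org.apache.spark.sql.SQLContext

case class Person(id: Int, name: String)

val deepContext = new DeepSparkContext(sparkConf)           // hypothetical Deep context
val people = deepContext.cassandraRDD[Person](cassandraCfg) // hypothetical extractor call

// From here on it is plain Spark SQL (Spark 1.x API):
val sqlContext = new SQLContext(deepContext)
import sqlContext.createSchemaRDD   // implicit RDD[Person] -> SchemaRDD
people.registerTempTable("people")
sqlContext.sql("SELECT name FROM people WHERE id > 10").collect().foreach(println)
```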
>
>
> Regards.
>
> 2014-12-12 21:29 GMT+01:00 Niranda Perera <nira...@wso2.com>:
>>
>> Hi David,
>>
>> I have been going through the Deep-Spark examples. It looks very
>> promising.
>>
>> On a follow up query, does Deep-spark/ deep-cassandra support SQL like
>> operations on RDDs (like SparkSQL)?
>>
>> Example (from Datastax Cassandra connector demos):
>>
>> object SQLDemo extends DemoApp {
>>
>>   val cc = new CassandraSQLContext(sc)
>>
>>   CassandraConnector(conf).withSessionDo { session =>
>>     session.execute("CREATE KEYSPACE IF NOT EXISTS test WITH REPLICATION
>> = {'class': 'SimpleStrategy', 'replication_factor': 1 }")
>>     session.execute("DROP TABLE IF EXISTS test.sql_demo")
>>     session.execute("CREATE TABLE test.sql_demo (key INT PRIMARY KEY, grp
>> INT, value DOUBLE)")
>>     session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES
>> (1, 1, 1.0)")
>>     session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES
>> (2, 1, 2.5)")
>>     session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES
>> (3, 1, 10.0)")
>>     session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES
>> (4, 2, 4.0)")
>>     session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES
>> (5, 2, 2.2)")
>>     session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES
>> (6, 2, 2.8)")
>>   }
>>
>>   val rdd = cc.cassandraSql("SELECT grp, max(value) AS mv FROM
>> test.sql_demo GROUP BY grp ORDER BY mv")
>>   rdd.collect().foreach(println)  // [2, 4.0] [1, 10.0]
>>
>>   sc.stop()
>> }
>>
>> I also read about Stratio Crossdata. Does Crossdata serve this purpose?
>>
>> Rgds
>>
>> On Tue, Dec 2, 2014 at 11:14 PM, David Morales <dmora...@stratio.com>
>> wrote:
>>>
>>> Hi!
>>>
>>> Please, check the develop branch if you want to see a more realistic
>>> view of our development path. Last commit was about two hours ago :)
>>>
>>> Stratio Deep is one of our core modules, so there is a core team in
>>> Stratio fully devoted to Spark + NoSQL integration. In these last months,
>>> for example, we have added MongoDB, Elasticsearch and Aerospike to Stratio
>>> Deep, so you can talk to these databases from Spark just like you do with
>>> HDFS.
>>>
>>> Furthermore, we are working on more backends, such as Neo4j or
>>> Couchbase, for example.
>>>
>>>
>>> About our benchmarks, you can check out some results in this link:
>>> http://www.stratio.com/deep-vs-datastax/
>>>
>>> Please keep in mind that Spark integration with a datastore can be
>>> done in two ways: HCI or native. We are now working on improving the
>>> native integration because it is considerably more performant. Along these
>>> lines, we are running some further tests with even more impressive results.
>>>
>>>
>>> Here you can find a technical overview of our whole platform.
>>>
>>>
>>> http://www.slideshare.net/Stratio/stratio-platform-overview-v41
>>>
>>> Regards
>>>
>>> 2014-12-02 11:14 GMT+01:00 Niranda Perera <nira...@wso2.com>:
>>>
>>>> Hi David,
>>>>
>>>> Sorry to revive this thread, but may I know if you have done any
>>>> benchmarking of the Datastax Spark Cassandra connector against Stratio
>>>> Deep-Spark's Cassandra integration? I would love to take a look at it.
>>>>
>>>> I recently checked the deep-spark GitHub repo and noticed that there has
>>>> been no activity since Oct 29th. May I know what your future plans are for
>>>> this particular project?
>>>>
>>>> Cheers
>>>>
>>>> On Tue, Aug 26, 2014 at 9:12 PM, David Morales <dmora...@stratio.com>
>>>> wrote:
>>>>
>>>>> Yes, it is already included in our benchmarks.
>>>>>
>>>>> It could be a nice idea to share our findings, let me talk about it
>>>>> here. Meanwhile, you can ask us any question by using my mail or this
>>>>> thread, we are glad to help you.
>>>>>
>>>>>
>>>>> Best regards.
>>>>>
>>>>>
>>>>> 2014-08-24 15:49 GMT+02:00 Niranda Perera <nira...@wso2.com>:
>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> Thank you for your detailed reply.
>>>>>>
>>>>>> It was great to hear about Stratio Deep and I must say, it looks very
>>>>>> interesting. Storage handlers for databases such as Cassandra, MongoDB,
>>>>>> etc. would be very helpful. We will definitely look into Stratio Deep.
>>>>>>
>>>>>> I came across the Datastax Spark-Cassandra connector (
>>>>>> https://github.com/datastax/spark-cassandra-connector ). Have you
>>>>>> done any comparison between your implementation and Datastax's connector?
>>>>>>
>>>>>> And, yes, please do share the performance results with us once it's
>>>>>> ready.
>>>>>>
>>>>>> On a different note, is there any way for us to interact with the Stratio
>>>>>> dev community, in the form of dev mailing lists etc., so that we could
>>>>>> mutually share our findings?
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 22, 2014 at 2:07 PM, David Morales <dmora...@stratio.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi there,
>>>>>>>
>>>>>>> *1. About the size of deployments.*
>>>>>>>
>>>>>>> It depends on your use case... especially when you combine Spark with
>>>>>>> a datastore. We usually deploy Spark with Cassandra or MongoDB, instead
>>>>>>> of using HDFS, for example.
>>>>>>>
>>>>>>> Spark will be faster if you put the data in memory, so if you need a
>>>>>>> lot of speed (interactive queries, for example), you should have enough
>>>>>>> memory.
>>>>>>>
>>>>>>>
>>>>>>> *2. About storage handlers.*
>>>>>>>
>>>>>>> We have developed the first tight integration between Cassandra and
>>>>>>> Spark, called Stratio Deep, announced at the first Spark Summit. You can
>>>>>>> check Stratio Deep out here: https://github.com/Stratio/stratio-deep
>>>>>>> (open, Apache 2 license).
>>>>>>>
>>>>>>> *Deep is a thin integration layer between Apache Spark and several
>>>>>>> NoSQL datastores. We currently support Apache Cassandra and MongoDB, but
>>>>>>> in the near future we will add support for several other datastores.*
>>>>>>>
>>>>>>> Datastax announced its own driver for Spark at the last Spark
>>>>>>> Summit, but we have been working on our solution for almost a year.
>>>>>>>
>>>>>>> Furthermore, we are working to extend this solution to work with
>>>>>>> other databases as well... MongoDB integration is complete right
>>>>>>> now and Elasticsearch will be ready in a few weeks.
>>>>>>>
>>>>>>> And that is not all: we have also developed an integration between
>>>>>>> Cassandra and Lucene for indexing data (open source, Apache 2).
>>>>>>>
>>>>>>> *Stratio Cassandra is a fork of Apache Cassandra
>>>>>>> <http://cassandra.apache.org/> where the index functionality has been
>>>>>>> extended to provide near-real-time search, such as that of Elasticsearch
>>>>>>> or Solr, including full text search
>>>>>>> <http://en.wikipedia.org/wiki/Full_text_search> capabilities and free
>>>>>>> multivariable search. This is achieved through an Apache Lucene
>>>>>>> <http://lucene.apache.org/>-based implementation of Cassandra secondary
>>>>>>> indexes, where each node of the cluster indexes its own data.*
>>>>>>>
>>>>>>>
>>>>>>> We will publish some benchmarks in two weeks, so I will share our
>>>>>>> results here if you are interested.
>>>>>>>
>>>>>>>
>>>>>>> If you are more interested in distributed file systems, you should
>>>>>>> take a look at Tachyon: http://tachyon-project.org/index.html
>>>>>>>
>>>>>>>
>>>>>>> *3. Spark - Hive compatibility*
>>>>>>>
>>>>>>> Spark will support anything with the Hadoop InputFormat interface.
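>>>>>>>
>>>>>>> For instance, any classic InputFormat can be read directly from the
>>>>>>> SparkContext. A minimal sketch (assumes a running Spark deployment with
>>>>>>> an existing SparkContext `sc`; the HDFS path is a placeholder):

```scala
// Reading an arbitrary Hadoop InputFormat from Spark, using the standard
// TextInputFormat as the example. Requires a Spark deployment; `sc` is an
// existing SparkContext and the path is a placeholder.
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

val records = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](
  "hdfs:///data/events.log")

// Each record is a (byte offset, line) pair, exactly as the InputFormat emits it.
records.map { case (_, line) => line.toString }
       .take(5)
       .foreach(println)
```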
>>>>>>>
>>>>>>>
>>>>>>> *4. Performance*
>>>>>>>
>>>>>>> We are working a lot with Cassandra and MongoDB and the performance
>>>>>>> is quite nice. We are right now finishing some benchmarks comparing
>>>>>>> Hadoop + HDFS vs Spark + HDFS vs Spark + Cassandra (using Stratio Deep
>>>>>>> and even our fork of Cassandra).
>>>>>>>
>>>>>>> Let me share these results with you when they are ready, OK?
>>>>>>>
>>>>>>> Regards.
>>>>>>>
>>>>>>> 2014-08-22 7:53 GMT+02:00 Niranda Perera <nira...@wso2.com>:
>>>>>>>
>>>>>>>> Hi Srinath,
>>>>>>>> Yes, I am working on deploying it on a multi-node cluster with the
>>>>>>>> debs dataset. I will keep architecture@ posted on the progress.
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi David,
>>>>>>>> Thank you very much for the detailed insight you've provided.
>>>>>>>> Few quick questions,
>>>>>>>> 1. Do you have experience in using storage handlers in Spark?
>>>>>>>> 2. Would a storage handler used in Hive be directly compatible
>>>>>>>> with Spark?
>>>>>>>> 3. How do you rate the performance of Spark with other databases
>>>>>>>> such as Cassandra, HBase, H2, etc.?
>>>>>>>>
>>>>>>>> Thank you very much again for your interest. Look forward to
>>>>>>>> hearing from you.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 21, 2014 at 7:02 PM, Srinath Perera <srin...@wso2.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Niranda, we need to test Spark in multi-node mode before making a
>>>>>>>>> decision. Spark is very fast, I think there is no doubt about that. We need
>>>>>>>>> to make sure it is stable.
>>>>>>>>>
>>>>>>>>> David, thanks for a detailed email! How big (nodes) is the Spark
>>>>>>>>> setup you guys are running?
>>>>>>>>>
>>>>>>>>> --Srinath
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 21, 2014 at 1:34 PM, David Morales <
>>>>>>>>> dmora...@stratio.com> wrote:
>>>>>>>>>
>>>>>>>>>> Sorry for intruding on this thread, but I think I can help
>>>>>>>>>> clarify a few things (we attended the last Spark Summit, where we were
>>>>>>>>>> also speakers, and we work very closely with Spark).
>>>>>>>>>>
>>>>>>>>>> *> Hive/Shark and others benchmark*
>>>>>>>>>>
>>>>>>>>>> You can find a nice comparison and benchmark in this web:
>>>>>>>>>> https://amplab.cs.berkeley.edu/benchmark/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *> Shark and SparkSQL*
>>>>>>>>>>
>>>>>>>>>> SparkSQL is the natural replacement for Shark, but SparkSQL is
>>>>>>>>>> still young at this moment. If you are looking for Hive compatibility, you
>>>>>>>>>> have to execute SparkSQL with a specific context.
>>>>>>>>>>
>>>>>>>>>> Quoted from spark website:
>>>>>>>>>>
>>>>>>>>>> *> Note that Spark SQL currently uses a very basic SQL
>>>>>>>>>> parser. Users that want a more complete dialect of SQL should look at the
>>>>>>>>>> HiveQL support provided by HiveContext.*
>>>>>>>>>>
>>>>>>>>>> So, just note that SparkSQL is a work in progress. If you want
>>>>>>>>>> SparkSQL, you run a SQLContext; if you want Hive compatibility, you
>>>>>>>>>> use a different context...
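>>>>>>>>>>
>>>>>>>>>> In code, the two contexts look like this (a minimal sketch against the
>>>>>>>>>> Spark 1.x API; it assumes an existing SparkContext `sc`, already-registered
>>>>>>>>>> tables, and, for HiveContext, Hive on the classpath):

```scala
// Basic Spark SQL: the simple built-in parser.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext.sql("SELECT * FROM people WHERE age > 21")

// Hive compatibility: HiveContext understands HiveQL and the Hive metastore.
// (In early 1.0 releases this method was hiveContext.hql(...).)
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.sql("SELECT key, value FROM src SORT BY key")
```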
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *> Spark - Hadoop: the future*
>>>>>>>>>>
>>>>>>>>>> Most Hadoop distributions are including Spark: cloudera,
>>>>>>>>>> hortonworks, mapR... and contributing to migrate all the Hadoop 
>>>>>>>>>> ecosystem
>>>>>>>>>> to Spark.
>>>>>>>>>>
>>>>>>>>>> Spark is a bit more than Map/Reduce... as you can read here:
>>>>>>>>>> http://gigaom.com/2014/06/28/4-reasons-why-spark-could-jolt-hadoop-into-hyperdrive/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *> Spark Streaming / Spark SQL*
>>>>>>>>>>
>>>>>>>>>> Spark Streaming is built on Spark and it provides streaming
>>>>>>>>>> processing through an information abstraction called DStreams (a 
>>>>>>>>>> collection
>>>>>>>>>> of RDDs in a window of time).
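>>>>>>>>>>
>>>>>>>>>> As a tiny illustration of the DStream model (a sketch against the
>>>>>>>>>> Spark Streaming 1.x API; assumes an existing SparkContext `sc` and a
>>>>>>>>>> text source on localhost:9999):

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits (pre-1.3)

// A DStream is a sequence of RDDs, one per batch interval (here, 10 seconds).
val ssc = new StreamingContext(sc, Seconds(10))
val lines = ssc.socketTextStream("localhost", 9999)

// Each transformation is applied to every RDD in the stream.
val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
counts.print()

ssc.start()
ssc.awaitTermination()
```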
>>>>>>>>>>
>>>>>>>>>> There are some efforts to make SparkSQL compatible with
>>>>>>>>>> Spark Streaming (something similar to Trident for Storm), as you can see
>>>>>>>>>> here:
>>>>>>>>>>
>>>>>>>>>> *StreamSQL (https://github.com/thunderain-project/StreamSQL
>>>>>>>>>> <https://github.com/thunderain-project/StreamSQL>) is a POC project 
>>>>>>>>>> based
>>>>>>>>>> on Spark to combine the power of Catalyst and Spark Streaming, to 
>>>>>>>>>> offer
>>>>>>>>>> people the ability to manipulate SQL on top of DStream as you 
>>>>>>>>>> wanted, this
>>>>>>>>>> keep the same semantics with SparkSQL as offer a SchemaDStream on 
>>>>>>>>>> top of
>>>>>>>>>> DStream. You don't need to do tricky thing like extracting rdd to 
>>>>>>>>>> register
>>>>>>>>>> as a table. Besides other parts are the same as Spark.*
>>>>>>>>>>
>>>>>>>>>> So, you can apply SQL to a data stream, but it is very simple
>>>>>>>>>> at the moment... you can expect a bunch of improvements in this matter in
>>>>>>>>>> the coming months (I guess that SparkSQL will work on Spark Streaming
>>>>>>>>>> streams before the end of this year).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *> Spark Streaming / Spark SQL and CEP*
>>>>>>>>>>
>>>>>>>>>> There is no relationship at this moment between (your absolutely
>>>>>>>>>> amazing) Siddhi CEP and Spark. As far as I know, you are working on
>>>>>>>>>> distributed CEP with Storm and Siddhi.
>>>>>>>>>>
>>>>>>>>>> We are currently working on an interactive CEP built with
>>>>>>>>>> Kafka + Spark Streaming + Siddhi, with some features such as an API, an
>>>>>>>>>> interactive shell, built-in statistics and auditing, and built-in functions
>>>>>>>>>> (save2cassandra, save2mongo, save2elasticsearch...).
>>>>>>>>>>
>>>>>>>>>> If you are interested, we can talk about this project; I think
>>>>>>>>>> it would be a nice idea!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Anyway, I don't think that SparkSQL will evolve into something like
>>>>>>>>>> a CEP. Patterns and sequences, for example, would be very complex to do
>>>>>>>>>> with Spark Streaming (at least for now).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> 2014-08-21 6:18 GMT+02:00 Sriskandarajah Suhothayan <
>>>>>>>>>> s...@wso2.com>:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Aug 20, 2014 at 1:36 PM, Niranda Perera <
>>>>>>>>>>> nira...@wso2.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> @Maninda,
>>>>>>>>>>>>
>>>>>>>>>>>> +1 for suggesting Spark SQL.
>>>>>>>>>>>>
>>>>>>>>>>>> Quote Databricks,
>>>>>>>>>>>> "Spark SQL provides state-of-the-art SQL performance and
>>>>>>>>>>>> maintains compatibility with Shark/Hive. In particular, like 
>>>>>>>>>>>> Shark, Spark
>>>>>>>>>>>> SQL supports all existing Hive data formats, user-defined 
>>>>>>>>>>>> functions (UDF),
>>>>>>>>>>>> and the Hive metastore." [1]
>>>>>>>>>>>>
>>>>>>>>>>>> But I am not entirely sure if Spark SQL and Siddhi are
>>>>>>>>>>>> comparable, because SparkSQL (like Hive) is designed for batch processing,
>>>>>>>>>>>> whereas Siddhi does real-time processing. But if there are implementations
>>>>>>>>>>>> where Siddhi is run on top of Spark, it would be very interesting.
>>>>>>>>>>>>
>>>>>>>>>>> Yes, Siddhi's current way of operation does not support this. But
>>>>>>>>>>> with partitions we can achieve this to some extent.
>>>>>>>>>>>
>>>>>>>>>>> Suho
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Spark supports either Hadoop 1 or 2. But I think we should evaluate
>>>>>>>>>>>> which is best: MR1 or YARN+MR2.
>>>>>>>>>>>>
>>>>>>>>>>>> [image: Hadoop Architecture]
>>>>>>>>>>>> [2]
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>> http://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html
>>>>>>>>>>>> [2] http://www.tomsitpro.com/articles/hadoop-2-vs-1,2-718.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Aug 20, 2014 at 1:13 PM, Lasantha Fernando <
>>>>>>>>>>>> lasan...@wso2.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Maninda,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 20 August 2014 12:02, Maninda Edirisooriya <
>>>>>>>>>>>>> mani...@wso2.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Given the discontinuation of the Shark project, IMO we should
>>>>>>>>>>>>>> not move to Shark at all.
>>>>>>>>>>>>>> And it seems better to go with Spark SQL, as we are already
>>>>>>>>>>>>>> using Spark for CEP. But I am not sure of the difference between Spark
>>>>>>>>>>>>>> SQL and the Siddhi queries on the Spark engine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently, we are doing integration with CEP using Apache
>>>>>>>>>>>>> Storm, not Spark... :-). Spark Streaming is a possible candidate 
>>>>>>>>>>>>> for
>>>>>>>>>>>>> integrating with CEP, but we have opted with Storm. I think there 
>>>>>>>>>>>>> has been
>>>>>>>>>>>>> some independent work on integrating Kafka + Spark Streaming + 
>>>>>>>>>>>>> Siddhi.
>>>>>>>>>>>>> Please refer to thread on arch@ "[Architecture] A few
>>>>>>>>>>>>> questions about WSO2 CEP/Siddhi"
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> And we have to figure out how Spark SQL handles historical
>>>>>>>>>>>>>> data, and whether it can execute incremental processing by default,
>>>>>>>>>>>>>> which would cover all our existing BAM use cases.
>>>>>>>>>>>>>> On the other hand, Hadoop 2 [1] uses a completely
>>>>>>>>>>>>>> different platform for resource allocation, known as YARN. Sometimes
>>>>>>>>>>>>>> this may be more suitable for batch jobs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://www.youtube.com/watch?v=RncoVN0l6dc
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Lasantha
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Maninda Edirisooriya*
>>>>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *WSO2, Inc. *lean.enterprise.middleware.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>>>>>>>>>> *E-mail* : mani...@wso2.com
>>>>>>>>>>>>>> *Skype* : @manindae
>>>>>>>>>>>>>> *Twitter* : @maninda
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Aug 20, 2014 at 11:33 AM, Niranda Perera <
>>>>>>>>>>>>>> nira...@wso2.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Anjana and Srinath,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> After the discussion I had with Anjana, I researched more on
>>>>>>>>>>>>>>> the continuation of Shark project by Databricks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here's what I found out,
>>>>>>>>>>>>>>> - Shark was built on the Hive codebase and achieved
>>>>>>>>>>>>>>> performance improvements by swapping out the physical execution 
>>>>>>>>>>>>>>> engine part
>>>>>>>>>>>>>>> of Hive. While this approach enabled Shark users to speed up 
>>>>>>>>>>>>>>> their Hive
>>>>>>>>>>>>>>> queries, Shark inherited a large, complicated code base from 
>>>>>>>>>>>>>>> Hive that made
>>>>>>>>>>>>>>> it hard to optimize and maintain.
>>>>>>>>>>>>>>> Hence, Databricks has announced that they are halting the
>>>>>>>>>>>>>>> development of Shark as of July 2014 (Shark 0.9 will be the last
>>>>>>>>>>>>>>> release). [1]
>>>>>>>>>>>>>>> - Shark will be replaced by Spark SQL. It beats Shark in TPC-DS
>>>>>>>>>>>>>>> performance
>>>>>>>>>>>>>>> <http://databricks.com/blog/2014/06/02/exciting-performance-improvements-on-the-horizon-for-spark-sql.html>
>>>>>>>>>>>>>>> by almost an order of magnitude. It also supports all existing 
>>>>>>>>>>>>>>> Hive data
>>>>>>>>>>>>>>> formats, user-defined functions (UDF), and the Hive metastore.  
>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>> - Following is the Shark, Spark SQL migration plan
>>>>>>>>>>>>>>> http://spark-summit.org/wp-content/uploads/2014/07/Future-of-Spark-Patrick-Wendell.pdf
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - For the legacy Hive and MapReduce users, they have
>>>>>>>>>>>>>>> proposed a new 'Hive on Spark Project' [3], [4]
>>>>>>>>>>>>>>> But, given the performance enhancement, it is quite certain
>>>>>>>>>>>>>>> that Hive and MR will be replaced by engines built on top of Spark (e.g.
>>>>>>>>>>>>>>> Spark SQL)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In my opinion there are a few matters to figure out if we
>>>>>>>>>>>>>>> are migrating from Hive,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. whether we are changing the query engine only? (Then, we
>>>>>>>>>>>>>>> can replace Hive with Shark)
>>>>>>>>>>>>>>> 2. whether we are changing the existing Hadoop/ MapReduce
>>>>>>>>>>>>>>> framework to Spark? (Then we can replace Hive and Hadoop with 
>>>>>>>>>>>>>>> Spark and
>>>>>>>>>>>>>>> Spark SQL)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In my opinion, considering the long-term impact and the
>>>>>>>>>>>>>>> availability of support, it is best to migrate from Hive/Hadoop to Spark.
>>>>>>>>>>>>>>> It is open for discussion!
>>>>>>>>>>>>>>> It is open for discussion!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the meantime, I have already tried Spark SQL, and
>>>>>>>>>>>>>>> Databricks' claims of improved performance seem to be true. I will work
>>>>>>>>>>>>>>> more on this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> http://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html
>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>> http://databricks.com/blog/2014/06/02/exciting-performance-improvements-on-the-horizon-for-spark-sql.html
>>>>>>>>>>>>>>> [3] https://issues.apache.org/jira/browse/HIVE-7292
>>>>>>>>>>>>>>> [4]
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 12:16 PM, Anjana Fernando <
>>>>>>>>>>>>>>> anj...@wso2.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Srinath,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No, this has not been tested on multiple nodes. I told
>>>>>>>>>>>>>>>> Niranda in my last mail to test a cluster with the same hardware
>>>>>>>>>>>>>>>> that we are using to test our large data set with Hive.
>>>>>>>>>>>>>>>> As for the effort to make the change, we still have to figure 
>>>>>>>>>>>>>>>> out the MT
>>>>>>>>>>>>>>>> aspects of Shark here. Sinthuja was working on making the 
>>>>>>>>>>>>>>>> latest Hive
>>>>>>>>>>>>>>>> version MT ready, and most probably, we can do the same 
>>>>>>>>>>>>>>>> changes to the Hive
>>>>>>>>>>>>>>>> version Shark is using. So after we do that, the integration 
>>>>>>>>>>>>>>>> should be
>>>>>>>>>>>>>>>> seamless. And also, as I mentioned earlier here, we are also 
>>>>>>>>>>>>>>>> going to test
>>>>>>>>>>>>>>>> this with the APIM Hive script, to check if there are any 
>>>>>>>>>>>>>>>> unforeseen
>>>>>>>>>>>>>>>> incompatibilities.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Anjana.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 11:53 AM, Srinath Perera <
>>>>>>>>>>>>>>>> srin...@wso2.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This looks great.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We need to test Spark with multiple nodes. Did we do that?
>>>>>>>>>>>>>>>>> Please create a few VMs in the performance cloud (talk to Lakmal) and
>>>>>>>>>>>>>>>>> test with at least 5 nodes. We need to make sure it works OK in a
>>>>>>>>>>>>>>>>> distributed setup as well.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What does it take to change to Spark? Anjana... how much
>>>>>>>>>>>>>>>>> work is it?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --Srinath
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Aug 13, 2014 at 7:06 PM, Niranda Perera <
>>>>>>>>>>>>>>>>> nira...@wso2.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thank you Anjana.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, I am working on it.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In the mean time, I found this in Hive documentation [1].
>>>>>>>>>>>>>>>>>> It talks about Hive on Spark, and compares Hive, Shark and 
>>>>>>>>>>>>>>>>>> Spark SQL at an
>>>>>>>>>>>>>>>>>> higher architectural level.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Additionally, it is said that the in-memory performance
>>>>>>>>>>>>>>>>>> of Shark can be improved by introducing Tachyon [2]. I guess 
>>>>>>>>>>>>>>>>>> we can
>>>>>>>>>>>>>>>>>> consider this later on.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Cheers.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark#HiveonSpark-1.3ComparisonwithSharkandSparkSQL
>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>> http://tachyon-project.org/Running-Tachyon-Locally.html
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Aug 13, 2014 at 3:17 PM, Anjana Fernando <
>>>>>>>>>>>>>>>>>> anj...@wso2.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  Hi Niranda,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Excellent analysis of Hive vs Shark! This gives a lot
>>>>>>>>>>>>>>>>>>> of insight into how both operate in different scenarios. As the next step,
>>>>>>>>>>>>>>>>>>> As the next step,
>>>>>>>>>>>>>>>>>>> we will need to run this in an actual cluster of computers. 
>>>>>>>>>>>>>>>>>>> Since you've
>>>>>>>>>>>>>>>>>>> used a subset of the dataset of 2014 DEBS challenge, we 
>>>>>>>>>>>>>>>>>>> should use the full
>>>>>>>>>>>>>>>>>>> data set in a clustered environment and check this. Gokul 
>>>>>>>>>>>>>>>>>>> is already
>>>>>>>>>>>>>>>>>>> working on the Hive based setup for this, after that is 
>>>>>>>>>>>>>>>>>>> done, you can
>>>>>>>>>>>>>>>>>>> create a Shark cluster in the same hardware and run the 
>>>>>>>>>>>>>>>>>>> tests there, to get
>>>>>>>>>>>>>>>>>>> a clear comparison on how these two match up in a cluster. 
>>>>>>>>>>>>>>>>>>> Until the setup
>>>>>>>>>>>>>>>>>>> is ready, do continue with your next steps on checking the 
>>>>>>>>>>>>>>>>>>> RDD support and
>>>>>>>>>>>>>>>>>>> Spark SQL use.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> After these are done, we should also do a trial run of
>>>>>>>>>>>>>>>>>>> our own APIM Hive scripts, migrated to Shark.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>> Anjana.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  On Mon, Aug 11, 2014 at 12:21 PM, Niranda Perera <
>>>>>>>>>>>>>>>>>>> nira...@wso2.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I have been evaluating the performance of
>>>>>>>>>>>>>>>>>>>> Shark (a distributed SQL query engine for Hadoop) against Hive. This is
>>>>>>>>>>>>>>>>>>>> with the objective of assessing the possibility of moving the WSO2 BAM
>>>>>>>>>>>>>>>>>>>> data processing (which currently uses Hive) to Shark (and Apache Spark)
>>>>>>>>>>>>>>>>>>>> for improved performance.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I am sharing my findings herewith.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>  *AMP Lab Shark*
>>>>>>>>>>>>>>>>>>>> Shark can execute Hive QL queries up to 100 times
>>>>>>>>>>>>>>>>>>>> faster than Hive without any modification to the existing 
>>>>>>>>>>>>>>>>>>>> data or queries.
>>>>>>>>>>>>>>>>>>>> It supports Hive's QL, metastore, serialization formats, 
>>>>>>>>>>>>>>>>>>>> and user-defined
>>>>>>>>>>>>>>>>>>>> functions, providing seamless integration with existing 
>>>>>>>>>>>>>>>>>>>> Hive deployments
>>>>>>>>>>>>>>>>>>>> and a familiar, more powerful option for new ones. [1]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> *Apache Spark*
>>>>>>>>>>>>>>>>>>>> Apache Spark is an open-source data
>>>>>>>>>>>>>>>>>>>> analytics cluster computing framework. It fits into the Hadoop
>>>>>>>>>>>>>>>>>>>> open-source community, building on top of HDFS, and promises performance
>>>>>>>>>>>>>>>>>>>> up to 100 times faster than Hadoop MapReduce for certain applications. [2]
>>>>>>>>>>>>>>>>>>>> Official documentation: [3]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I carried out the comparison between the following Hive
>>>>>>>>>>>>>>>>>>>> and Shark releases with input files ranging from 100 to 1 
>>>>>>>>>>>>>>>>>>>> billion entries.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> QL engine:   Apache Hive 0.11 (Hive setup) | Shark 0.9.1, the latest
>>>>>>>>>>>>>>>>>>>>              release (which uses Scala 2.10.3, Spark 0.9.1 and AMPLab's
>>>>>>>>>>>>>>>>>>>>              Hive 0.9.0)
>>>>>>>>>>>>>>>>>>>> Framework:   Hadoop 1.0.4 (Hive setup) | Spark 0.9.1 (Shark setup)
>>>>>>>>>>>>>>>>>>>> File system: HDFS (Hive setup) | HDFS (Shark setup)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Attached herewith is a report which describes in detail
>>>>>>>>>>>>>>>>>>>> the performance comparison between Shark and Hive.
>>>>>>>>>>>>>>>>>>>> hive_vs_shark
>>>>>>>>>>>>>>>>>>>> <https://docs.google.com/a/wso2.com/folderview?id=0B1GsnfycTl32QTZqUktKck1Ucjg&usp=drive_web>
>>>>>>>>>>>>>>>>>>>> hive_vs_shark_report.odt
>>>>>>>>>>>>>>>>>>>> <https://docs.google.com/a/wso2.com/file/d/0B1GsnfycTl32X3J5dTh6Slloa0E/edit?usp=drive_web>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> In summary,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> From the evaluation, the following conclusions can be
>>>>>>>>>>>>>>>>>>>> derived.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>    - Shark performs on par with Hive in DDL operations
>>>>>>>>>>>>>>>>>>>>    (CREATE, DROP .. TABLE, DATABASE). Both engines show a fairly constant
>>>>>>>>>>>>>>>>>>>>    performance as the input size increases.
>>>>>>>>>>>>>>>>>>>>    - Shark also performs on par with Hive in plain DML operations
>>>>>>>>>>>>>>>>>>>>    (LOAD, INSERT), but when a DML operation is called in conjunction with
>>>>>>>>>>>>>>>>>>>>    a data retrieval operation (ex. INSERT <TBL> SELECT <PROP> FROM <TBL>),
>>>>>>>>>>>>>>>>>>>>    Shark significantly outperforms Hive, with a performance factor of
>>>>>>>>>>>>>>>>>>>>    10x+ (ranging from 10x to 80x in some instances). Shark's performance
>>>>>>>>>>>>>>>>>>>>    factor shrinks as the input size increases, while Hive's performance
>>>>>>>>>>>>>>>>>>>>    stays fairly constant.
>>>>>>>>>>>>>>>>>>>>    - Shark clearly outperforms Hive in data retrieval operations
>>>>>>>>>>>>>>>>>>>>    (FILTER, ORDER BY, JOIN). Hive's performance is fairly constant across
>>>>>>>>>>>>>>>>>>>>    the data retrieval operations, while Shark's performance factor shrinks
>>>>>>>>>>>>>>>>>>>>    as the input size increases. But in every instance Shark outperformed
>>>>>>>>>>>>>>>>>>>>    Hive, with a minimum performance factor of 5x+ (ranging from 5x to 80x
>>>>>>>>>>>>>>>>>>>>    in some instances).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Please refer to the 'hive_vs_shark_report'; it has all the
>>>>>>>>>>>>>>>>>>>> information about the queries and timings, presented graphically.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The code repository can also be found in
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> https://github.com/nirandaperera/hiveToShark/tree/master/hiveVsShark
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Moving forward, I am currently working on the
>>>>>>>>>>>>>>>>>>>> following.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>    - Apache Spark's resilient distributed dataset (RDD)
>>>>>>>>>>>>>>>>>>>>    abstraction (a collection of elements partitioned across
>>>>>>>>>>>>>>>>>>>>    the nodes of the cluster that can be operated on in
>>>>>>>>>>>>>>>>>>>>    parallel): the use of RDDs and their impact on
>>>>>>>>>>>>>>>>>>>>    performance.
>>>>>>>>>>>>>>>>>>>>    - Spark SQL: the use of Spark SQL in place of Shark on
>>>>>>>>>>>>>>>>>>>>    the Spark framework.
>>>>>>>>>>>>>>>>>>>>
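>>>>>>>>>>>>>>>>>>>> To make the RDD point above concrete, here is a rough local
>>>>>>>>>>>>>>>>>>>> sketch (not Spark's actual API; plain Scala collections stand
>>>>>>>>>>>>>>>>>>>> in for a cluster, and names such as `RddSketch` are
>>>>>>>>>>>>>>>>>>>> illustrative): the data is split into partitions, each
>>>>>>>>>>>>>>>>>>>> partition is transformed independently, and an action folds
>>>>>>>>>>>>>>>>>>>> the partial results together.

```scala
// Local analogy of the RDD model: partitioned data, independent
// per-partition transformations, and an action that combines results.
// This is plain Scala, not Spark; "RddSketch" is an illustrative name.
object RddSketch {
  // Split a dataset into roughly equal "partitions", as Spark would
  // distribute an RDD's elements across the nodes of a cluster.
  def partition[A](data: Seq[A], numPartitions: Int): Seq[Seq[A]] = {
    val chunk = math.max(1, math.ceil(data.size.toDouble / numPartitions).toInt)
    data.grouped(chunk).toSeq
  }

  // A map "transformation" applied per partition, followed by a reduce
  // "action": each partition is processed independently (hence
  // parallelizable), then the partial results are combined.
  def mapReduce(data: Seq[Int], numPartitions: Int)(f: Int => Int)(op: (Int, Int) => Int): Int =
    partition(data, numPartitions)
      .map(p => p.map(f).reduce(op)) // independent per-partition work
      .reduce(op)                    // combine partial results

  def main(args: Array[String]): Unit = {
    // Sum of squares of 1..10, computed over 3 partitions.
    println(RddSketch.mapReduce(1 to 10, 3)(x => x * x)(_ + _)) // 385
  }
}
```

>>>>>>>>>>>>>>>>>>>> In real Spark the partitions live on different nodes and the
>>>>>>>>>>>>>>>>>>>> transformations are evaluated lazily, but the shape of the
>>>>>>>>>>>>>>>>>>>> computation is the same.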
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> [1] https://github.com/amplab/shark/wiki
>>>>>>>>>>>>>>>>>>>> [2] http://en.wikipedia.org/wiki/Apache_Spark
>>>>>>>>>>>>>>>>>>>> [3] http://spark.apache.org/docs/latest/
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Would love to have your feedback on this.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>  *Niranda Perera*
>>>>>>>>>>>>>>>>>>>> Software Engineer, WSO2 Inc.
>>>>>>>>>>>>>>>>>>>> Mobile: +94-71-554-8430
>>>>>>>>>>>>>>>>>>>> Twitter: @n1r44 <https://twitter.com/N1R44>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> *Anjana Fernando*
>>>>>>>>>>>>>>>>>>> Senior Technical Lead
>>>>>>>>>>>>>>>>>>> WSO2 Inc. | http://wso2.com
>>>>>>>>>>>>>>>>>>> lean . enterprise . middleware
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> ============================
>>>>>>>>>>>>>>>>> Srinath Perera, Ph.D.
>>>>>>>>>>>>>>>>>    http://people.apache.org/~hemapani/
>>>>>>>>>>>>>>>>>    http://srinathsview.blogspot.com/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Architecture mailing list
>>>>>>>>>>>>>> Architecture@wso2.org
>>>>>>>>>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> *Lasantha Fernando*
>>>>>>>>>>>>> Software Engineer - Data Technologies Team
>>>>>>>>>>>>> WSO2 Inc. http://wso2.com
>>>>>>>>>>>>>
>>>>>>>>>>>>> email: lasan...@wso2.com
>>>>>>>>>>>>> mobile: (+94) 71 5247551
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> *S. Suhothayan*
>>>>>>>>>>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>>>>>>>>>>> *WSO2 Inc.* http://wso2.com
>>>>>>>>>>> lean . enterprise . middleware
>>>>>>>>>>>
>>>>>>>>>>> cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/
>>>>>>>>>>> twitter: http://twitter.com/suhothayan | linked-in: http://lk.linkedin.com/in/suhothayan
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>>> --
>>>
>>> David Morales de Frías  ::  +34 607 010 411 :: @dmoralesdf
>>> <https://twitter.com/dmoralesdf>
>>>
>>>
>>> <http://www.stratio.com/>
>>> Avenida de Europa, 26. Ática 5. 2ª Planta
>>> 28224 Pozuelo de Alarcón, Madrid
>>> Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*
>>>


-- 
*Niranda Perera*
Software Engineer, WSO2 Inc.
Mobile: +94-71-554-8430
Twitter: @n1r44 <https://twitter.com/N1R44>
_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
