Of course, here is a simple example:
// reading from MongoDB
val config: ExtractorConfig[Cells] = new ExtractorConfig()
config.setExtractorImplClass(classOf[MongoNativeCellExtractor])
config.putValue(ExtractorConstants.DATABASE, "test")
config.putValue(ExtractorConstants.COLLECTION, "input")
config.putValue(ExtractorConstants.HOST, "localhost:27017")

// registering the schema
val schema = deepContext.createJavaSchemaRDD(config)
schema.registerTempTable("input")

// running the query
val messagesFiltered = deepContext.sql("SELECT * FROM input WHERE number > 10")

// collecting the results
val rows = messagesFiltered.collect()

2014-12-15 10:14 GMT+01:00 Niranda Perera <nira...@wso2.com>:
> Hi David,
>
> Could you point me to an example where SparkSQL is used in Stratio Deep?
>
> Rgds
>
> On Mon, Dec 15, 2014 at 2:20 PM, David Morales <dmora...@stratio.com> wrote:
>>
>> Hi there,
>>
>> For sure, the new release does support SparkSQL, so you can use SparkSQL and Stratio Deep together just out of the box.
>>
>> About Crossdata, it's not itself related to Spark, but it can use Spark/Deep. It's an interactive SQL engine, like Hive, for example.
>>
>> Regards.
>>
>> 2014-12-12 21:29 GMT+01:00 Niranda Perera <nira...@wso2.com>:
>>>
>>> Hi David,
>>>
>>> I have been going through the Deep-Spark examples. It looks very promising.
>>>
>>> On a follow-up query, does Deep-Spark / Deep-Cassandra support SQL-like operations on RDDs (like SparkSQL)?
>>>
>>> Example (from the Datastax Cassandra connector demos):
>>>
>>> object SQLDemo extends DemoApp {
>>>
>>>   val cc = new CassandraSQLContext(sc)
>>>
>>>   CassandraConnector(conf).withSessionDo { session =>
>>>     session.execute("CREATE KEYSPACE IF NOT EXISTS test WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1 }")
>>>     session.execute("DROP TABLE IF EXISTS test.sql_demo")
>>>     session.execute("CREATE TABLE test.sql_demo (key INT PRIMARY KEY, grp INT, value DOUBLE)")
>>>     session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES (1, 1, 1.0)")
>>>     session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES (2, 1, 2.5)")
>>>     session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES (3, 1, 10.0)")
>>>     session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES (4, 2, 4.0)")
>>>     session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES (5, 2, 2.2)")
>>>     session.execute("INSERT INTO test.sql_demo(key, grp, value) VALUES (6, 2, 2.8)")
>>>   }
>>>
>>>   val rdd = cc.cassandraSql("SELECT grp, max(value) AS mv FROM test.sql_demo GROUP BY grp ORDER BY mv")
>>>   rdd.collect().foreach(println) // [2, 4.0] [1, 10.0]
>>>
>>>   sc.stop()
>>> }
>>>
>>> I also read about Stratio Crossdata. Does Crossdata serve this purpose?
>>>
>>> Rgds
>>>
>>> On Tue, Dec 2, 2014 at 11:14 PM, David Morales <dmora...@stratio.com> wrote:
>>>>
>>>> Hi!
>>>>
>>>> Please check the develop branch if you want a more realistic view of our development path. The last commit was about two hours ago :)
>>>>
>>>> Stratio Deep is one of our core modules, so there is a core team in Stratio fully devoted to Spark + NoSQL integration. In these last months, for example, we have added MongoDB, ElasticSearch and Aerospike to Stratio Deep, so you can talk to these databases from Spark just like you do with HDFS.
>>>>
>>>> Furthermore, we are working on more backends, such as Neo4j or Couchbase, for example.
>>>>
>>>> About our benchmarks, you can check out some results at this link: http://www.stratio.com/deep-vs-datastax/
>>>>
>>>> Please keep in mind that Spark integration with a datastore can be done in two ways: HCI or native. We are now working on improving the native integration because it is considerably more performant. Along these lines, we are working on some other tests with even more impressive results.
>>>>
>>>> Here you can find a technical overview of our whole platform:
>>>>
>>>> http://www.slideshare.net/Stratio/stratio-platform-overview-v41
>>>>
>>>> Regards
>>>>
>>>> 2014-12-02 11:14 GMT+01:00 Niranda Perera <nira...@wso2.com>:
>>>>
>>>>> Hi David,
>>>>>
>>>>> Sorry to re-initiate this thread, but may I know if you have done any benchmarking of the Datastax Spark-Cassandra connector against the Stratio Deep-Spark Cassandra integration? I would love to take a look at it.
>>>>>
>>>>> I recently checked the deep-spark GitHub repo and noticed that there has been no activity since Oct 29th. May I know what your future plans are for this particular project?
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Tue, Aug 26, 2014 at 9:12 PM, David Morales <dmora...@stratio.com> wrote:
>>>>>
>>>>>> Yes, it is already included in our benchmarks.
>>>>>>
>>>>>> It could be a nice idea to share our findings; let me talk about it here. Meanwhile, you can ask us any question via my mail or this thread; we are glad to help you.
>>>>>>
>>>>>> Best regards.
>>>>>>
>>>>>> 2014-08-24 15:49 GMT+02:00 Niranda Perera <nira...@wso2.com>:
>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Thank you for your detailed reply.
>>>>>>>
>>>>>>> It was great to hear about Stratio Deep and I must say, it looks very interesting. Storage handlers for databases such as Cassandra, MongoDB, etc. would be very helpful.
We will definitely look into Stratio Deep.
>>>>>>>
>>>>>>> I came across the Datastax Spark-Cassandra connector ( https://github.com/datastax/spark-cassandra-connector ). Have you done any comparison between your implementation and Datastax's connector?
>>>>>>>
>>>>>>> And, yes, please do share the performance results with us once they are ready.
>>>>>>>
>>>>>>> On a different note, is there any way for us to interact with the Stratio dev community, in the form of dev mailing lists etc., so that we could mutually share our findings?
>>>>>>>
>>>>>>> Best regards
>>>>>>>
>>>>>>> On Fri, Aug 22, 2014 at 2:07 PM, David Morales <dmora...@stratio.com> wrote:
>>>>>>>
>>>>>>>> Hi there,
>>>>>>>>
>>>>>>>> *1. About the size of deployments.*
>>>>>>>>
>>>>>>>> It depends on your use case... especially when you combine Spark with a datastore. We usually deploy Spark with Cassandra or MongoDB instead of HDFS, for example.
>>>>>>>>
>>>>>>>> Spark will be faster if you put the data in memory, so if you need a lot of speed (interactive queries, for example), you should have enough memory.
>>>>>>>>
>>>>>>>> *2. About storage handlers.*
>>>>>>>>
>>>>>>>> We have developed the first tight integration between Cassandra and Spark, called Stratio Deep, announced at the first Spark Summit. You can check Stratio Deep out here: https://github.com/Stratio/stratio-deep (open, Apache 2 license).
>>>>>>>>
>>>>>>>> *Deep is a thin integration layer between Apache Spark and several NoSQL datastores. We currently support Apache Cassandra and MongoDB, but in the near future we will add support for several other datastores.*
>>>>>>>>
>>>>>>>> Datastax announced its own driver for Spark at the last Spark Summit, but we have been working on our solution for almost a year.
>>>>>>>>
>>>>>>>> Furthermore, we are working to extend this solution to work with other databases as well... MongoDB integration is complete right now and ElasticSearch will be ready in a few weeks.
>>>>>>>>
>>>>>>>> And that is not all; we have also developed an integration between Cassandra and Lucene for indexing data (open source, Apache 2).
>>>>>>>>
>>>>>>>> *Stratio Cassandra is a fork of Apache Cassandra <http://cassandra.apache.org/> where index functionality has been extended to provide near real-time search, as in ElasticSearch or Solr, including full text search <http://en.wikipedia.org/wiki/Full_text_search> capabilities and free multivariable search. It is achieved through an Apache Lucene <http://lucene.apache.org/> based implementation of Cassandra secondary indexes, where each node of the cluster indexes its own data.*
>>>>>>>>
>>>>>>>> We will publish some benchmarks in two weeks, so I will share our results here if you are interested.
>>>>>>>>
>>>>>>>> If you are more interested in distributed file systems, you should take a look at Tachyon: http://tachyon-project.org/index.html
>>>>>>>>
>>>>>>>> *3. Spark - Hive compatibility*
>>>>>>>>
>>>>>>>> Spark will support anything with the Hadoop InputFormat interface.
>>>>>>>>
>>>>>>>> *4. Performance*
>>>>>>>>
>>>>>>>> We are working a lot with Cassandra and MongoDB and the performance is quite nice. We are finishing some benchmarks right now comparing Hadoop + HDFS vs Spark + HDFS vs Spark + Cassandra (using Stratio Deep and even our fork of Cassandra).
>>>>>>>>
>>>>>>>> Let me share these results with you when they are ready, ok?
>>>>>>>>
>>>>>>>> Regards.
>>>>>>>>
>>>>>>>> 2014-08-22 7:53 GMT+02:00 Niranda Perera <nira...@wso2.com>:
>>>>>>>>
>>>>>>>>> Hi Srinath,
>>>>>>>>> Yes, I am working on deploying it on a multi-node cluster with the DEBS dataset. I will keep architecture@ posted on the progress.
>>>>>>>>>
>>>>>>>>> Hi David,
>>>>>>>>> Thank you very much for the detailed insight you've provided. A few quick questions:
>>>>>>>>> 1. Do you have experience using storage handlers in Spark?
>>>>>>>>> 2. Would a storage handler used in Hive be directly compatible with Spark?
>>>>>>>>> 3. How do you rate the performance of Spark with other databases such as Cassandra, HBase, H2, etc.?
>>>>>>>>>
>>>>>>>>> Thank you very much again for your interest. I look forward to hearing from you.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> On Thu, Aug 21, 2014 at 7:02 PM, Srinath Perera <srin...@wso2.com> wrote:
>>>>>>>>>
>>>>>>>>>> Niranda, we need to test Spark in multi-node mode before making a decision. Spark is very fast; I think there is no doubt about that. We need to make sure it is stable.
>>>>>>>>>>
>>>>>>>>>> David, thanks for a detailed email! How big (nodes) is the Spark setup you guys are running?
>>>>>>>>>>
>>>>>>>>>> --Srinath
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 21, 2014 at 1:34 PM, David Morales <dmora...@stratio.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sorry for disturbing this thread, but I think I can help clarify a few things (we attended the last Spark Summit, we were also speakers there, and we work very closely with Spark).
>>>>>>>>>>>
>>>>>>>>>>> *> Hive/Shark and others benchmark*
>>>>>>>>>>>
>>>>>>>>>>> You can find a nice comparison and benchmark on this site: https://amplab.cs.berkeley.edu/benchmark/
>>>>>>>>>>>
>>>>>>>>>>> *> Shark and SparkSQL*
>>>>>>>>>>>
>>>>>>>>>>> SparkSQL is the natural replacement for Shark, but SparkSQL is still young at this moment. If you are looking for Hive compatibility, you have to execute SparkSQL with a specific context.
>>>>>>>>>>>
>>>>>>>>>>> Quoted from the Spark website:
>>>>>>>>>>>
>>>>>>>>>>> *> Note that Spark SQL currently uses a very basic SQL parser. Users that want a more complete dialect of SQL should look at the HiveQL support provided by HiveContext.*
>>>>>>>>>>>
>>>>>>>>>>> So, just note that SparkSQL is a work in progress. If you want SparkSQL you run a SQLContext; if you want Hive, you use a different context (HiveContext)...
>>>>>>>>>>>
>>>>>>>>>>> *> Spark - Hadoop: the future*
>>>>>>>>>>>
>>>>>>>>>>> Most Hadoop distributions (Cloudera, Hortonworks, MapR...) are including Spark and contributing to migrating the whole Hadoop ecosystem to Spark.
>>>>>>>>>>>
>>>>>>>>>>> Spark is a bit more than Map/Reduce...
as you can read here: http://gigaom.com/2014/06/28/4-reasons-why-spark-could-jolt-hadoop-into-hyperdrive/
>>>>>>>>>>>
>>>>>>>>>>> *> Spark Streaming / Spark SQL*
>>>>>>>>>>>
>>>>>>>>>>> Spark Streaming is built on Spark and provides stream processing through an abstraction called DStreams (a collection of RDDs in a window of time).
>>>>>>>>>>>
>>>>>>>>>>> There are some efforts to make SparkSQL compatible with Spark Streaming (something similar to Trident for Storm), as you can see here:
>>>>>>>>>>>
>>>>>>>>>>> *StreamSQL (https://github.com/thunderain-project/StreamSQL) is a POC project based on Spark that combines the power of Catalyst and Spark Streaming, to offer people the ability to manipulate SQL on top of DStreams; it keeps the same semantics as SparkSQL by offering a SchemaDStream on top of DStream. You don't need to do tricky things like extracting RDDs to register as a table. The other parts are the same as Spark.*
>>>>>>>>>>>
>>>>>>>>>>> So, you can apply SQL to a data stream, but it is very simple at the moment... you can expect a bunch of improvements in this area in the coming months (I guess that SparkSQL will work on Spark Streaming streams before the end of this year).
>>>>>>>>>>>
>>>>>>>>>>> *> Spark Streaming / Spark SQL and CEP*
>>>>>>>>>>>
>>>>>>>>>>> There is no relationship at this moment between (your absolutely amazing) Siddhi CEP and Spark. As far as I know, you are working on distributed CEP with Storm and Siddhi.
>>>>>>>>>>>
>>>>>>>>>>> We are currently working on an interactive CEP built with Kafka + Spark Streaming + Siddhi, with some features such as an API, an interactive shell, built-in statistics and auditing, and built-in functions (save2cassandra, save2mongo, save2elasticsearch...).
>>>>>>>>>>>
>>>>>>>>>>> If you are interested we can talk about this project; I think it would be a nice idea!
>>>>>>>>>>>
>>>>>>>>>>> Anyway, I don't think that SparkSQL will evolve into something like a CEP. Patterns and sequences, for example, would be very complex to do with Spark Streaming (at least for now).
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> 2014-08-21 6:18 GMT+02:00 Sriskandarajah Suhothayan <s...@wso2.com>:
>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Aug 20, 2014 at 1:36 PM, Niranda Perera <nira...@wso2.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> @Maninda,
>>>>>>>>>>>>>
>>>>>>>>>>>>> +1 for suggesting Spark SQL.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Quoting Databricks:
>>>>>>>>>>>>> "Spark SQL provides state-of-the-art SQL performance and maintains compatibility with Shark/Hive. In particular, like Shark, Spark SQL supports all existing Hive data formats, user-defined functions (UDF), and the Hive metastore." [1]
>>>>>>>>>>>>>
>>>>>>>>>>>>> But I am not entirely sure if Spark SQL and Siddhi are comparable, because SparkSQL (like Hive) is designed for batch processing, whereas Siddhi is real-time processing.
But if there are implementations where Siddhi is run on top of Spark, it would be very interesting.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, Siddhi's current way of operation does not support this, but with partitions we can achieve this to some extent.
>>>>>>>>>>>>
>>>>>>>>>>>> Suho
>>>>>>>>>>>>
>>>>>>>>>>>>> Spark supports either Hadoop 1 or 2. But I think we should see what is best: MR1 or YARN+MR2.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [image: Hadoop Architecture] [2]
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] http://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html
>>>>>>>>>>>>> [2] http://www.tomsitpro.com/articles/hadoop-2-vs-1,2-718.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Aug 20, 2014 at 1:13 PM, Lasantha Fernando <lasan...@wso2.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Maninda,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 20 August 2014 12:02, Maninda Edirisooriya <mani...@wso2.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the case of discontinuity of the Shark project, IMO we should not move to Shark at all. And it seems better to go with Spark SQL, as we are already using Spark for CEP. But I am not sure of the difference between Spark SQL and Siddhi queries on the Spark engine.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Currently, we are doing the integration with CEP using Apache Storm, not Spark... :-). Spark Streaming is a possible candidate for integrating with CEP, but we have opted for Storm. I think there has been some independent work on integrating Kafka + Spark Streaming + Siddhi.
>>>>>>>>>>>>>> Please refer to the thread on arch@ "[Architecture] A few questions about WSO2 CEP/Siddhi".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And we have to figure out how Spark SQL is used for historical data, and whether it can execute incremental processing by default, which would implement all our existing BAM use cases. On the other hand, Hadoop 2 [1] uses a completely different platform for resource allocation known as YARN. Sometimes this may be more suitable for batch jobs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] https://www.youtube.com/watch?v=RncoVN0l6dc
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Lasantha
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Maninda Edirisooriya*
>>>>>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>>>>>> *WSO2, Inc. *lean.enterprise.middleware.
>>>>>>>>>>>>>>> *Blog* : http://maninda.blogspot.com/
>>>>>>>>>>>>>>> *E-mail* : mani...@wso2.com
>>>>>>>>>>>>>>> *Skype* : @manindae
>>>>>>>>>>>>>>> *Twitter* : @maninda
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Aug 20, 2014 at 11:33 AM, Niranda Perera <nira...@wso2.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Anjana and Srinath,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> After the discussion I had with Anjana, I researched more on the continuation of the Shark project by Databricks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here's what I found out:
>>>>>>>>>>>>>>>> - Shark was built on the Hive codebase and achieved performance improvements by swapping out the physical execution engine part of Hive.
While this approach enabled Shark users to speed up >>>>>>>>>>>>>>>> their Hive >>>>>>>>>>>>>>>> queries, Shark inherited a large, complicated code base from >>>>>>>>>>>>>>>> Hive that made >>>>>>>>>>>>>>>> it hard to optimize and maintain. >>>>>>>>>>>>>>>> Hence, Databricks has announced that they are halting the >>>>>>>>>>>>>>>> development of Shark from July, 2014. (Shark 0.9 would be the >>>>>>>>>>>>>>>> last release) >>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>> - Shark will be replaced by Spark SQL. It beats Shark in TPC-DS >>>>>>>>>>>>>>>> performance >>>>>>>>>>>>>>>> <http://databricks.com/blog/2014/06/02/exciting-performance-improvements-on-the-horizon-for-spark-sql.html> >>>>>>>>>>>>>>>> by almost an order of magnitude. It also supports all existing >>>>>>>>>>>>>>>> Hive data >>>>>>>>>>>>>>>> formats, user-defined functions (UDF), and the Hive metastore. >>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>> - Following is the Shark, Spark SQL migration plan >>>>>>>>>>>>>>>> http://spark-summit.org/wp-content/uploads/2014/07/Future-of-Spark-Patrick-Wendell.pdf >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - For the legacy Hive and MapReduce users, they have >>>>>>>>>>>>>>>> proposed a new 'Hive on Spark Project' [3], [4] >>>>>>>>>>>>>>>> But, given the performance enhancement, it is quite certain >>>>>>>>>>>>>>>> that Hive and MR would be replaced by engines build on top of >>>>>>>>>>>>>>>> Spark (ex: >>>>>>>>>>>>>>>> Spark SQL) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In my opinion there are a few matters to figure out if we >>>>>>>>>>>>>>>> are migrating from Hive, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 1. whether we are changing the query engine only? (Then, we >>>>>>>>>>>>>>>> can replace Hive by Shark) >>>>>>>>>>>>>>>> 2. whether we are changing the existing Hadoop/ MapReduce >>>>>>>>>>>>>>>> framework to Spark? 
(Then we can replace Hive and Hadoop with Spark and Spark SQL.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In my opinion, considering the long-term impact and the availability of support, it is best to migrate from Hive/Hadoop to Spark. It is open for discussion!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In the meantime, I have already tried Spark SQL, and Databricks' claims of improved performance seem to be true. I will work more on this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] http://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html
>>>>>>>>>>>>>>>> [2] http://databricks.com/blog/2014/06/02/exciting-performance-improvements-on-the-horizon-for-spark-sql.html
>>>>>>>>>>>>>>>> [3] https://issues.apache.org/jira/browse/HIVE-7292
>>>>>>>>>>>>>>>> [4] https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 12:16 PM, Anjana Fernando <anj...@wso2.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Srinath,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> No, this has not been tested on multiple nodes. I told Niranda here in my last mail to test a cluster with the same set of hardware that we are using to test our large data set with Hive. As for the effort to make the change, we still have to figure out the MT aspects of Shark here. Sinthuja was working on making the latest Hive version MT-ready, and most probably we can do the same changes to the Hive version Shark is using.
So after we do that, the integration should be seamless. And also, as I mentioned earlier here, we are also going to test this with the APIM Hive script, to check if there are any unforeseen incompatibilities.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Anjana.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Aug 14, 2014 at 11:53 AM, Srinath Perera <srin...@wso2.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This looks great.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We need to test Spark with multiple nodes. Did we do that? Please create a few VMs in the performance cloud (talk to Lakmal) and test with at least 5 nodes. We need to make sure it works OK in a distributed setup as well.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> What does it take to change to Spark? Anjana... how much work is it?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --Srinath
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Aug 13, 2014 at 7:06 PM, Niranda Perera <nira...@wso2.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank you Anjana.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, I am working on it.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In the meantime, I found this in the Hive documentation [1]. It talks about Hive on Spark, and compares Hive, Shark and Spark SQL at a higher architectural level.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Additionally, it is said that the in-memory performance of Shark can be improved by introducing Tachyon [2]. I guess we can consider this later on.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Cheers.
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark#HiveonSpark-1.3ComparisonwithSharkandSparkSQL >>>>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>>>> http://tachyon-project.org/Running-Tachyon-Locally.html >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, Aug 13, 2014 at 3:17 PM, Anjana Fernando < >>>>>>>>>>>>>>>>>>> anj...@wso2.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi Niranda, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Excellent analysis of Hive vs Shark! .. This gives a >>>>>>>>>>>>>>>>>>>> lot of insight into how both operates in different >>>>>>>>>>>>>>>>>>>> scenarios. As the next >>>>>>>>>>>>>>>>>>>> step, we will need to run this in an actual cluster of >>>>>>>>>>>>>>>>>>>> computers. Since >>>>>>>>>>>>>>>>>>>> you've used a subset of the dataset of 2014 DEBS >>>>>>>>>>>>>>>>>>>> challenge, we should use >>>>>>>>>>>>>>>>>>>> the full data set in a clustered environment and check >>>>>>>>>>>>>>>>>>>> this. Gokul is >>>>>>>>>>>>>>>>>>>> already working on the Hive based setup for this, after >>>>>>>>>>>>>>>>>>>> that is done, you >>>>>>>>>>>>>>>>>>>> can create a Shark cluster in the same hardware and run >>>>>>>>>>>>>>>>>>>> the tests there, to >>>>>>>>>>>>>>>>>>>> get a clear comparison on how these two match up in a >>>>>>>>>>>>>>>>>>>> cluster. Until the >>>>>>>>>>>>>>>>>>>> setup is ready, do continue with your next steps on >>>>>>>>>>>>>>>>>>>> checking the RDD >>>>>>>>>>>>>>>>>>>> support and Spark SQL use. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> After these are done, we should also do a trial run of >>>>>>>>>>>>>>>>>>>> our own APIM Hive scripts, migrated to Shark. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>> Anjana. 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Mon, Aug 11, 2014 at 12:21 PM, Niranda Perera < >>>>>>>>>>>>>>>>>>>> nira...@wso2.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I have been evaluating the performance of >>>>>>>>>>>>>>>>>>>>> Shark (distributed SQL query engine for Hadoop) against >>>>>>>>>>>>>>>>>>>>> Hive. This is with >>>>>>>>>>>>>>>>>>>>> the objective of seeing the possibility to move the WSO2 >>>>>>>>>>>>>>>>>>>>> BAM data >>>>>>>>>>>>>>>>>>>>> processing (which currently uses Hive) to Shark (and >>>>>>>>>>>>>>>>>>>>> Apache Spark) for >>>>>>>>>>>>>>>>>>>>> improved performance. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I am sharing my findings herewith. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> *AMP Lab Shark* >>>>>>>>>>>>>>>>>>>>> Shark can execute Hive QL queries up to 100 times >>>>>>>>>>>>>>>>>>>>> faster than Hive without any modification to the existing >>>>>>>>>>>>>>>>>>>>> data or queries. >>>>>>>>>>>>>>>>>>>>> It supports Hive's QL, metastore, serialization formats, >>>>>>>>>>>>>>>>>>>>> and user-defined >>>>>>>>>>>>>>>>>>>>> functions, providing seamless integration with existing >>>>>>>>>>>>>>>>>>>>> Hive deployments >>>>>>>>>>>>>>>>>>>>> and a familiar, more powerful option for new ones. [1] >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> *Apache Spark*Apache Spark is an open-source data >>>>>>>>>>>>>>>>>>>>> analytics cluster computing framework. It fits into the >>>>>>>>>>>>>>>>>>>>> Hadoop open-source >>>>>>>>>>>>>>>>>>>>> community, building on top of the HDFS and promises >>>>>>>>>>>>>>>>>>>>> performance up to 100 >>>>>>>>>>>>>>>>>>>>> times faster than Hadoop MapReduce for certain >>>>>>>>>>>>>>>>>>>>> applications. 
[2]
>>>>>>>>>>>>>>>>>>>>> Official documentation: [3]
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I carried out the comparison between the following Hive and Shark releases, with input files ranging from 100 to 1 billion entries.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> QL Engine: Apache Hive 0.11 | Shark 0.9.1 (latest release), which uses Scala 2.10.3, Spark 0.9.1 and AMPLab's Hive 0.9.0
>>>>>>>>>>>>>>>>>>>>> Framework: Hadoop 1.0.4 | Spark 0.9.1
>>>>>>>>>>>>>>>>>>>>> File system: HDFS | HDFS
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Attached herewith is a report which describes in detail the performance comparison between Shark and Hive.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> hive_vs_shark <https://docs.google.com/a/wso2.com/folderview?id=0B1GsnfycTl32QTZqUktKck1Ucjg&usp=drive_web>
>>>>>>>>>>>>>>>>>>>>> hive_vs_shark_report.odt <https://docs.google.com/a/wso2.com/file/d/0B1GsnfycTl32X3J5dTh6Slloa0E/edit?usp=drive_web>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In summary, the following conclusions can be drawn from the evaluation:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> - Shark is on par with Hive in DDL operations (CREATE, DROP ... TABLE, DATABASE). Both engines show fairly constant performance as the input size increases.
>>>>>>>>>>>>>>>>>>>>> - Shark is on par with Hive in DML operations (LOAD, INSERT), but when a DML operation is called in conjunction with a data retrieval operation (e.g. INSERT <TBL> SELECT <PROP> FROM <TBL>), Shark significantly outperforms Hive with a performance factor of 10x+ (ranging from 10x to 80x in some instances). Shark's performance factor reduces as the input size increases, while Hive's performance stays fairly constant.
>>>>>>>>>>>>>>>>>>>>> - Shark clearly outperforms Hive in data retrieval operations (FILTER, ORDER BY, JOIN). Hive's performance is fairly constant in the data retrieval operations, while Shark's performance reduces as the input size increases. But in every instance Shark outperformed Hive with a minimum performance factor of 5x+ (ranging from 5x to 80x in some instances).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please refer to the 'hive_vs_shark_report'; it has all the information about the queries and timings, presented graphically.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The code repository can be found at: https://github.com/nirandaperera/hiveToShark/tree/master/hiveVsShark
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Moving forward, I am currently working on the following.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>    - Apache Spark's resilient distributed dataset (RDD)
>>>>>>>>>>>>>>>>>>>>>    abstraction (a collection of elements partitioned across the
>>>>>>>>>>>>>>>>>>>>>    nodes of the cluster that can be operated on in parallel):
>>>>>>>>>>>>>>>>>>>>>    the use of RDDs and their impact on performance.
>>>>>>>>>>>>>>>>>>>>>    - Spark SQL: use of Spark SQL instead of Shark on the
>>>>>>>>>>>>>>>>>>>>>    Spark framework.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> [1] https://github.com/amplab/shark/wiki
>>>>>>>>>>>>>>>>>>>>> [2] http://en.wikipedia.org/wiki/Apache_Spark
>>>>>>>>>>>>>>>>>>>>> [3] http://spark.apache.org/docs/latest/
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I would love to have your feedback on this.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> *Niranda Perera*
>>>>>>>>>>>>>>>>>>>>> Software Engineer, WSO2 Inc.
>>>>>>>>>>>>>>>>>>>>> Mobile: +94-71-554-8430
>>>>>>>>>>>>>>>>>>>>> Twitter: @n1r44 <https://twitter.com/N1R44>
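[Editor's note: the two "moving forward" items in the quoted email above (the RDD abstraction and Spark SQL instead of Shark) can be sketched together. The following is a minimal, illustrative Scala example assuming a Spark 1.1-era API (SQLContext/SchemaRDD, matching the period of this thread); the input file, case class, and table name are made up for illustration.]

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type for lines of "name,age" in people.txt
case class Person(name: String, age: Int)

object RddSqlSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("rdd-sql-sketch").setMaster("local[*]"))

    // 1. RDD abstraction: a collection of elements partitioned across the
    //    nodes of the cluster, operated on in parallel.
    val people = sc.textFile("people.txt")
      .map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt))

    // 2. Spark SQL over the same data, instead of going through Shark.
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // implicit RDD[Person] -> SchemaRDD
    people.registerTempTable("people")

    val adults = sqlContext.sql("SELECT name FROM people WHERE age >= 18")
    adults.map(row => row(0)).collect().foreach(println)

    sc.stop()
  }
}
```

This mirrors the SchemaRDD/registerTempTable pattern shown later in this thread for Stratio Deep, only over a plain text-file RDD rather than a NoSQL extractor.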
--

David Morales de Frías :: +34 607 010 411 :: @dmoralesdf
<https://twitter.com/dmoralesdf>

<http://www.stratio.com/>
Avenida de Europa, 26. Ática 5. 2ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*
_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture