from:"Ashish Mukherjee"

Re: Data Processing speed SQL Vs SPARK

2015-07-13 Thread Ashish Mukherjee

MySQL and PgSQL scale to millions. Spark, or any distributed/clustered computing environment, would be inefficient for the kind of data size you mention. That's because of the overhead of coordinating processes, moving data around, etc. On Mon, Jul 13, 2015 at 5:34 PM, Sandeep Giri sand...@knowbigdata.com wrote:

RDD staleness

2015-05-31 Thread Ashish Mukherjee

Hello, Since RDDs are created from data from Hive tables or HDFS, how do we ensure they are invalidated when the source data is updated? Regards, Ashish

Re: Spark SQL v MemSQL/Voltdb

2015-05-28 Thread Ashish Mukherjee

supports queries on the same (custom to VoltDB) store. Spark (SQL) is NOT suitable for transactions; it is designed for querying immutable data (which may exist in several different forms of stores). On May 28, 2015, at 7:48 AM, Ashish Mukherjee ashish.mukher...@gmail.com wrote: Hello, I

Spark SQL v MemSQL/Voltdb

2015-05-28 Thread Ashish Mukherjee

Hello, I was wondering if there is any documented comparison of SparkSQL with in-memory SQL databases such as MemSQL and VoltDB. MemSQL and similar systems also allow queries to be run in a clustered environment. What is the major differentiator? Regards, Ashish

Spark SQL and DataSources API roadmap

2015-03-27 Thread Ashish Mukherjee

Hello, Is there any published community roadmap for SparkSQL and the DataSources API? Regards, Ashish

Spark as a service

2015-03-24 Thread Ashish Mukherjee

Hello, As of now, if I have to execute a Spark job, I need to create a jar and deploy it. If I need to run a dynamically formed SQL query from a Web application, is there any way of using SparkSQL in this manner? Perhaps through a Web Service or something similar. Regards, Ashish

Re: Question about Data Sources API

2015-03-24 Thread Ashish Mukherjee

with grouping and sorting. Essentially, I am trying to evaluate if this API can give me much of what is possible with the Apache MetaModel project. Regards, Ashish On Tue, Mar 24, 2015 at 1:57 PM, Michael Armbrust mich...@databricks.com wrote: On Tue, Mar 24, 2015 at 12:57 AM, Ashish Mukherjee

Question about Data Sources API

2015-03-24 Thread Ashish Mukherjee

Hello, I have some questions related to the Data Sources API: 1. Is the Data Sources API stable as of Spark 1.3.0? 2. The Data Sources API seems to be available only in Scala. Is there any plan to make it available for Java too? 3. Are only filters and projections pushed down to the data

Spark with data on NFS v HDFS

2015-03-05 Thread Ashish Mukherjee

Hello, I understand Spark can be used with Hadoop or standalone. I have certain questions related to use of the correct FS for Spark data. What is the efficiency trade-off in feeding data to Spark from NFS v HDFS? If one is not using Hadoop, is it still usual to house data in HDFS for Spark to

SparkSQL production readiness

2015-02-28 Thread Ashish Mukherjee

Hi, I am exploring SparkSQL for my purposes of performing large relational operations across a cluster. However, it seems to be in alpha right now. Is there any indication when it would be considered production-level? I don't see any info on the site. Regards, Ashish

Running in-memory SQL on streamed relational data

2015-02-28 Thread Ashish Mukherjee

Hi, I have been looking at Spark Streaming, which seems to be for the use case of live streams that are generally processed one line at a time in real time. Since SparkSQL reads data from some filesystem, I was wondering if there is something which connects SparkSQL with Spark Streaming, so I

Spark Distributed Join

2015-02-13 Thread Ashish Mukherjee

Hello, I have the following scenario and was wondering if I can use Spark to address it. I want to query two different data stores (say, ElasticSearch and MySQL) and then merge the two result sets based on a join key between the two. Is it appropriate to use Spark to do this join, if the
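The merge step described in this question, combining two result sets on a shared join key, can be sketched in plain Python as a hash join. This is only an illustration of the logical operation, not Spark code and not an answer from the thread; the function name `hash_join`, the sample rows, and the `user_id` key are all hypothetical.

```python
from collections import defaultdict


def hash_join(left_rows, right_rows, key):
    """Inner-join two lists of dicts on a shared key column."""
    # Build an index over one side so each lookup is O(1) on average.
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)

    joined = []
    for row in left_rows:
        # Rows without a partner on the other side are dropped (inner join).
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical result sets standing in for the two data stores.
search_hits = [
    {"user_id": 1, "score": 0.9},
    {"user_id": 2, "score": 0.4},
    {"user_id": 4, "score": 0.1},
]
sql_rows = [
    {"user_id": 1, "name": "a"},
    {"user_id": 2, "name": "b"},
    {"user_id": 3, "name": "c"},
]

merged = hash_join(search_hits, sql_rows, "user_id")
```

When both result sets fit in memory on one machine, a single-process join like this is usually enough; a distributed engine becomes relevant mainly when the inputs are too large for that.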

12 matches
