Right, you can't expect a completely cold first query to execute faster than the data can be retrieved from the underlying datastore. After that, lowest-latency query performance is largely a matter of caching -- for which Spark provides at least partial solutions.
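To illustrate that point outside Spark itself: the first query necessarily pays the datastore-retrieval cost, but a cached repeat query does not. A minimal Python sketch, where `query` and the fetch counter are purely hypothetical stand-ins for a datastore call; in Spark the analogous mechanism would be `cache()`/`persist()` on a DataFrame or RDD:

```python
import functools

calls = {"fetch": 0}  # counts actual trips to the (simulated) datastore

@functools.lru_cache(maxsize=None)
def query(sql):
    # Simulate the unavoidable cold-path cost: the first execution
    # must actually hit the underlying datastore.
    calls["fetch"] += 1
    return f"rows for {sql}"

query("SELECT 1")          # cold: goes to the datastore
query("SELECT 1")          # warm: served from cache, no datastore hit
print(calls["fetch"])      # -> 1
```

The same shape applies to Spark: context startup and the first scan are bounded by the source, and only subsequent queries against cached data see low latency.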
On Tue, Dec 1, 2015 at 4:27 PM, Michal Klos <michal.klo...@gmail.com> wrote:

> You should consider Presto for this use case. If you want fast "first
> query" times, it is a better fit.
>
> I think Spark SQL will catch up at some point, but if you are not doing
> multiple queries against data cached in RDDs and need low latency, it may
> not be a good fit.
>
> M
>
> On Dec 1, 2015, at 7:23 PM, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>
> Ok, so the latency problem arises because I'm using SQL as the source?
> What about CSV, Hive, or another source?
>
> On Tue, Dec 1, 2015 at 9:18 PM, Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
>> It is not designed for interactive queries.
>>
>> You might want to ask the designers of Spark, Spark SQL, and particularly
>> some things built on top of Spark (such as BlinkDB) about their intent
>> with regard to interactive queries. Interactive queries are not the only
>> designed use of Spark, but it is going too far to claim that Spark is not
>> designed at all to handle interactive queries.
>>
>> That being said, I think that you are correct to question the wisdom of
>> expecting lowest-latency query response from Spark using SQL (sic,
>> presumably an RDBMS is intended) as the datastore.
>>
>> On Tue, Dec 1, 2015 at 4:05 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> Hmm, it will never be faster than SQL if you use SQL as the underlying
>>> storage. Spark is (currently) an in-memory batch engine for iterative
>>> machine-learning workloads. It is not designed for interactive queries.
>>> Currently Hive is going in the direction of interactive queries.
>>> Alternatives are Phoenix on HBase, or Impala.
>>>
>>> On 01 Dec 2015, at 21:58, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>>>
>>> Yes, the use case would be:
>>> We have Spark in a service (I didn't investigate this yet); through API
>>> calls to this service we perform some aggregations over data in SQL. We
>>> are already doing this with an internal development.
>>>
>>> Nothing complicated. For instance, a table with Product, Product Family,
>>> cost, price, etc. -- columns acting as dimensions and measures.
>>>
>>> I want to use Spark to query that table and perform a kind of rollup,
>>> with cost as the measure and Product, Product Family as dimensions.
>>>
>>> Only 3 columns, yet it takes about 20 s to perform that query and the
>>> aggregation, while the query directly against the database, grouping on
>>> those columns, takes about 1 s.
>>>
>>> Regards
>>>
>>> On Tue, Dec 1, 2015 at 5:38 PM, Jörn Franke <jornfra...@gmail.com>
>>> wrote:
>>>
>>>> Can you elaborate more on the use case?
>>>>
>>>> > On 01 Dec 2015, at 20:51, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > I'd like to use Spark to perform some transformations over data
>>>> > stored in SQL, but I need low latency. I'm doing some tests and I
>>>> > find that Spark context creation and querying the data over SQL take
>>>> > too long.
>>>> >
>>>> > Any ideas for speeding up the process?
>>>> >
>>>> > Regards.
>>>> >
>>>> > --
>>>> > Ing. Ivaldi Andres
>>>>
>>>
>>> --
>>> Ing. Ivaldi Andres
>
> --
> Ing. Ivaldi Andres
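For reference, the rollup described in the thread (cost aggregated over Product Family and Product, with per-family subtotals and a grand total) can be pinned down with a small sketch. Plain Python is used here so the example is self-contained; the rows and column names are hypothetical, chosen only to mirror the schema the thread describes:

```python
from collections import defaultdict

# Toy rows mirroring the thread's schema: (product_family, product, cost).
rows = [
    ("Widgets", "Widget A", 10.0),
    ("Widgets", "Widget B", 5.0),
    ("Gadgets", "Gadget X", 7.0),
]

def rollup_cost(rows):
    """Aggregate cost at the three levels SQL's GROUP BY ROLLUP(family,
    product) produces: (family, product) detail, (family, None) subtotal,
    and (None, None) grand total."""
    totals = defaultdict(float)
    for family, product, cost in rows:
        totals[(family, product)] += cost   # detail level
        totals[(family, None)] += cost      # per-family subtotal
        totals[(None, None)] += cost        # grand total
    return dict(totals)

result = rollup_cost(rows)
print(result[("Widgets", None)])  # -> 15.0 (per-family subtotal)
print(result[(None, None)])       # -> 22.0 (grand total)
```

In Spark itself this would be roughly `df.rollup("productFamily", "product").sum("cost")`, with the `None` keys above corresponding to the NULLs that ROLLUP emits in its subtotal and grand-total rows.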