We also did some benchmarking using analytical queries similar to TPC-H,
both with Spark and Presto, and our conclusion was that Spark is a great
general-purpose solution, but for analytical SQL queries it is not there yet.
For 10 or 100 GB of data you will get your results back, but with
Presto
It's only really mildly interactive. When I used Presto+Hive in the past
(just as a consumer, not an admin) it seemed able to provide answers
within ~2m even for fairly large data sets. I'm hoping I can get a similar
level of responsiveness with Spark.
Thanks, Sonal! I'll take a look at the example
Do you want to replace ELK with Spark? Depending on your queries, you could do
as you proposed. However, many text-analytics queries will probably be much
faster on ELK. If your queries are interactive rather than batch
processing, then it does not make much sense. I am not sure
Hi Mat,
I think you could also use Spark SQL to query the logs. Hope the following
link helps:
https://databricks.com/blog/2014/09/23/databricks-reference-applications.html
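To make the Spark SQL suggestion concrete, here is a minimal sketch. The syslog-style log format, the regex, and the field names are illustrative assumptions, not taken from the thread; the parsing helper is pure Python so it can be sanity-checked without a Spark install.

```python
import re

# Illustrative pattern for a syslog-style line, e.g.
# "May 23 10:59:01 web-1 nginx: GET /index.html 200"
LINE_RE = re.compile(
    r"(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s"
    r"(?P<host>\S+)\s"
    r"(?P<service>[^:]+):\s"
    r"(?P<message>.*)"
)

def parse_line(line):
    """Turn one raw log line into a dict of fields, or None on no match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None

def query_logs(spark, path):
    """Load raw text logs and expose them to Spark SQL.

    `spark` is a SparkSession. pyspark is imported lazily so that
    parse_line above stays usable without a Spark install.
    """
    from pyspark.sql import Row
    rows = (spark.read.text(path).rdd
            .map(lambda r: parse_line(r.value))
            .filter(lambda d: d is not None)
            .map(lambda d: Row(**d)))
    spark.createDataFrame(rows).createOrReplaceTempView("logs")
    # Any SQL works once the view is registered, e.g. counts per service:
    return spark.sql("SELECT service, count(*) AS n FROM logs GROUP BY service")
```

In practice you would adapt LINE_RE to whatever format your services actually emit; the Databricks reference application linked above follows the same parse-then-register pattern.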
On May 23, 2016 10:59 AM, "Mat Schaffer" wrote:
I'm curious about trying to use Spark as a cheap/slow ELK
(Elasticsearch, Logstash, Kibana) system. Thinking something like:
- instances rotate local logs
- copy rotated logs to s3
(s3://logs/region/grouping/instance/service/*.logs)
- spark to convert from raw text logs to parquet
- maybe presto to
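The "spark to convert from raw text logs to parquet" step could be sketched roughly as below. The destination bucket `s3://logs-parquet/` and the partition-column names are assumptions for illustration; only the `s3://logs/region/grouping/instance/service/*.logs` layout comes from the proposal above. The path helper is pure Python so it can be checked without Spark.

```python
def partition_cols_from_path(path):
    """Derive partition columns from the s3://logs/region/grouping/instance/service/
    layout proposed above. Purely string-based, usable without Spark."""
    parts = path.rstrip("/").split("/")
    # split gives: ['s3:', '', 'logs', region, grouping, instance, service, ...]
    region, grouping, instance, service = parts[3:7]
    return {"region": region, "grouping": grouping,
            "instance": instance, "service": service}

def logs_to_parquet(spark, src="s3://logs/*/*/*/*/*.logs",
                    dest="s3://logs-parquet/"):  # dest is a hypothetical bucket
    """Read raw text logs and rewrite them as partitioned Parquet.

    `spark` is a SparkSession; pyspark is imported lazily so the helper
    above works without a Spark install.
    """
    from pyspark.sql import functions as F
    # Keep each line plus the file it came from, so the path can be
    # decomposed into partition columns.
    df = spark.read.text(src).withColumn("path", F.input_file_name())
    for i, col in enumerate(["region", "grouping", "instance", "service"], start=3):
        df = df.withColumn(col, F.split("path", "/").getItem(i))
    # Partitioned Parquet lets a downstream engine (Presto, Spark SQL)
    # prune by region/service instead of scanning everything.
    (df.drop("path")
       .write.mode("append")
       .partitionBy("region", "grouping", "instance", "service")
       .parquet(dest))
```

Whether this is "cheap/slow ELK" enough depends on query latency needs; as noted earlier in the thread, interactive text search is where ELK will likely stay ahead.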