Re: Spark 1.3.0: how to let Spark history load old records?
I think Spark doesn't keep historical metrics. You can use something like SPM for that - http://blog.sematext.com/2014/01/30/announcement-apache-storm-monitoring-in-spm/ Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Mon, Jun 1, 2015 at 11:36 PM, Haopu Wang hw...@qilinsoft.com wrote: When I start the Spark master process, the old records are not shown in the monitoring UI. How to show the old records? Thank you very much! - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: How to monitor Spark Streaming from Kafka?
I think you can use SPM - http://sematext.com/spm - it will give you all Spark and all Kafka metrics, including offsets broken down by topic, etc. out of the box. I see more and more people using it to monitor various components in data processing pipelines, a la http://blog.sematext.com/2015/04/22/monitoring-stream-processing-tools-cassandra-kafka-and-spark/

Otis

On Mon, Jun 1, 2015 at 5:23 PM, dgoldenberg dgoldenberg...@gmail.com wrote:

Hi,

What are some of the good/adopted approaches to monitoring Spark Streaming from Kafka? I see that there are things like http://quantifind.github.io/KafkaOffsetMonitor, for example. Do they all assume that Receiver-based streaming is used?

Then there is this: "Note that one disadvantage of this approach (Receiverless Approach, #2) is that it does not update offsets in Zookeeper, hence Zookeeper-based Kafka monitoring tools will not show progress. However, you can access the offsets processed by this approach in each batch and update Zookeeper yourself."

The code sample, however, seems sparse. What do you need to do here?

    directKafkaStream.foreachRDD(
        new Function<JavaPairRDD<String, String>, Void>() {
            @Override
            public Void call(JavaPairRDD<String, String> rdd) throws IOException {
                OffsetRange[] offsetRanges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
                // offsetRanges.length = # of Kafka partitions being consumed
                ...
                return null;
            }
        }
    );

And if these are updated, will KafkaOffsetMonitor work?

Monitoring seems to center around the notion of a consumer group. But in the receiverless approach, code on the Spark consumer side doesn't seem to expose a consumer group parameter. Where does it go? Can I/should I just pass in group.id as part of the kafkaParams HashMap?

Thanks

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-monitor-Spark-Streaming-from-Kafka-tp23103.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
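On the "update Zookeeper yourself" part of the question: ZooKeeper-based tools for Kafka 0.8 consumers generally read offsets from znodes laid out as /consumers/<group>/offsets/<topic>/<partition>, with the offset stored as a decimal string. Below is a minimal sketch of just the path and value computation, assuming that layout. ZkOffsetPaths and its methods are hypothetical helper names, not part of Spark or Kafka, and the group name is something you would have to choose yourself. Actually writing the znode would need a ZooKeeper client, which is left out here.

```java
/**
 * Hypothetical helper (not a Spark or Kafka API): builds the znode path and
 * value that ZooKeeper-based offset monitors typically read for Kafka 0.8
 * consumers.
 */
final class ZkOffsetPaths {

    /** Path of the znode holding the committed offset for one partition. */
    static String offsetPath(String group, String topic, int partition) {
        return "/consumers/" + group + "/offsets/" + topic + "/" + partition;
    }

    /**
     * Value to store. An offset range's "until" offset is exclusive, so it is
     * also the next offset to be read, which is what consumers commit.
     */
    static String offsetValue(long untilOffset) {
        return Long.toString(untilOffset);
    }

    public static void main(String[] args) {
        // Example values only; in a streaming job they would come from each
        // OffsetRange (topic, partition, until offset) inside foreachRDD.
        System.out.println(offsetPath("my-group", "events", 3)
                + " = " + offsetValue(1200L));
    }
}
```

Inside foreachRDD you would loop over offsetRanges and write one such value per partition after the batch has been processed.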
Re: RE: ElasticSearch for Spark times out
Hi,

If you get an ES response back in 1-5 seconds, that's pretty slow. Are these ES aggregation queries? Costin may be right about GC possibly causing timeouts. SPM http://sematext.com/spm/ can give you all Spark and all key Elasticsearch metrics, including various JVM metrics. If the problem is GC, you'll see it. If you monitor both the Spark side and the ES side, you should be able to find some correlation with SPM.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr Elasticsearch Support * http://sematext.com/

On Wed, Apr 22, 2015 at 5:43 PM, Costin Leau costin.l...@gmail.com wrote:

Hi,

First off, for Elasticsearch questions it is worth pinging the Elastic mailing list, as that is more closely monitored than this one.

Back to your question: Jeetendra is right that the exception indicates no data is flowing back to the es-connector and Spark. The default timeout is 1m [1], which should be more than enough for a typical scenario. As a side note, the scroll size is 50 per task (so 150 suggests 3 tasks).

Once the query is made, scrolling the documents is fast - likely there's something else at hand that causes the connection to time out. In such cases, you can enable logging on the REST package and see what type of data transfer occurs between ES and Spark.

Do note that if a GC occurs, that can freeze Elastic (or Spark), which might trigger the timeout. Consider monitoring Elasticsearch during the query and see whether anything jumps - in particular the memory pressure.

Hope this helps,

[1] http://www.elastic.co/guide/en/elasticsearch/hadoop/master/configuration.html#_network

On 4/22/15 10:44 PM, Adrian Mocanu wrote:

Hi

Thanks for the help. My ES is up. Out of curiosity, do you know what the timeout value is? There are probably other things happening to cause the timeout; I don’t think my ES is that slow but it’s possible that ES is taking too long to find the data.
What I see happening is that it uses scroll to get the data from ES, about 150 items at a time. The usual delay when I perform the same query from a browser plugin ranges from 1-5 sec.

Thanks

*From:* Jeetendra Gangele [mailto:gangele...@gmail.com]
*Sent:* April 22, 2015 3:09 PM
*To:* Adrian Mocanu
*Cc:* u...@spark.incubator.apache.org
*Subject:* Re: ElasticSearch for Spark times out

Basically, a read timeout means that no data arrived within the specified receive timeout period. A few things I would suggest:

1. Is your ES cluster up and running?
2. If 1 is yes, then reduce the size of the index (make it a few KB) and then test?

On 23 April 2015 at 00:19, Adrian Mocanu amoc...@verticalscope.com wrote:

Hi

I use the ElasticSearch package for Spark and very often it times out reading data from ES into an RDD. How can I keep the connection alive (why doesn’t it? Bug?)

Here’s the exception I get:

org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: java.net.SocketTimeoutException: Read timed out
    at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:86) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.elasticsearch.hadoop.serialization.ParsingUtils.doSeekToken(ParsingUtils.java:70) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.elasticsearch.hadoop.serialization.ParsingUtils.seek(ParsingUtils.java:58) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:149) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:102) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:81) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:314) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:76) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:46) ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) ~[scala-library.jar:na]
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) ~[scala-library.jar:na]
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388) ~[scala-library.jar:na]
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) ~[scala-library.jar:na]
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
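For anyone who wants to experiment with the two settings Costin mentions, here is a minimal sketch that only collects them into a map. It assumes the elasticsearch-hadoop setting names es.http.timeout (the 1m default) and es.scroll.size (the 50-per-task default) from the configuration page linked as [1]. EsReadSettings is a hypothetical helper name, not part of the connector, and the values in main are examples rather than recommendations.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: collects es-hadoop read settings to experiment with. */
final class EsReadSettings {

    /**
     * @param httpTimeout time value such as "1m" or "5m" (assumed syntax of es.http.timeout)
     * @param scrollSize  number of documents requested per scroll call, per task
     */
    static Map<String, String> of(String httpTimeout, int scrollSize) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("es.http.timeout", httpTimeout);
        settings.put("es.scroll.size", Integer.toString(scrollSize));
        return settings;
    }

    public static void main(String[] args) {
        // Example values only: a longer timeout and the default scroll size.
        for (Map.Entry<String, String> e : of("5m", 50).entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Each entry would then be set on the job configuration (for example with SparkConf.set(key, value)) before the RDD is created. Raising the timeout only hides the symptom if GC pauses are the real cause, so it is best combined with the monitoring suggested above.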
Re: Spark @ EC2: Futures timed out Ask timed out
Hi Akhil,

Thanks! I think that was it. I had to open a bunch of ports (I didn't use spark-ec2, so it didn't do that for me) and the app works fine now.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr Elasticsearch Support * http://sematext.com/

On Tue, Mar 17, 2015 at 3:26 AM, Akhil Das ak...@sigmoidanalytics.com wrote:

Did you launch the cluster using the spark-ec2 script? Just make sure all ports are open for the security group of the master and slave instances. From the error, it seems it's not able to connect to the driver program (port 58360).

Thanks
Best Regards

On Tue, Mar 17, 2015 at 3:26 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi,

I've been trying to run a simple SparkWordCount app on EC2, but it looks like my apps are not succeeding/completing. I'm suspecting some sort of communication issue. I used the SparkWordCount app from http://blog.cloudera.com/blog/2014/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/

Digging through logs I found this:

15/03/16 21:28:20 INFO Utils: Successfully started service 'driverPropsFetcher' on port 58123.
Exception in thread main java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1563) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60) at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:115) at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163) at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) * Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] * at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:127) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) ... 
4 more Or exceptions like: *Caused by: akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://sparkDriver@ip-10-111-222-111.ec2.internal:58360/), Path(/user/CoarseGrainedScheduler)]] after [3 ms] * at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333) at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691) at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467) at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419) at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423) at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375) at java.lang.Thread.run(Thread.java:745) This is in EC2 and I have ports 22, 7077, 8080, and 8081 open to any source. But maybe I need to do something, too? I do see Master sees Workers and Workers do connect to the Master. I did run this in spark-shell, and it runs without problems; scala val something = sc.parallelize(1 to 1000).collect().filter(_1000 This is how I submitted the job (on the Master machine): $ spark-1.2.1-bin-hadoop2.4/bin/spark-submit --class com.cloudera.sparkwordcount.SparkWordCount --executor-memory 256m --master spark://ip-10-171-32-62:7077 wc-spark/target/sparkwordcount-0.0.1-SNAPSHOT.jar /usr/share/dict/words 0 Any help would be greatly appreciated. Thanks, Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/
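Since the fix in this thread was opening ports between the machines, a quick way to confirm that kind of problem is to test from a worker machine whether the driver's host and port accept TCP connections. The following is a minimal sketch using only the JDK. PortCheck is a hypothetical helper name, and the host and ports you pass in are whatever appears in your own logs (58360 in the trace above). The driver port is typically chosen at random unless spark.driver.port is set, so it will differ from run to run.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP port accepts connections. */
final class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // Usage: java PortCheck <host> <port> [<port> ...]
        String host = args[0];
        for (int i = 1; i < args.length; i++) {
            int port = Integer.parseInt(args[i]);
            String state = isReachable(host, port, 2000) ? "open" : "unreachable";
            System.out.println(host + ":" + port + " " + state);
        }
    }
}
```

Run it on a worker against the driver host while the job is running. A port that is open locally on the driver but unreachable from the worker points at the security group rather than at Spark.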
Spark @ EC2: Futures timed out Ask timed out
Hi, I've been trying to run a simple SparkWordCount app on EC2, but it looks like my apps are not succeeding/completing. I'm suspecting some sort of communication issue. I used the SparkWordCount app from http://blog.cloudera.com/blog/2014/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/ Digging through logs I found this: 15/03/16 21:28:20 INFO Utils: Successfully started service 'driverPropsFetcher' on port 58123. Exception in thread main java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1563) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60) at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:115) at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163) at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) * Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] * at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:127) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) ... 
4 more Or exceptions like: *Caused by: akka.pattern.AskTimeoutException: Ask timed out on [ActorSelection[Anchor(akka.tcp://sparkDriver@ip-10-111-222-111.ec2.internal:58360/), Path(/user/CoarseGrainedScheduler)]] after [3 ms] * at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333) at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691) at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467) at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419) at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423) at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375) at java.lang.Thread.run(Thread.java:745) This is in EC2 and I have ports 22, 7077, 8080, and 8081 open to any source. But maybe I need to do something, too? I do see Master sees Workers and Workers do connect to the Master. I did run this in spark-shell, and it runs without problems; scala val something = sc.parallelize(1 to 1000).collect().filter(_1000 This is how I submitted the job (on the Master machine): $ spark-1.2.1-bin-hadoop2.4/bin/spark-submit --class com.cloudera.sparkwordcount.SparkWordCount --executor-memory 256m --master spark://ip-10-171-32-62:7077 wc-spark/target/sparkwordcount-0.0.1-SNAPSHOT.jar /usr/share/dict/words 0 Any help would be greatly appreciated. Thanks, Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/
Re: throughput in the web console?
Hi Josh, SPM will show you this info. I see you use Kafka, too, whose numerous metrics you can also see in SPM side by side with your Spark metrics. Sounds like trends is what you are after, so I hope this helps. See http://sematext.com/spm Otis On Feb 24, 2015, at 11:59, Josh J joshjd...@gmail.com wrote: Hi, I plan to run a parameter search varying the number of cores, epoch, and parallelism. The web console provides a way to archive the previous runs, though is there a way to view in the console the throughput? Rather than logging the throughput separately to the log files and correlating the logs files to the web console processing times? Thanks, Josh - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Spark job for demoing Spark metrics monitoring?
Hi, I'll be showing our Spark monitoring http://blog.sematext.com/2014/10/07/apache-spark-monitoring/ at the upcoming Spark Summit in NYC. I'd like to run some/any Spark job that really exercises Spark and makes it emit all its various metrics (so the metrics charts are full of data and not blank or flat and boring). Since we don't use Spark at Sematext yet, I was wondering if anyone could recommend some Spark app/job that's easy to run, just to get some Spark job to start emitting various Spark metrics? Thanks, Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/
Re: monitoring for spark standalone
Hi Judy, SPM monitors Spark. Here are some screenshots: http://blog.sematext.com/2014/10/07/apache-spark-monitoring/ Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Mon, Dec 8, 2014 at 2:35 AM, Judy Nash judyn...@exchange.microsoft.com wrote: Hello, Are there ways we can programmatically get health status of master slave nodes, similar to Hadoop Ambari? Wiki seems to suggest there are only web UI or instrumentations ( http://spark.apache.org/docs/latest/monitoring.html). Thanks, Judy
Re: Monitoring Spark
Hi Isca, I think SPM can do that for you: http://blog.sematext.com/2014/10/07/apache-spark-monitoring/ Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Tue, Dec 2, 2014 at 11:57 PM, Isca Harmatz pop1...@gmail.com wrote: hello, im running spark on a cluster and i want to monitor how many nodes/ cores are active in different (specific) points of the program. is there any way to do this? thanks, Isca
[ANN] Spark resources searchable
Hi everyone, We've recently added indexing of all Spark resources to http://search-hadoop.com/spark . Everything is nicely searchable: * user dev mailing lists * JIRA issues * web site * wiki * source code * javadoc. Maybe it's worth adding to http://spark.apache.org/community.html ? Enjoy! Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/
Re: Measuring Performance in Spark
Hi Mahsa, Use SPM http://sematext.com/spm/. See http://blog.sematext.com/2014/10/07/apache-spark-monitoring/ . Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Fri, Oct 31, 2014 at 1:00 PM, mahsa mahsa.han...@gmail.com wrote: Is there any tools like Ganglia that I can use to get performance on Spark or I need to do it myself? Thanks! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Measuring-Performance-in-Spark-tp17376p17836.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Spark Monitoring with Ganglia
Hi, If using Ganglia is not an absolute requirement, check out SPM http://sematext.com/spm/ for Spark -- http://blog.sematext.com/2014/10/07/apache-spark-monitoring/ It monitors all Spark metrics (i.e. you don't need to figure out what you need to monitor, how to get it, how to graph it, etc.) and has alerts and anomaly detection built in.. If you use Spark with Hadoop, Kafka, Cassandra, HBase, Elasticsearch SPM monitors them, too, so you can have visibility into all your tech in one place. You can send Spark event logs to Logsene http://sematext.com/logsene/, too, if you want, and then you can have your performance and log graphs side by side. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Wed, Oct 1, 2014 at 4:30 PM, danilopds danilob...@gmail.com wrote: Hi, I need monitoring some aspects about my cluster like network and resources. Ganglia looks like a good option for what I need. Then, I found out that Spark has support to Ganglia. On the Spark monitoring webpage there is this information: To install the GangliaSink you’ll need to perform a custom build of Spark. I found in my Spark the directory: /extras/spark-ganglia-lgpl. But I don't know how to install it. How can I install the Ganglia to monitoring Spark cluster? How I do this custom build? Thanks! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Monitoring-with-Ganglia-tp15538.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: JMXSink for YARN deployment
Hi,

Jerry said "I'm guessing", so maybe the thing to try is to check if his guess is correct. What about running sudo lsof | grep metrics.properties ? I imagine you should be able to see it if the file was found and read. If Jerry is right, then I think you will NOT see it.

Next, how about trying some bogus value in metrics.properties, like *.sink.jmx.class=org.apache.spark.metrics.sink.BogusSink? If the file is being read, then specifying such a bogus value should make something log an error or throw an exception at start, I assume. If you don't see this, then maybe this file is not being read at all.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr Elasticsearch Support * http://sematext.com/

On Thu, Sep 11, 2014 at 9:18 AM, Shao, Saisai saisai.s...@intel.com wrote:

Hi,

I’m guessing the problem is that the driver or executor cannot get the metrics.properties configuration file in the YARN container, so the metrics system cannot load the right sinks.

Thanks
Jerry

*From:* Vladimir Tretyakov [mailto:vladimir.tretya...@sematext.com]
*Sent:* Thursday, September 11, 2014 7:30 PM
*To:* user@spark.apache.org
*Subject:* JMXSink for YARN deployment

Hello, we at Sematext (https://apps.sematext.com/) are writing a monitoring tool for Spark and we came across one question: How to enable JMX metrics for a YARN deployment?

We put *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink in the file $SPARK_HOME/conf/metrics.properties but it doesn't work.

Everything works in Standalone mode, but not in YARN mode. Can somebody help? Thx!

PS: I've also found https://stackoverflow.com/questions/23529404/spark-on-yarn-how-to-send-metrics-to-graphite-sink/25786112 without an answer.
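Another way to test the guess is to look at the JVM directly and see whether any metrics MBeans were registered at all. Below is a minimal sketch that lists the MBeans in one JMX domain over a remote JMX connection. It assumes the driver or executor JVM was started with remote JMX enabled on a known port, and that the sink registers its beans under the domain "metrics". Both are assumptions to verify in your setup, and JmxDomainLister is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical helper: lists the MBeans registered under one JMX domain. */
final class JmxDomainLister {

    /** Returns the sorted canonical names of all MBeans in the given domain. */
    static List<String> listDomain(MBeanServerConnection conn, String domain) throws Exception {
        List<String> names = new ArrayList<>();
        for (ObjectName name : conn.queryNames(new ObjectName(domain + ":*"), null)) {
            names.add(name.getCanonicalName());
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java JmxDomainLister <host:port> [domain]
        String url = "service:jmx:rmi:///jndi/rmi://" + args[0] + "/jmxrmi";
        String domain = args.length > 1 ? args[1] : "metrics"; // assumed domain name
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url))) {
            for (String name : listDomain(connector.getMBeanServerConnection(), domain)) {
                System.out.println(name);
            }
        }
    }
}
```

An empty listing in YARN mode, next to a populated one in Standalone mode, would support the idea that metrics.properties never reaches the container.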
Deployment model popularity - Standard vs. YARN vs. Mesos vs. SIMR
Hi,

I'm trying to determine which Spark deployment models are the most popular - Standalone, YARN, Mesos, or SIMR. Does anyone know?

I thought I'd use search-hadoop.com to help me figure this out, and this is what I found:

1) Standalone http://search-hadoop.com/?q=standalone&fc_project=Spark&fc_type=mail+_hash_+user (seems the most popular?)
2) YARN http://search-hadoop.com/?q=yarn&fc_project=Spark&fc_type=mail+_hash_+user (almost as popular as standalone?)
3) Mesos http://search-hadoop.com/?q=mesos&fc_project=Spark&fc_type=mail+_hash_+user (less popular than yarn or standalone)
4) SIMR http://search-hadoop.com/?q=simr&fc_project=Spark&fc_type=mail+_hash_+user (no mentions?)

This is obviously not very accurate, but is the order right?

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr Elasticsearch Support * http://sematext.com/