Hello,
See:
http://www.cakesolutions.net/teamblogs/comparison-of-apache-stream-processing-frameworks-part-1
Note that the DSLs these systems use are analogous to Gremlin: they all use the
"functional-fluent" style. We need to stress to people that if they use
Spark/Storm/Flink/Samza/Scala/Java8/Clojure, then Gremlin fits their existing
mental model of data flows and aggregations.

When people say a query language needs to be "like SQL," point them to the fact
that most modern data processing frameworks don't use that style. When people
say that SQL is declarative and thus can be optimized, tell them that these
functional-fluent languages also build a query plan that is optimized for the
underlying execution engine. Making a "SQL language" only adds another layer of
indirection: a query String must now be compiled down to the underlying
execution language (e.g. Java). Modern data processing languages skip that
effort, because the constructs in modern programming languages already provide
enough expressivity. Moreover, these languages lead (I believe) to execution
engine designs that naturally support both single-machine and compute-cluster
execution, since a map/reduce foundation is inherent in their representation.
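To make "builds a query plan that is optimized" concrete, here is a toy sketch
in Java. Everything in it is hypothetical: the FluentPlan class, its
map/filter/optimize/execute methods, and the single "fuse adjacent maps"
rewrite rule are illustrations only, not how Gremlin, Spark, or Flink actually
implement their optimizers. The point is that each fluent call merely records a
step, so the whole plan can be rewritten before any engine runs it, with no
String parsing involved. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical toy pipeline: fluent calls append steps to a plan instead of executing.
class FluentPlan {
    // A step is either a map (fn != null) or a filter (pred != null).
    record Step(String label, Function<Integer, Integer> fn, Predicate<Integer> pred) {
        boolean isMap() { return fn != null; }
    }

    private final List<Step> steps = new ArrayList<>();

    public FluentPlan map(String label, Function<Integer, Integer> fn) {
        steps.add(new Step(label, fn, null));
        return this; // returning this is what makes the API "fluent"
    }

    public FluentPlan filter(String label, Predicate<Integer> pred) {
        steps.add(new Step(label, null, pred));
        return this;
    }

    // Toy optimizer: fuse adjacent map steps into one composed map (one pass instead of two).
    public List<Step> optimize() {
        List<Step> out = new ArrayList<>();
        for (Step s : steps) {
            Step last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && last.isMap() && s.isMap()) {
                out.set(out.size() - 1,
                        new Step(last.label() + "+" + s.label(), last.fn().andThen(s.fn()), null));
            } else {
                out.add(s);
            }
        }
        return out;
    }

    // Toy single-machine "engine": runs the optimized plan over an in-memory list.
    public List<Integer> execute(List<Integer> input) {
        List<Integer> data = input;
        for (Step s : optimize()) {
            data = s.isMap()
                    ? data.stream().map(s.fn()).collect(Collectors.toList())
                    : data.stream().filter(s.pred()).collect(Collectors.toList());
        }
        return data;
    }

    // Labels of the optimized plan, handy for inspecting what the optimizer did.
    public List<String> labels() {
        return optimize().stream().map(Step::label).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        FluentPlan plan = new FluentPlan()
                .map("inc", x -> x + 1)
                .map("double", x -> x * 2)
                .filter("div4", x -> x % 4 == 0);
        System.out.println(plan.labels());                 // [inc+double, div4]
        System.out.println(plan.execute(List.of(1, 2, 3))); // [4, 8]
    }
}
```

The three recorded steps collapse into two: "inc" and "double" become a
single composed map, while the filter is left alone. A real engine would carry
a much richer set of rewrite rules, plus a different executor per target
(single machine or cluster), but the plan-then-optimize shape is the same.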
GREMLIN
text.map(line -> line.split(" "))
    .unfold()
    .groupCount()
STORM
topology.newStream("spout1", spout)
    .each(new Fields("sentence"), new Split(), new Fields("word"))
    .groupBy(new Fields("word"))
    .persistentAggregate(new MemoryMapState.Factory(),
                         new Count(), new Fields("count"));
SPARK
text.flatMap(line => line.split(" "))
    .map(word => (word, 1))
    .reduceByKey(_ + _)
SAMZA
text.split(" ").foldLeft(Map.empty[String, Int]) {
  (count, word) => count + (word -> (count.getOrElse(word, 0) + 1))
}
FLINK
text.flatMap(_.split(" "))
    .map((_, 1))
    .groupBy(0)
    .sum(1)
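For comparison, since Java8 is in the list above, here is the same word count
written with nothing but the standard Java 8 Stream API. The class name
WordCount and the sample input are illustrative choices. It has the same
shape as the snippets above: split each line (flatMap), group by word, count.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordCount {
    // Split every line into words, then group identical words and count each group.
    static Map<String, Long> count(List<String> text) {
        return text.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = count(List.of("to be or", "not to be"));
        System.out.println(counts.get("to")); // 2
    }
}
```

Swapping `text.stream()` for `text.parallelStream()` runs the same pipeline
across multiple threads without changing its shape, which is a small-scale
version of the single-machine/cluster point made earlier.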
Take care,
Marko.
http://markorodriguez.com