On Fri, Aug 14, 2009 at 3:10 PM, bradford cross
<bradford.n.cr...@gmail.com> wrote:
> We have just released flightcaster.com, which uses statistical inference and
> machine learning to predict flight delays in advance of airlines (initial
> results appear to do so with 85-90% accuracy).
>
> The webserver and webapp are all rails running on the Heroku platform, which
> also serves our blackberry and iphone apps.
>
> The research and data heavy lifting is all in Clojure.
>
> Distributed data mining is done via a custom layer on top of cascading
> (which is a layer on top of hadoop.)  All run on EC2 and S3 using the very
> nice cloudera AMIs and deployment scripts.
>
> In addition to the machine learning, the layer atop cascading performs all
> the complex data filtering and transformation operations, including
> distributed joins from heterogeneous data sources and transformations into a
> time series view that is fed to the machine learning computations that are
> rolled into mappers and reducers.  Remember, this is data from airlines and
> the FAA; it is not pretty.  Web data is messy but we have lots of good
> frameworks, libs and sanitizers for web data.
>
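
A rough local sketch of the join and time-series steps described above, in
plain Clojure with no cascading or hadoop involved. The record fields
(:airport, :flight, :t, :delay, :wx) and the helper names join-on and
time-series are made up for illustration; the post does not show the real
data model or the real layer.

```clojure
;; Hypothetical helper: inner join of two seqs of maps on a shared key fn.
(defn join-on [k left right]
  (let [idx (group-by k right)]          ; index the right side by key
    (for [l left
          r (get idx (k l))]             ; no match -> no output row
      (merge l r))))

;; Hypothetical helper: group records by entity, order each group by time.
(defn time-series [key-fn time-fn records]
  (into {}
        (for [[k recs] (group-by key-fn records)]
          [k (vec (sort-by time-fn recs))])))

;; Made-up sample data from two "heterogeneous" sources.
(def flights
  [{:airport "SFO" :flight "AA1" :t 2 :delay 10}
   {:airport "SFO" :flight "AA1" :t 1 :delay 5}
   {:airport "JFK" :flight "UA9" :t 1 :delay 0}])

(def weather
  [{:airport "SFO" :wx :fog}])

(def series
  (time-series :flight :t (join-on :airport flights weather)))
```

Here the JFK flight drops out because the join is an inner join; a
distributed version would have to make the same choice explicitly.
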
> We wrapped cascading in a thin layer that we use to wrap clojure functions
> in the cascading function objects and inject those into individual steps in
> the workflows.  This gets us very close to normal function composition for
> the client code.  Ultimately, we want to be able to do normal function
> composition to compose cascading workflows in the same way as we would
> do vanilla function composition for small test runs on our local machines.
> This is an execution agnostic programming model; client code doesn't bear
> the signs of distributed execution.
>
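
To illustrate the execution-agnostic idea, here is a minimal sketch in plain
Clojure. It does not use the cascading API at all: as-step and run-steps are
hypothetical stand-ins for whatever the wrapper layer does when it injects a
function into a workflow step, and the record fields are invented.

```clojure
;; Ordinary, small functions over one record each.
(defn parse-delay [rec]
  (assoc rec :delay-min (Long/parseLong (:delay-min rec))))

(defn flag-late [rec]
  (assoc rec :late? (> (:delay-min rec) 15)))

;; Local run: vanilla function composition mapped over a seq.
(def pipeline (comp flag-late parse-delay))

(defn run-local [f records]
  (map f records))

;; Hypothetical stand-in for a workflow step: just a name plus a function.
(defn as-step [step-name f]
  {:name step-name :fn f})

;; Hypothetical stand-in for the workflow runner: applies each step in
;; order. A real runner would hand each step to the cluster instead.
(defn run-steps [steps records]
  (reduce (fn [recs step] (map (:fn step) recs))
          records
          steps))

(def sample [{:flight "AA1" :delay-min "20"}
             {:flight "UA9" :delay-min "3"}])
```

The client functions are identical in both paths; only the runner differs,
which is the property the paragraph above is after.
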
> As a beneficial side effect, we found that this model forces us to have more
> fine grained abstractions - because each operation must ultimately be
> injectable into a map-reduce phase, otherwise your parallelism will be
> unnecessarily coarse grained.  This steers us clear of monolithic
> uber-expressions.
>
> Another aspect of the design that allows us to do this is that the data
> transformations write out clojure data structure literals, so we are
> entirely insulated from the normal hadoop input/output formats...the wrapper
> layer just uses the normal clojure reader to read in the strings from hadoop
> and apply the vanilla clojure functions to the data structures.  But we are
> not limited to only clojure data structure literals.  We also inject other
> readers that can read other strings to clojure data structures; for example,
> we use Dan Larkin's wonderful json lib for the initial reads of the raw json
> data we store.
>
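
A small sketch of the "read a literal, apply a function, write a literal"
round trip with a pluggable reader. Two assumptions to flag: it uses
clojure.edn/read-string (a later, safer reader for untrusted data) rather
than the core reader the post mentions, and kv-reader is an invented toy
format standing in for a JSON reader, since the json lib's API is not shown
in the post.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.string :as str])

;; Read one line with read-fn, apply f, write the result back out as a
;; Clojure literal string. apply-to-line is a hypothetical name.
(defn apply-to-line [read-fn f line]
  (pr-str (f (read-fn line))))

;; Toy alternate reader for lines like "flight=AA1,delay=20".
;; Values stay strings; this format is made up for the example.
(defn kv-reader [line]
  (into {}
        (for [pair (str/split line #",")]
          (let [[k v] (str/split pair #"=")]
            [(keyword k) v]))))

;; A vanilla function that knows nothing about strings or hadoop.
(defn add-five [rec]
  (update-in rec [:delay] + 5))

(def out-line
  (apply-to-line edn/read-string add-five "{:flight \"AA1\" :delay 20}"))
```

Because the output is itself a literal, the next step can read it back with
the same reader and never sees an input/output format.
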
> All the analytical code is custom, so we don't use many 3rd party libs
> outside of cascading, hadoop, and the invaluable jets3t for working with s3.
> Oh, and of course - since we do so much with temporal analysis - joda-time
> is the only way to work with dates in a sane way on the jvm. :-)
>
> If you travel a lot, check us out: flightcaster.com ... we have iphone and
> blackberry apps.  Unfortunately this is domestic US air travel only at the
> moment due to the difficulty of obtaining data for international carriers
> and aviation agencies.
>

Very cool - congrats!

Rich
