Re: Designing API for a Markov Model

RJ Nowling Sun, 14 Sep 2014 19:06:28 -0700

Hi Chris,

I'm more than happy to answer questions.


General background for the project: My friend Jay Vyas initiated
BigPetStore, a big data application blueprint for the Hadoop ecosystem
centered around transaction data for a fictional chain of pet stores.
BigPetStore is currently part of the Apache BigTop distribution.

I developed a much more advanced data generator that uses ab initio
modeling of customer behavior to embed patterns complex enough for use in
analytics.  I developed the data generator in Python, made it available
under the Apache 2.0 license, and currently have an associated conference
paper under review:

https://github.com/rnowling/bigpetstore-data-generator/tree/branch-0.2

My next step is to rewrite the generator in a JVM language for integration
with Hadoop and Spark and contribute it to BigTop. I'm very comfortable
with Java but I'm also rewriting parts in Clojure and Scala to get a feel
for whether they would make better fits.  If the Clojure / Scala ports
reach nearly complete status, I'll happily release them as well.

In general, I'm curious about the state of math modeling and machine
learning libraries on the JVM.  Incanter is nice but it seems to be missing
Hidden Markov Models, Monte Carlo methods, numerical integrators for
differential equations, and common machine learning methods.

I'm only using Markov models, not HMMs, though.  MMs are simple enough that
I can implement the functionality in less than 100 lines of code.  However,
if you know of a good library, I'm happy to take a look.
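
For illustration only, a minimal Markov model along these lines might look
like the sketch below. The map-of-maps model layout, the states :a/:b/:c, and
the helper name sample-next are assumptions made for this example, not the
generator's actual code; progress-state follows the first signature discussed
earlier in this thread.

```clojure
;; Sketch only: model layout and helper names are illustrative assumptions.

(def markov-model
  "Transition probabilities: state -> {next-state probability}.
  Each inner map is assumed to sum to 1."
  {:a {:a 0.1 :b 0.6 :c 0.3}
   :b {:a 0.5 :c 0.5}
   :c {:a 1.0}})

(defn sample-next
  "Picks a key of `transitions` using a roll `r` in [0, 1).
  Walks the cumulative distribution and falls back to the last key
  so that floating-point rounding can never leave us without a result."
  [transitions r]
  (loop [[[state p] & more] (seq transitions)
         cumulative 0.0]
    (let [cumulative (+ cumulative p)]
      (if (or (< r cumulative) (empty? more))
        state
        (recur more cumulative)))))

(defn progress-state
  "Returns the state that follows `state` under `model`.
  The 3-arity version takes the random roll explicitly, which keeps
  the function pure and easy to test."
  ([model state] (progress-state model state (rand)))
  ([model state r] (sample-next (get model state) r)))

;; progress-state returns the same kind of value it consumes,
;; so it composes directly with iterate.
(def states
  (take 10 (iterate (partial progress-state markov-model) :a)))
```

Passing the roll in explicitly is a design choice for testability; a seeded
java.util.Random would be another reasonable option.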

Thanks!

On Sun, Sep 14, 2014 at 7:16 PM, Christopher Small <metasoar...@gmail.com>
wrote:

> A few questions out of curiosity, if you don't mind:
>
> * Have you looked at existing MM libraries for Clojure?
> * Is there something you need that others don't currently
> offer/emphasize; or is this more of a learning project?
> * Are you planning on or interested in open sourcing your work?
>
> Best
>
> Chris
>
>
>
> On Sunday, September 14, 2014 8:18:30 AM UTC-7, RJ Nowling wrote:
>>
>> Thanks for the response!
>>
>> You make a really good point about the first interface -- makes it easy
>> to use with the built-in functions.
>>
>> The only things that really define the process are the model (a transition
>> matrix) and the current state.  The model doesn't change but the current
>> state does.  The next state is always chosen randomly based on
>> probabilities that come from knowing the previous state.
>>
>> My main argument for bundling the model and state together is that I
>> don't want a user to pass in the wrong state or a state from a different
>> process.  Is there a Clojure opinion on this sort of thing?
>>
>> I was thinking a third approach could be to have:
>>
>> (progress-state process1) -> process2
>>
>> and
>>
>> (get-state process2) -> state
>>
>> so that progressing the state and getting the current state are decoupled.
>>
>> On Sun, Sep 14, 2014 at 6:00 AM, Jony Hudson <jonye...@gmail.com> wrote:
>>
>>> It's nice if the function returns the same sort of data as it consumes,
>>> because then it's easy to repeat it with `iterate` or `reduce`. So, if you
>>> take your first example, then you could write:
>>>
>>> (take 100
>>>   (iterate (partial progress-state markov-model) initial-state))
>>>
>>> to get the first 100 states, starting with initial-state.
>>>
>>> If the process takes information at each step, e.g.:
>>>
>>> (progress-state-with-input markov-model-2 previous-state current-input)
>>> -> new-state
>>>
>>> then you can do a similar thing with reductions:
>>>
>>> (take 100
>>>   (reductions (partial progress-state-with-input markov-model-2)
>>> initial-state inputs))
>>>
>>> I'd prefer that to your second approach, as I don't think there's much
>>> reason to bundle the process and its state.
>>>
>>> Another question to ponder is whether there should be a progress-state
>>> function, or whether the model itself could be a function. If the mechanics
>>> of the process are somewhat generic, and the `markov-model` is just data,
>>> then it's good as it is. But I'd make sure that progress-state isn't just
>>> an empty wrapper.
>>>
>>>
>>> Jony
>>>
>>> On Sunday, 14 September 2014 03:28:10 UTC+1, RJ Nowling wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I'm new to Clojure and implementing a Markov Model as part of a larger
>>>> project.  I'd like some advice on the API for a progress-state function.
>>>>
>>>> I see two possible options.  In the first option, we always ask the
>>>> user to provide and keep track of the MSM state themselves:
>>>>
>>>> (progress-state markov-model previous-state) -> new-state
>>>>
>>>> In the second approach, we create a record that combines a model and a
>>>> current state:
>>>>
>>>> (defrecord MarkovProcess [model current-state])
>>>>
>>>> (progress-state markov-process) -> updated-markov-process, new-state
>>>>
>>>> Which of these approaches is more idiomatic for Clojure?  Are multiple
>>>> return types an accepted practice in Clojure?  Is there a third, better 
>>>> way?
>>>>
>>>> Thanks in advance!
>>>>
>>>> RJ
>>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clo...@googlegroups.com
>>> Note that posts from new members are moderated - please be patient with
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+u...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>> ---
>>> You received this message because you are subscribed to a topic in the
>>> Google Groups "Clojure" group.
>>> To unsubscribe from this topic, visit https://groups.google.com/d/
>>> topic/clojure/t7th1wY-Vos/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to
>>> clojure+u...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>>
>> --
>> em rnow...@gmail.com
>> c 954.496.2314
>>



-- 
em rnowl...@gmail.com
c 954.496.2314

