[jira] [Created] (MAHOUT-1937) Model should be able to import/export to PMML

2017-02-03 Thread Trevor Grant (JIRA)
Trevor Grant created MAHOUT-1937:


 Summary: Model should be able to import/export to PMML
 Key: MAHOUT-1937
 URL: https://issues.apache.org/jira/browse/MAHOUT-1937
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.13.1
Reporter: Trevor Grant
Priority: Trivial
 Fix For: 0.14.0


The Predictive Model Markup Language is a generic format for specifying models 
in XML form.

https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language
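
As a rough illustration of what the import side could involve, the sketch below reads a tiny PMML regression model with Python's standard library and scores one record. The field names `x1` and `x2`, the intercept and coefficient values, and the helper `score_regression` are all hypothetical. The document is deliberately not schema-complete (no Header, DataDictionary or MiningSchema), and this is not Mahout or JPMML code.

```python
import xml.etree.ElementTree as ET

# PMML 4.2 namespace, used to look up elements.
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}

# Hypothetical, minimal PMML fragment (field names and values are made up).
PMML_DOC = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="0.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-1.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def score_regression(pmml_text, record):
    """Evaluate intercept + sum(coefficient * x ** exponent) for one record."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionTable", NS)
    result = float(table.get("intercept"))
    for pred in table.findall("pmml:NumericPredictor", NS):
        exponent = int(pred.get("exponent", "1"))  # exponent defaults to 1
        value = record[pred.get("name")]
        result += float(pred.get("coefficient")) * value ** exponent
    return result


prediction = score_regression(PMML_DOC, {"x1": 3.0, "x2": 4.0})
print(prediction)
```

Export would be the reverse direction: writing the same elements out from a trained model.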





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MAHOUT-1041) Support for PMML

2015-07-15 Thread Andrew Palumbo (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629099#comment-14629099 ]

Andrew Palumbo commented on MAHOUT-1041:


That is very cool.  Please do keep us posted.  FYI, as of Mahout v0.10 we also 
have a Spark-backed implementation of Naive Bayes in our new engine-neutral 
environment.

> Support for PMML
> 
>
> Key: MAHOUT-1041
> URL: https://issues.apache.org/jira/browse/MAHOUT-1041
> Project: Mahout
>  Issue Type: Improvement
>  Components: Integration
> Environment: Software Platform
>Reporter: Duraimurugan
>
> Would like to request support for PMML. Once predictive models are built 
> and provided in PMML format, we should be able to import them into a Hadoop 
> cluster for scoring. This way, models built in external (non-Mahout) systems 
> can be imported into Hadoop/Mahout to run in a scalable environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [jira] [Commented] (MAHOUT-1041) Support for PMML

2015-07-15 Thread Dmitriy Lyubimov
This is cool.

Mahout is trending towards Scala and a Scala-based environment that is
independent of distributed backends. See the blog post [1] for a summary. Perhaps we
can do more in that direction? Add things in Scala?

[1]
http://www.weatheringthroughtechdays.com/2015/04/mahout-010x-first-mahout-release-as.html



On Wed, Jul 15, 2015 at 11:25 AM, Chris A. Mattmann (JIRA) 
wrote:

>
> [
> https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628501#comment-14628501
> ]
>
> Chris A. Mattmann commented on MAHOUT-1041:
> ---
>
> BTW, we have integrated Mahout into Nutch in our Naive Bayes ParseFilter
> here:
>
>
> https://github.com/apache/nutch/blob/trunk/src/plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/NaiveBayesParseFilter.java
>
> Yay Mahout!
>
> > Support for PMML
> > 
> >
> > Key: MAHOUT-1041
> > URL: https://issues.apache.org/jira/browse/MAHOUT-1041
> > Project: Mahout
> >  Issue Type: Improvement
> >  Components: Integration
> > Environment: Software Platform
> >Reporter: Duraimurugan
> >
> > Would like to request a support for PMML. With that once the predictive
> models are built and provided in PMML format, we should be able to import
> into hadoop cluster for scoring. This way models built in external
> (non-mahout) systems can be imported to Hadoop/Mahout for scalable
> environment.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>


[jira] [Commented] (MAHOUT-1041) Support for PMML

2015-07-15 Thread Chris A. Mattmann (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628501#comment-14628501 ]

Chris A. Mattmann commented on MAHOUT-1041:
---

BTW, we have integrated Mahout into Nutch in our Naive Bayes ParseFilter here:

https://github.com/apache/nutch/blob/trunk/src/plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/NaiveBayesParseFilter.java

Yay Mahout!

> Support for PMML
> 
>
> Key: MAHOUT-1041
> URL: https://issues.apache.org/jira/browse/MAHOUT-1041
> Project: Mahout
>  Issue Type: Improvement
>  Components: Integration
> Environment: Software Platform
>Reporter: Duraimurugan
>
> Would like to request a support for PMML. With that once the predictive 
> models are built and provided in PMML format, we should be able to import 
> into hadoop cluster for scoring. This way models built in external 
> (non-mahout) systems can be imported to Hadoop/Mahout for scalable 
> environment.





[jira] [Commented] (MAHOUT-1041) Support for PMML

2015-07-15 Thread Chris A. Mattmann (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628500#comment-14628500 ]

Chris A. Mattmann commented on MAHOUT-1041:
---

Hey folks, we have some interest on my team for DARPA Memex in doing this. 
We'll take a look at JPMML and report back.

> Support for PMML
> 
>
> Key: MAHOUT-1041
> URL: https://issues.apache.org/jira/browse/MAHOUT-1041
> Project: Mahout
>  Issue Type: Improvement
>  Components: Integration
> Environment: Software Platform
>Reporter: Duraimurugan
>
> Would like to request a support for PMML. With that once the predictive 
> models are built and provided in PMML format, we should be able to import 
> into hadoop cluster for scoring. This way models built in external 
> (non-mahout) systems can be imported to Hadoop/Mahout for scalable 
> environment.





Re: PMML

2015-03-05 Thread Suneel Marthi
Yes, it makes sense to have one for Naive Bayes and KMeans (when we have
that!!).

On Thu, Mar 5, 2015 at 11:49 AM, Pat Ferrel  wrote:

> PMML doesn’t make a lot of sense when the model is a potentially massive
> matrix. One reason is that it will be pretty hard (impossible?) to
> parallelize read/write with the engines we use. JSON has the same problem
> and the only way SchemaRDD can read JSON is by bending the rules.
>
> Seems like a good thing to support for algos that can make good use of it.
> Does that narrow it down to naive bayes today?
>
> On Mar 5, 2015, at 2:19 AM, Ted Dunning  wrote:
>
> PMML is a machine-to-machine mechanism, not intended really for human
> consumption or production.  Based on XML, it is, of course, bloated, but
> that doesn't really matter for readability since reading isn't the goal.
>
> The vision of making models easy to transfer from system to system is nice,
> but the reality has fallen far short, unfortunately.  The problem is that
> systems often have special aspects that make it hard to replicate exact
> actions from one system to another.  Having a textual format for numerical
> data doesn't help.
>
> Here, for instance, is a linear regression model that I created using R:
>
> [PMML listing: the element markup was stripped by the archive. Surviving
> fragments:]
>   namespace: http://www.dmg.org/PMML-4_2
>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>   xsi:schemaLocation="http://www.dmg.org/PMML-4_2 http://www.dmg.org/v4-2/pmml-4-2.xsd"
>   timestamp: 2015-03-05 09:46:32
>   functionName="regression" algorithmName="least squares"
>   coefficient="-1.00362806356329"
>   coefficient="0.998224481877296"
>
> This looks pretty reasonable (if verbose).   It takes 1.5kB to store a
> model but this compresses to around 600 bytes.
>
> More involved models are a different story.  I built a simple random forest
> on the same data and simply conversion to PMML took several minutes.
> Presumably the R package involved is kind of inefficient, but this still is
> pretty daunting.  Manipulating the resulting PMML representation is
> actually quite difficult.
>
> Saving the random forest model ultimately resulted in a 50MB file.
> Compression reduced that to about 6MB.  This is pretty massive for a fairly
> simple model.
>
>
>
>
> On Thu, Mar 5, 2015 at 4:25 AM, Andrew Musselman <
> andrew.mussel...@gmail.com
> > wrote:
>
> > I think keeping it simple is best, try implementing one or two models in
> > XML and then get fancy if it makes sense.
> >
> > On Wednesday, March 4, 2015, Saikat Kanjilal 
> wrote:
> >
> >> Next question: Is the audience for PMML programmers or could it be folks
> >> that can script?  I'm wondering how this intersects with a simple spark
> >> like DSL , could Mahout implement an intersection between the two?  If
> >> there's interest I can go into examples.
> >>
> >> Sent from my iPhone
> >>
> >>> On Mar 4, 2015, at 4:17 PM, Andrew Musselman <
> > andrew.mussel...@gmail.com
> >> > wrote:
> >>>
> >>> Sure, those would be options.
> >>>
> >>>> On Wed, Mar 4, 2015 at 3:41 PM, Saikat Kanjilal wrote:
> >>>>
> >>>> Question, is there a way to introduce PMML with using a more
> > lightweight
> >>>> format like yaml or json?
> >>>>
> >>>>> Date: Wed, 4 Mar 2015 13:25:29 -0800
> >>>>> Subject: Re: PMML
> >>>>> From: andrew.mussel...@gmail.com 
> >>>>> To: dev@mahout.apache.org 
> >>>>>
> >>>>> Yes, the limitations are often an issue for people doing things that
> >>>> aren't
> >>>>> in the PMML spec yet; there could be room for suggesting new features
> >> in
> >>>>> the spec by building them though, I suppose.
> >>>>>
> >>>>> Also agree that XML is a lousy/bloated way of representing stuff like
> >>>> this,
> >>>>> but in the end it's just a choice of representation so there may be
> >>>> reason
> >>>>> to use some other encoding and then provide an XML-export function.
> >>>>>
> >>>>>> On Wed, Mar 4, 2015 at 11:42 AM, Dmitriy Lyubimov <
> > dlie...@gmail.com
> >> >
> >>>>> wrote:
> >>>>>
> >>>>>

Re: PMML

2015-03-05 Thread Pat Ferrel
PMML doesn’t make a lot of sense when the model is a potentially massive 
matrix. One reason is that it will be pretty hard (impossible?) to parallelize 
read/write with the engines we use. JSON has the same problem and the only way 
SchemaRDD can read JSON is by bending the rules.

Seems like a good thing to support for algos that can make good use of it. Does 
that narrow it down to naive bayes today?
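
To illustrate the parallelism point, here is a minimal Python sketch (not Spark or Mahout code). It assumes that "bending the rules" means requiring one self-contained JSON object per line, so that a reader can cut the input at line boundaries and parse each chunk independently; a single large XML or JSON document has no such safe cut points. The row layout and the helper names `parse_chunk` and `parallel_parse` are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sparse matrix rows, serialized as one JSON object per line.
rows = [{"row": i, "values": {str(i): 1.0, str(i + 1): 0.5}} for i in range(8)]
json_lines = "\n".join(json.dumps(r) for r in rows)


def parse_chunk(lines):
    """Parse a group of lines; no state is shared with other chunks."""
    return [json.loads(line) for line in lines]


def parallel_parse(text, n_chunks=4):
    """Split the text at line boundaries and parse the chunks concurrently."""
    lines = text.splitlines()
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division, at least 1
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parsed = pool.map(parse_chunk, chunks)  # map preserves chunk order
    return [row for chunk in parsed for row in chunk]


parsed_rows = parallel_parse(json_lines)
print(len(parsed_rows))
```

The threads here only stand in for distributed workers; the point is that each chunk can be handed to a different reader without any of them seeing the whole file.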

On Mar 5, 2015, at 2:19 AM, Ted Dunning  wrote:

PMML is a machine-to-machine mechanism, not intended really for human
consumption or production.  Based on XML, it is, of course, bloated, but
that doesn't really matter for readability since reading isn't the goal.

The vision of making models easy to transfer from system to system is nice,
but the reality has fallen far short, unfortunately.  The problem is that
systems often have special aspects that make it hard to replicate exact
actions from one system to another.  Having a textual format for numerical
data doesn't help.

Here, for instance, is a linear regression model that I created using R:

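[PMML listing: the element markup was stripped by the archive. Surviving
fragments:]
  namespace: http://www.dmg.org/PMML-4_2
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.dmg.org/PMML-4_2 http://www.dmg.org/v4-2/pmml-4-2.xsd"
  timestamp: 2015-03-05 09:46:32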

This looks pretty reasonable (if verbose).   It takes 1.5kB to store a
model but this compresses to around 600 bytes.

More involved models are a different story.  I built a simple random forest
on the same data, and simply converting it to PMML took several minutes.
Presumably the R package involved is kind of inefficient, but this is still
pretty daunting.  Manipulating the resulting PMML representation is
actually quite difficult.

Saving the random forest model ultimately resulted in a 50MB file.
Compression reduced that to about 6MB.  This is pretty massive for a fairly
simple model.




On Thu, Mar 5, 2015 at 4:25 AM, Andrew Musselman  wrote:

> I think keeping it simple is best, try implementing one or two models in
> XML and then get fancy if it makes sense.
> 
> On Wednesday, March 4, 2015, Saikat Kanjilal  wrote:
> 
>> Next question: Is the audience for PMML programmers or could it be folks
>> that can script?  I'm wondering how this intersects with a simple spark
>> like DSL , could Mahout implement an intersection between the two?  If
>> there's interest I can go into examples.
>> 
>> Sent from my iPhone
>> 
>>> On Mar 4, 2015, at 4:17 PM, Andrew Musselman <
> andrew.mussel...@gmail.com
>> > wrote:
>>> 
>>> Sure, those would be options.
>>> 
>>>> On Wed, Mar 4, 2015 at 3:41 PM, Saikat Kanjilal wrote:
>>>> 
>>>> Question, is there a way to introduce PMML with using a more
> lightweight
>>>> format like yaml or json?
>>>> 
>>>>> Date: Wed, 4 Mar 2015 13:25:29 -0800
>>>>> Subject: Re: PMML
>>>>> From: andrew.mussel...@gmail.com 
>>>>> To: dev@mahout.apache.org 
>>>>> 
>>>>> Yes, the limitations are often an issue for people doing things that
>>>> aren't
>>>>> in the PMML spec yet; there could be room for suggesting new features
>> in
>>>>> the spec by building them though, I suppose.
>>>>> 
>>>>> Also agree that XML is a lousy/bloated way of representing stuff like
>>>> this,
>>>>> but in the end it's just a choice of representation so there may be
>>>> reason
>>>>> to use some other encoding and then provide an XML-export function.
>>>>> 
>>>>>> On Wed, Mar 4, 2015 at 11:42 AM, Dmitriy Lyubimov <
> dlie...@gmail.com
>> >
>>>>> wrote:
>>>>> 
>>>>>> I am willing to +1 any contribution at this point.
>>>>>> 
>>>>>> my previous company used pmml to serialize simple stuff, but i don't
>>>>>> have first hand experience. Its flexibility is ultimately pretty
>>>>>> limited, isn't it? And xml is ultimately a media which is too ugly
> and
>>>>>> too verbose at the same time to represent models with any more or
> less
>>>>>> decent number of parameters?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Tue, Mar 3, 2015 at 8:19 PM, Suneel Marthi <
>> suneel.mar...@gmail.com 
>>>>> 
>>>>>> wrote:
>>>>>>> It makes sense to support PMML for classification and clustering
>>>> tasks to
>>>>>>> be

Re: PMML

2015-03-05 Thread Ted Dunning
PMML is a machine-to-machine mechanism, not intended really for human
consumption or production.  Based on XML, it is, of course, bloated, but
that doesn't really matter for readability since reading isn't the goal.

The vision of making models easy to transfer from system to system is nice,
but the reality has fallen far short, unfortunately.  The problem is that
systems often have special aspects that make it hard to replicate exact
actions from one system to another.  Having a textual format for numerical
data doesn't help.

Here, for instance, is a linear regression model that I created using R:

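[PMML listing: the element markup was stripped by the archive. Surviving
fragments:]
  namespace: http://www.dmg.org/PMML-4_2
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.dmg.org/PMML-4_2 http://www.dmg.org/v4-2/pmml-4-2.xsd"
  timestamp: 2015-03-05 09:46:32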

This looks pretty reasonable (if verbose).   It takes 1.5kB to store a
model but this compresses to around 600 bytes.

More involved models are a different story.  I built a simple random forest
on the same data, and simply converting it to PMML took several minutes.
Presumably the R package involved is kind of inefficient, but this is still
pretty daunting.  Manipulating the resulting PMML representation is
actually quite difficult.

Saving the random forest model ultimately resulted in a 50MB file.
Compression reduced that to about 6MB.  This is pretty massive for a fairly
simple model.
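
The remark above that a textual format for numerical data doesn't help can be made concrete with a small Python sketch. It assumes coefficients are written with 15 significant digits (the two coefficient attributes that survive in quoted copies of the listing have 15) and uses a made-up value, 1/3, rather than anything from the actual model. Whether the R exporter actually lost precision is not known from this thread.

```python
# Hypothetical coefficient; not taken from the model in the thread.
value = 1.0 / 3.0

short_text = format(value, ".15g")  # 15 significant digits
full_text = format(value, ".17g")   # 17 digits are enough for any IEEE double

# Reading the 15-digit text back gives a slightly different double,
# while the 17-digit text reproduces the original value exactly.
short_round_trips = float(short_text) == value
full_round_trips = float(full_text) == value
print(short_text, short_round_trips)
print(full_text, full_round_trips)
```

A system that scores with the re-read coefficient can therefore disagree in the last bits with the system that trained the model, unless the writer emits enough digits.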




On Thu, Mar 5, 2015 at 4:25 AM, Andrew Musselman  wrote:

> I think keeping it simple is best, try implementing one or two models in
> XML and then get fancy if it makes sense.
>
> On Wednesday, March 4, 2015, Saikat Kanjilal  wrote:
>
> > Next question: Is the audience for PMML programmers or could it be folks
> > that can script?  I'm wondering how this intersects with a simple spark
> > like DSL , could Mahout implement an intersection between the two?  If
> > there's interest I can go into examples.
> >
> > Sent from my iPhone
> >
> > > On Mar 4, 2015, at 4:17 PM, Andrew Musselman <
> andrew.mussel...@gmail.com
> > > wrote:
> > >
> > > Sure, those would be options.
> > >
> > >> On Wed, Mar 4, 2015 at 3:41 PM, Saikat Kanjilal wrote:
> > >>
> > >> Question, is there a way to introduce PMML with using a more
> lightweight
> > >> format like yaml or json?
> > >>
> > >>> Date: Wed, 4 Mar 2015 13:25:29 -0800
> > >>> Subject: Re: PMML
> > >>> From: andrew.mussel...@gmail.com 
> > >>> To: dev@mahout.apache.org 
> > >>>
> > >>> Yes, the limitations are often an issue for people doing things that
> > >> aren't
> > >>> in the PMML spec yet; there could be room for suggesting new features
> > in
> > >>> the spec by building them though, I suppose.
> > >>>
> > >>> Also agree that XML is a lousy/bloated way of representing stuff like
> > >> this,
> > >>> but in the end it's just a choice of representation so there may be
> > >> reason
> > >>> to use some other encoding and then provide an XML-export function.
> > >>>
> > >>>> On Wed, Mar 4, 2015 at 11:42 AM, Dmitriy Lyubimov <
> dlie...@gmail.com
> > >
> > >>> wrote:
> > >>>
> > >>>> I am willing to +1 any contribution at this point.
> > >>>>
> > >>>> my previous company used pmml to serialize simple stuff, but i don't
> > >>>> have first hand experience. Its flexibility is ultimately pretty
> > >>>> limited, isn't it? And xml is ultimately a media which is too ugly
> and
> > >>>> too verbose at the same time to represent models with any more or
> less
> > >>>> decent number of parameters?
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Tue, Mar 3, 2015 at 8:19 PM, Suneel Marthi <
> > suneel.mar...@gmail.com 
> > >>>
> > >>>> wrote:
> > >>>>> It makes sense to support PMML for classification and clustering
> > >> tasks to
> > >>>>> be able to share and distribute trained models. Sean, Pat, Dmitriy
> > >> and
> > >>>> Ted
> > >>>>> please chime in.
> > >>>>>
> > >>>>> PMML support in Mahout was talked about for a long time now but
> never
> > >>>>> really got any traction to take off.
> > >>>>>
> > >>>>> +1 to build this.
> > >>>>>
> > >>>>> On Tue, Mar 3, 2015 at 11:14 PM, Andrew Musselman <
> > >>>>> andrew.mussel...@gmail.com > wrote:
> > >>>>>
> > >>>>>> How much interest is there in a mahout-pmml module, with a
> starting
> > >>>> point
> > >>>>>> to be able to export a few analytic/scoring jobs to PMML
> > >> representation?
> > >>>>>>
> > >>>>>> I've seen a lot of interest at in being able to use PMML to
> > >> translate
> > >>>>>> analytic work into production(though I think people talk about it
> > >> more
> > >>>> than
> > >>>>>> they do it), and it could be a benchmark as part of a "definition
> of
> > >>>> done"
> > >>>>>> for any existing/new method we include since there's a spec to
> > >> build to.
> > >>>>>>
> > >>>>>> Best
> > >>>>>> Andrew
> > >>
> > >>
> >
>


Re: PMML

2015-03-04 Thread Andrew Musselman
I think keeping it simple is best: try implementing one or two models in
XML and then get fancy if it makes sense.

On Wednesday, March 4, 2015, Saikat Kanjilal  wrote:

> Next question: Is the audience for PMML programmers or could it be folks
> that can script?  I'm wondering how this intersects with a simple spark
> like DSL , could Mahout implement an intersection between the two?  If
> there's interest I can go into examples.
>
> Sent from my iPhone
>
> > On Mar 4, 2015, at 4:17 PM, Andrew Musselman wrote:
> >
> > Sure, those would be options.
> >
> >> On Wed, Mar 4, 2015 at 3:41 PM, Saikat Kanjilal wrote:
> >>
> >> Question, is there a way to introduce PMML with using a more lightweight
> >> format like yaml or json?
> >>
> >>> Date: Wed, 4 Mar 2015 13:25:29 -0800
> >>> Subject: Re: PMML
> >>> From: andrew.mussel...@gmail.com 
> >>> To: dev@mahout.apache.org 
> >>>
> >>> Yes, the limitations are often an issue for people doing things that
> >> aren't
> >>> in the PMML spec yet; there could be room for suggesting new features
> in
> >>> the spec by building them though, I suppose.
> >>>
> >>> Also agree that XML is a lousy/bloated way of representing stuff like
> >> this,
> >>> but in the end it's just a choice of representation so there may be
> >> reason
> >>> to use some other encoding and then provide an XML-export function.
> >>>
> >>>> On Wed, Mar 4, 2015 at 11:42 AM, Dmitriy Lyubimov wrote:
> >>>
> >>>> I am willing to +1 any contribution at this point.
> >>>>
> >>>> my previous company used pmml to serialize simple stuff, but i don't
> >>>> have first hand experience. Its flexibility is ultimately pretty
> >>>> limited, isn't it? And xml is ultimately a media which is too ugly and
> >>>> too verbose at the same time to represent models with any more or less
> >>>> decent number of parameters?
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Mar 3, 2015 at 8:19 PM, Suneel Marthi <
> suneel.mar...@gmail.com 
> >>>
> >>>> wrote:
> >>>>> It makes sense to support PMML for classification and clustering
> >> tasks to
> >>>>> be able to share and distribute trained models. Sean, Pat, Dmitriy
> >> and
> >>>> Ted
> >>>>> please chime in.
> >>>>>
> >>>>> PMML support in Mahout was talked about for a long time now but never
> >>>>> really got any traction to take off.
> >>>>>
> >>>>> +1 to build this.
> >>>>>
> >>>>> On Tue, Mar 3, 2015 at 11:14 PM, Andrew Musselman <
> >>>>> andrew.mussel...@gmail.com > wrote:
> >>>>>
> >>>>>> How much interest is there in a mahout-pmml module, with a starting
> >>>> point
> >>>>>> to be able to export a few analytic/scoring jobs to PMML
> >> representation?
> >>>>>>
> >>>>>> I've seen a lot of interest at in being able to use PMML to
> >> translate
> >>>>>> analytic work into production(though I think people talk about it
> >> more
> >>>> than
> >>>>>> they do it), and it could be a benchmark as part of a "definition of
> >>>> done"
> >>>>>> for any existing/new method we include since there's a spec to
> >> build to.
> >>>>>>
> >>>>>> Best
> >>>>>> Andrew
> >>
> >>
>


Re: PMML

2015-03-04 Thread Saikat Kanjilal
Next question: is the audience for PMML programmers, or could it be folks who 
can script?  I'm wondering how this intersects with a simple Spark-like DSL; 
could Mahout implement an intersection between the two?  If there's interest, I 
can go into examples.

Sent from my iPhone

> On Mar 4, 2015, at 4:17 PM, Andrew Musselman  
> wrote:
> 
> Sure, those would be options.
> 
>> On Wed, Mar 4, 2015 at 3:41 PM, Saikat Kanjilal  wrote:
>> 
>> Question, is there a way to introduce PMML with using a more lightweight
>> format like yaml or json?
>> 
>>> Date: Wed, 4 Mar 2015 13:25:29 -0800
>>> Subject: Re: PMML
>>> From: andrew.mussel...@gmail.com
>>> To: dev@mahout.apache.org
>>> 
>>> Yes, the limitations are often an issue for people doing things that
>> aren't
>>> in the PMML spec yet; there could be room for suggesting new features in
>>> the spec by building them though, I suppose.
>>> 
>>> Also agree that XML is a lousy/bloated way of representing stuff like
>> this,
>>> but in the end it's just a choice of representation so there may be
>> reason
>>> to use some other encoding and then provide an XML-export function.
>>> 
>>>> On Wed, Mar 4, 2015 at 11:42 AM, Dmitriy Lyubimov 
>>> wrote:
>>> 
>>>> I am willing to +1 any contribution at this point.
>>>> 
>>>> my previous company used pmml to serialize simple stuff, but i don't
>>>> have first hand experience. Its flexibility is ultimately pretty
>>>> limited, isn't it? And xml is ultimately a media which is too ugly and
>>>> too verbose at the same time to represent models with any more or less
>>>> decent number of parameters?
>>>> 
>>>> 
>>>> 
>>>> On Tue, Mar 3, 2015 at 8:19 PM, Suneel Marthi wrote:
>>>>> It makes sense to support PMML for classification and clustering
>> tasks to
>>>>> be able to share and distribute trained models. Sean, Pat, Dmitriy
>> and
>>>> Ted
>>>>> please chime in.
>>>>> 
>>>>> PMML support in Mahout was talked about for a long time now but never
>>>>> really got any traction to take off.
>>>>> 
>>>>> +1 to build this.
>>>>> 
>>>>> On Tue, Mar 3, 2015 at 11:14 PM, Andrew Musselman <
>>>>> andrew.mussel...@gmail.com> wrote:
>>>>> 
>>>>>> How much interest is there in a mahout-pmml module, with a starting
>>>> point
>>>>>> to be able to export a few analytic/scoring jobs to PMML
>> representation?
>>>>>> 
>>>>>> I've seen a lot of interest at in being able to use PMML to
>> translate
>>>>>> analytic work into production(though I think people talk about it
>> more
>>>> than
>>>>>> they do it), and it could be a benchmark as part of a "definition of
>>>> done"
>>>>>> for any existing/new method we include since there's a spec to
>> build to.
>>>>>> 
>>>>>> Best
>>>>>> Andrew
>> 
>> 


Re: PMML

2015-03-04 Thread Andrew Musselman
Sure, those would be options.

On Wed, Mar 4, 2015 at 3:41 PM, Saikat Kanjilal  wrote:

> Question, is there a way to introduce PMML with using a more lightweight
> format like yaml or json?
>
> > Date: Wed, 4 Mar 2015 13:25:29 -0800
> > Subject: Re: PMML
> > From: andrew.mussel...@gmail.com
> > To: dev@mahout.apache.org
> >
> > Yes, the limitations are often an issue for people doing things that
> aren't
> > in the PMML spec yet; there could be room for suggesting new features in
> > the spec by building them though, I suppose.
> >
> > Also agree that XML is a lousy/bloated way of representing stuff like
> this,
> > but in the end it's just a choice of representation so there may be
> reason
> > to use some other encoding and then provide an XML-export function.
> >
> > On Wed, Mar 4, 2015 at 11:42 AM, Dmitriy Lyubimov 
> wrote:
> >
> > > I am willing to +1 any contribution at this point.
> > >
> > > my previous company used pmml to serialize simple stuff, but i don't
> > > have first hand experience. Its flexibility is ultimately pretty
> > > limited, isn't it? And xml is ultimately a media which is too ugly and
> > > too verbose at the same time to represent models with any more or less
> > > decent number of parameters?
> > >
> > >
> > >
> > > On Tue, Mar 3, 2015 at 8:19 PM, Suneel Marthi wrote:
> > > > It makes sense to support PMML for classification and clustering
> tasks to
> > > > be able to share and distribute trained models. Sean, Pat, Dmitriy
> and
> > > Ted
> > > > please chime in.
> > > >
> > > > PMML support in Mahout was talked about for a long time now but never
> > > > really got any traction to take off.
> > > >
> > > > +1 to build this.
> > > >
> > > > On Tue, Mar 3, 2015 at 11:14 PM, Andrew Musselman <
> > > > andrew.mussel...@gmail.com> wrote:
> > > >
> > > >> How much interest is there in a mahout-pmml module, with a starting
> > > point
> > > >> to be able to export a few analytic/scoring jobs to PMML
> representation?
> > > >>
> > > >> I've seen a lot of interest at in being able to use PMML to
> translate
> > > >> analytic work into production(though I think people talk about it
> more
> > > than
> > > >> they do it), and it could be a benchmark as part of a "definition of
> > > done"
> > > >> for any existing/new method we include since there's a spec to
> build to.
> > > >>
> > > >> Best
> > > >> Andrew
> > > >>
> > >
>
>


RE: PMML

2015-03-04 Thread Saikat Kanjilal
Question: is there a way to introduce PMML using a more lightweight format 
like YAML or JSON?

> Date: Wed, 4 Mar 2015 13:25:29 -0800
> Subject: Re: PMML
> From: andrew.mussel...@gmail.com
> To: dev@mahout.apache.org
> 
> Yes, the limitations are often an issue for people doing things that aren't
> in the PMML spec yet; there could be room for suggesting new features in
> the spec by building them though, I suppose.
> 
> Also agree that XML is a lousy/bloated way of representing stuff like this,
> but in the end it's just a choice of representation so there may be reason
> to use some other encoding and then provide an XML-export function.
> 
> On Wed, Mar 4, 2015 at 11:42 AM, Dmitriy Lyubimov  wrote:
> 
> > I am willing to +1 any contribution at this point.
> >
> > my previous company used pmml to serialize simple stuff, but i don't
> > have first hand experience. Its flexibility is ultimately pretty
> > limited, isn't it? And xml is ultimately a media which is too ugly and
> > too verbose at the same time to represent models with any more or less
> > decent number of parameters?
> >
> >
> >
> > On Tue, Mar 3, 2015 at 8:19 PM, Suneel Marthi 
> > wrote:
> > > It makes sense to support PMML for classification and clustering tasks to
> > > be able to share and distribute trained models. Sean, Pat, Dmitriy and
> > Ted
> > > please chime in.
> > >
> > > PMML support in Mahout was talked about for a long time now but never
> > > really got any traction to take off.
> > >
> > > +1 to build this.
> > >
> > > On Tue, Mar 3, 2015 at 11:14 PM, Andrew Musselman <
> > > andrew.mussel...@gmail.com> wrote:
> > >
> > >> How much interest is there in a mahout-pmml module, with a starting
> > point
> > >> to be able to export a few analytic/scoring jobs to PMML representation?
> > >>
> > >> I've seen a lot of interest at in being able to use PMML to translate
> > >> analytic work into production(though I think people talk about it more
> > than
> > >> they do it), and it could be a benchmark as part of a "definition of
> > done"
> > >> for any existing/new method we include since there's a spec to build to.
> > >>
> > >> Best
> > >> Andrew
> > >>
> >
  

Re: PMML

2015-03-04 Thread Andrew Musselman
Yes, the limitations are often an issue for people doing things that aren't
in the PMML spec yet; there could be room for suggesting new features in
the spec by building them though, I suppose.

Also agree that XML is a lousy/bloated way of representing stuff like this,
but in the end it's just a choice of representation so there may be reason
to use some other encoding and then provide an XML-export function.
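
One way to picture "some other encoding plus an XML-export function" is the Python sketch below. The JSON layout, the field names, and the function `export_pmml` are hypothetical, and the PMML it emits is intentionally incomplete (no Header, DataDictionary or MiningSchema), so treat it as an outline of the idea rather than a conforming exporter.

```python
import json
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_2"

# Hypothetical compact encoding of a linear model.
model_json = json.dumps({
    "function": "regression",
    "intercept": 0.5,
    "coefficients": {"x1": 2.0, "x2": -1.0},
})


def export_pmml(model_text):
    """Convert the compact JSON encoding into a PMML-style XML string."""
    model = json.loads(model_text)
    root = ET.Element("PMML", {"version": "4.2", "xmlns": PMML_NS})
    reg = ET.SubElement(
        root, "RegressionModel", {"functionName": model["function"]}
    )
    table = ET.SubElement(
        reg, "RegressionTable", {"intercept": repr(model["intercept"])}
    )
    for name, coef in model["coefficients"].items():
        # repr() writes enough digits for the float to round-trip exactly.
        ET.SubElement(
            table, "NumericPredictor", {"name": name, "coefficient": repr(coef)}
        )
    return ET.tostring(root, encoding="unicode")


pmml_text = export_pmml(model_json)
print(pmml_text)
```

With this split, the compact form is what gets stored and manipulated, and XML is produced only when a model has to be handed to another system.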

On Wed, Mar 4, 2015 at 11:42 AM, Dmitriy Lyubimov  wrote:

> I am willing to +1 any contribution at this point.
>
> my previous company used pmml to serialize simple stuff, but i don't
> have first hand experience. Its flexibility is ultimately pretty
> limited, isn't it? And xml is ultimately a media which is too ugly and
> too verbose at the same time to represent models with any more or less
> decent number of parameters?
>
>
>
> On Tue, Mar 3, 2015 at 8:19 PM, Suneel Marthi 
> wrote:
> > It makes sense to support PMML for classification and clustering tasks to
> > be able to share and distribute trained models. Sean, Pat, Dmitriy and
> Ted
> > please chime in.
> >
> > PMML support in Mahout was talked about for a long time now but never
> > really got any traction to take off.
> >
> > +1 to build this.
> >
> > On Tue, Mar 3, 2015 at 11:14 PM, Andrew Musselman <
> > andrew.mussel...@gmail.com> wrote:
> >
> >> How much interest is there in a mahout-pmml module, with a starting
> point
> >> to be able to export a few analytic/scoring jobs to PMML representation?
> >>
> >> I've seen a lot of interest at in being able to use PMML to translate
> >> analytic work into production(though I think people talk about it more
> than
> >> they do it), and it could be a benchmark as part of a "definition of
> done"
> >> for any existing/new method we include since there's a spec to build to.
> >>
> >> Best
> >> Andrew
> >>
>


Re: PMML

2015-03-04 Thread Dmitriy Lyubimov
I am willing to +1 any contribution at this point.

My previous company used PMML to serialize simple stuff, but I don't
have first-hand experience. Its flexibility is ultimately pretty
limited, isn't it? And XML is ultimately a medium that is both too ugly and
too verbose to represent models with any more or less
decent number of parameters?



On Tue, Mar 3, 2015 at 8:19 PM, Suneel Marthi  wrote:
> It makes sense to support PMML for classification and clustering tasks to
> be able to share and distribute trained models. Sean, Pat, Dmitriy and Ted
> please chime in.
>
> PMML support in Mahout was talked about for a long time now but never
> really got any traction to take off.
>
> +1 to build this.
>
> On Tue, Mar 3, 2015 at 11:14 PM, Andrew Musselman <
> andrew.mussel...@gmail.com> wrote:
>
>> How much interest is there in a mahout-pmml module, with a starting point
>> to be able to export a few analytic/scoring jobs to PMML representation?
>>
>> I've seen a lot of interest at in being able to use PMML to translate
>> analytic work into production(though I think people talk about it more than
>> they do it), and it could be a benchmark as part of a "definition of done"
>> for any existing/new method we include since there's a spec to build to.
>>
>> Best
>> Andrew
>>


Re: PMML

2015-03-03 Thread Suneel Marthi
It makes sense to support PMML for classification and clustering tasks to
be able to share and distribute trained models. Sean, Pat, Dmitriy and Ted
please chime in.

PMML support in Mahout has been talked about for a long time but never
really got enough traction to take off.

+1 to build this.

On Tue, Mar 3, 2015 at 11:14 PM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> How much interest is there in a mahout-pmml module, with a starting point
> to be able to export a few analytic/scoring jobs to PMML representation?
>
> I've seen a lot of interest at in being able to use PMML to translate
> analytic work into production(though I think people talk about it more than
> they do it), and it could be a benchmark as part of a "definition of done"
> for any existing/new method we include since there's a spec to build to.
>
> Best
> Andrew
>


PMML

2015-03-03 Thread Andrew Musselman
How much interest is there in a mahout-pmml module, with a starting point
to be able to export a few analytic/scoring jobs to PMML representation?

I've seen a lot of interest in being able to use PMML to translate
analytic work into production (though I think people talk about it more than
they do it), and it could be a benchmark as part of a "definition of done"
for any existing/new method we include, since there's a spec to build to.

Best
Andrew


[jira] [Comment Edited] (MAHOUT-1041) Support for PMML

2013-10-05 Thread Thomas Darimont (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787197#comment-13787197
 ] 

Thomas Darimont edited comment on MAHOUT-1041 at 10/5/13 12:56 PM:
---

As this wasn't mentioned yet: one could use something like cascading:pattern 
(http://www.cascading.org/pattern/) or jpmml-cascading 
(https://github.com/jpmml/jpmml-cascading)  to execute PMML models in Hadoop.


was (Author: thomasd):
In the meantime one could use something like cascading:pattern 
(http://www.cascading.org/pattern/) or jpmml-cascading 
(https://github.com/jpmml/jpmml-cascading)  to execute PMML models in Hadoop.

> Support for PMML
> 
>
> Key: MAHOUT-1041
> URL: https://issues.apache.org/jira/browse/MAHOUT-1041
> Project: Mahout
>  Issue Type: Improvement
>  Components: Integration
> Environment: Software Platform
>Reporter: Duraimurugan
> Fix For: Backlog
>
>
> Would like to request a support for PMML. With that once the predictive 
> models are built and provided in PMML format, we should be able to import 
> into hadoop cluster for scoring. This way models built in external 
> (non-mahout) systems can be imported to Hadoop/Mahout for scalable 
> environment.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (MAHOUT-1041) Support for PMML

2013-10-05 Thread Thomas Darimont (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787197#comment-13787197
 ] 

Thomas Darimont edited comment on MAHOUT-1041 at 10/5/13 12:55 PM:
---

In the meantime one could use something like cascading:pattern 
(http://www.cascading.org/pattern/) or jpmml-cascading 
(https://github.com/jpmml/jpmml-cascading)  to execute PMML models in Hadoop.


was (Author: thomasd):
In the meantime one could use something like cascading:pattern 
(http://www.cascading.org/pattern/) or jpmml-cascading 
(https://github.com/jpmml/jpmml-cascading)  to execute PMML Models in Hadoop.

> Support for PMML
> 
>
> Key: MAHOUT-1041
> URL: https://issues.apache.org/jira/browse/MAHOUT-1041
> Project: Mahout
>  Issue Type: Improvement
>  Components: Integration
> Environment: Software Platform
>Reporter: Duraimurugan
> Fix For: Backlog
>
>
> Would like to request a support for PMML. With that once the predictive 
> models are built and provided in PMML format, we should be able to import 
> into hadoop cluster for scoring. This way models built in external 
> (non-mahout) systems can be imported to Hadoop/Mahout for scalable 
> environment.





[jira] [Commented] (MAHOUT-1041) Support for PMML

2013-10-05 Thread Thomas Darimont (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787197#comment-13787197
 ] 

Thomas Darimont commented on MAHOUT-1041:
-

In the meantime one could use something like cascading:pattern 
(http://www.cascading.org/pattern/) or jpmml-cascading 
(https://github.com/jpmml/jpmml-cascading)  to execute PMML Models in Hadoop.

> Support for PMML
> 
>
> Key: MAHOUT-1041
> URL: https://issues.apache.org/jira/browse/MAHOUT-1041
> Project: Mahout
>  Issue Type: Improvement
>  Components: Integration
> Environment: Software Platform
>Reporter: Duraimurugan
> Fix For: Backlog
>
>
> Would like to request a support for PMML. With that once the predictive 
> models are built and provided in PMML format, we should be able to import 
> into hadoop cluster for scoring. This way models built in external 
> (non-mahout) systems can be imported to Hadoop/Mahout for scalable 
> environment.





Re: Mahout and PMML

2013-09-03 Thread Manuel Blechschmidt
Hi Pranay,
As Ted already said, PMML support has been requested several times
before.

I would recommend that you read all the JIRA issues about PMML; they contain a
lot of information about what has happened so far:

https://issues.apache.org/jira/browse/MAHOUT-1041 Support for PMML
https://issues.apache.org/jira/browse/MAHOUT-18 Embrace interoperability with 
other softwares

If you want to implement it, go ahead and do so.

The following might speed up the integration of your contribution:
https://cwiki.apache.org/MAHOUT/how-to-contribute.html

Have a great week
Manuel

On 02.09.2013 at 17:27, Ted Dunning wrote:

> The ability to export PMML for streaming k-means, Naive Bayes and the
> logistic regression classifiers would be useful.
> 
> Nobody has worked on this much yet, but demand, on the other hand, is
> pretty sporadic.
> 
> 
> On Mon, Sep 2, 2013 at 6:06 AM, Pranay Tonpay wrote:
> 
>> Hi,
>> It would really help if i can get some information on this to be able to
>> plan accordingly...
>> 
>> thx
>> pranay
>> 
>>  --
>> *From:* Pranay Tonpay 
>> *To:* "dev@mahout.apache.org" 
>> *Sent:* Monday, August 26, 2013 12:06 PM
>> *Subject:* Mahout and PMML
>> 
>> Hi,
>> 
>> I work as a Sr. Solutions Architect for Impetus Technologies and have been
>> using Mahout quite extensively for my work.
>> Of late, I have been focusing on PMML-related work and realized that Mahout
>> doesn't seem to support it at present.
>> I am keen to know whether it is on the roadmap; if not, I would like
>> to contribute it. I am sure it would be a good add-on for
>> Mahout.
>> If there is already some work going on for PMML, I would like to be a
>> part of it and contribute (if you think that's feasible).
>> 
>> Pls let me know if any of my assumptions are incorrect.
>> Hope to hear from you soon.
>> 
>> thx
>> pranay
>> 
>> 
>> 

-- 
Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B



Re: Mahout and PMML

2013-09-02 Thread Ted Dunning
The ability to export PMML for streaming k-means, Naive Bayes and the
logistic regression classifiers would be useful.

Nobody has worked on this much yet; then again, demand is
pretty sporadic.
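
To make the k-means case concrete, here is a minimal sketch of such an export, written in Python with only the standard library rather than against Mahout's API. The function name centroids_to_pmml and the field names are hypothetical. The element and attribute names follow the PMML 4.1 ClusteringModel schema as best understood here, and the output has not been validated against the XSD.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_1"  # namespace of the PMML 4.1 schema


def centroids_to_pmml(centroids, field_names):
    """Serialize k-means centroids as a PMML ClusteringModel document (string).

    Hypothetical helper for illustration; it is not part of Mahout or JPMML.
    """
    root = ET.Element("PMML", {"version": "4.1", "xmlns": PMML_NS})
    ET.SubElement(root, "Header", {"description": "k-means export sketch"})

    # Declare every input field once in the data dictionary.
    dictionary = ET.SubElement(
        root, "DataDictionary", {"numberOfFields": str(len(field_names))})
    for name in field_names:
        ET.SubElement(dictionary, "DataField",
                      {"name": name, "optype": "continuous", "dataType": "double"})

    model = ET.SubElement(root, "ClusteringModel", {
        "modelName": "kmeans",
        "functionName": "clustering",
        "modelClass": "centerBased",
        "numberOfClusters": str(len(centroids)),
    })
    schema = ET.SubElement(model, "MiningSchema")
    for name in field_names:
        ET.SubElement(schema, "MiningField", {"name": name})

    # k-means assigns points by (squared) Euclidean distance.
    measure = ET.SubElement(model, "ComparisonMeasure", {"kind": "distance"})
    ET.SubElement(measure, "squaredEuclidean")

    for name in field_names:
        ET.SubElement(model, "ClusteringField", {"field": name})

    # One Cluster element per centroid, coordinates as a space-separated Array.
    for index, centroid in enumerate(centroids):
        cluster = ET.SubElement(model, "Cluster", {"name": str(index)})
        array = ET.SubElement(cluster, "Array",
                              {"n": str(len(centroid)), "type": "real"})
        array.text = " ".join(repr(float(v)) for v in centroid)

    return ET.tostring(root, encoding="unicode")
```

For example, centroids_to_pmml([[1.0, 2.0], [3.5, -4.0]], ["x0", "x1"]) produces a document with two Cluster elements, each carrying its centroid in an Array. Even this tiny model shows the verbosity that comes up elsewhere in the thread.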


On Mon, Sep 2, 2013 at 6:06 AM, Pranay Tonpay wrote:

> Hi,
> It would really help if i can get some information on this to be able to
> plan accordingly...
>
> thx
> pranay
>
>   --
>  *From:* Pranay Tonpay 
> *To:* "dev@mahout.apache.org" 
> *Sent:* Monday, August 26, 2013 12:06 PM
> *Subject:* Mahout and PMML
>
> Hi,
>
> I work as a Sr. Solutions Architect for Impetus Technologies and have been
> using Mahout quite extensively for my work.
> Of late, I have been focusing on PMML-related work and realized that Mahout
> doesn't seem to support it at present.
> I am keen to know whether it is on the roadmap; if not, I would like
> to contribute it. I am sure it would be a good add-on for
> Mahout.
> If there is already some work going on for PMML, I would like to be a
> part of it and contribute (if you think that's feasible).
>
> Pls let me know if any of my assumptions are incorrect.
> Hope to hear from you soon.
>
> thx
> pranay
>
>
>


[jira] [Resolved] (MAHOUT-1041) Support for PMML

2013-06-01 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved MAHOUT-1041.
-

Resolution: Won't Fix

Without a patch, I don't see putting this in.  Also, I don't see the benefit of 
storing largish models in XML.  I could see a specific issue that does I/O of 
PMML to and from Mahout's own models, but I don't see anything running natively 
off of PMML.
    
> Support for PMML
> 
>
> Key: MAHOUT-1041
> URL: https://issues.apache.org/jira/browse/MAHOUT-1041
> Project: Mahout
>  Issue Type: Improvement
>  Components: Integration
> Environment: Software Platform
>Reporter: Duraimurugan
> Fix For: Backlog
>
>
> Would like to request a support for PMML. With that once the predictive 
> models are built and provided in PMML format, we should be able to import 
> into hadoop cluster for scoring. This way models built in external 
> (non-mahout) systems can be imported to Hadoop/Mahout for scalable 
> environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAHOUT-1041) Support for PMML

2013-03-11 Thread Sebastian Schelter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated MAHOUT-1041:
---

Affects Version/s: (was: Backlog)

> Support for PMML
> 
>
> Key: MAHOUT-1041
> URL: https://issues.apache.org/jira/browse/MAHOUT-1041
> Project: Mahout
>  Issue Type: Improvement
>  Components: Integration
> Environment: Software Platform
>Reporter: Duraimurugan
>
> Would like to request a support for PMML. With that once the predictive 
> models are built and provided in PMML format, we should be able to import 
> into hadoop cluster for scoring. This way models built in external 
> (non-mahout) systems can be imported to Hadoop/Mahout for scalable 
> environment.



[jira] [Updated] (MAHOUT-1041) Support for PMML

2013-03-11 Thread Sebastian Schelter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated MAHOUT-1041:
---

Fix Version/s: Backlog

> Support for PMML
> 
>
> Key: MAHOUT-1041
> URL: https://issues.apache.org/jira/browse/MAHOUT-1041
> Project: Mahout
>  Issue Type: Improvement
>  Components: Integration
> Environment: Software Platform
>Reporter: Duraimurugan
> Fix For: Backlog
>
>
> Would like to request a support for PMML. With that once the predictive 
> models are built and provided in PMML format, we should be able to import 
> into hadoop cluster for scoring. This way models built in external 
> (non-mahout) systems can be imported to Hadoop/Mahout for scalable 
> environment.



[jira] [Updated] (MAHOUT-1041) Support for PMML

2013-03-11 Thread Sebastian Schelter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated MAHOUT-1041:
---

Affects Version/s: (was: 1.0)
   Backlog

> Support for PMML
> 
>
> Key: MAHOUT-1041
> URL: https://issues.apache.org/jira/browse/MAHOUT-1041
> Project: Mahout
>  Issue Type: Improvement
>  Components: Integration
>Affects Versions: Backlog
> Environment: Software Platform
>Reporter: Duraimurugan
>
> Would like to request a support for PMML. With that once the predictive 
> models are built and provided in PMML format, we should be able to import 
> into hadoop cluster for scoring. This way models built in external 
> (non-mahout) systems can be imported to Hadoop/Mahout for scalable 
> environment.



Re: mahout-pmml

2012-12-27 Thread Marty Kube

Hi Ted,
That makes some sense.  I'll probably take a crack at it.
Marty


On 12/27/2012 12:14 AM, Ted Dunning wrote:

> Marty,
>
> That sounds like a reasonable idea.  IF integrated, this would need to be a
> separate module in any case so for now, it might be easiest for you to
> simply develop this module independently so that you don't have to wait for
> others to commit partial results.
>
> On Wed, Dec 26, 2012 at 6:52 PM, Marty Kube <
> martyk...@beavercreekconsulting.com> wrote:
>
>> I took a look at JPMML...  At the bottom of it they have ran a JAXB
>> compiler on the PMML V4 schema to generate Java bindings.  I didn't see a
>> lot of value add in JPMML beyond that.
>>
>> I'd say just add the schema and bindings generation to Mahout.  The value
>> add here is model mapping from the JAXB generated model into the Mahout
>> models.
>>
>> On 12/20/2012 06:13 AM, Grant Ingersoll wrote:
>>
>>> From looking at PMML (http://www.dmg.org/v4-1/GeneralStructure.html),
>>> it seems that JPMML is not going to really get us there if it only supports
>>> the 4 models listed below.  I would think we could go through the
>>> structures supported in the link above and then map it to the Algorithms
>>> that are supported.  To start, perhaps it would make sense to focus on a
>>> few like: clustering, naive bayes and perhaps SGD will fit into the
>>> regression models.  Perhaps try to get K-Means and Naive Bayes to work
>>> first.
>>>
>>> FTR, I can only imagine how bloated these files are going to get since
>>> they use XML.  Thankfully, they won't be used to power the internals, just
>>> to support interoperability.
>>>
>>> -Grant
>>>
>>> On Dec 19, 2012, at 8:12 AM, Simon Vocella wrote:
>>>
>>>> Hi All,
>>>>
>>>> as Grant suggested, I forward the email about mahout-pmml.
>>>> I already tried jpmml standalone and works fine for me, the next
>>>> important point is to understand or maybe create some example for each
>>>> model described before:
>>>> NeuralNetwork
>>>> RandomForest (implemented via Segmentation, which is a PMML version 4.0
>>>> feature)
>>>> RegressionModel
>>>> TreeModel
>>>> with only Mahout and next step create a convertor to create object from
>>>> jpmml to Mahout. This is related only to import the object and for me the
>>>> export object is more similar to these.
>>>>
>>>> Do you agree? Are you interested in this models? Or Mahout focus on
>>>> another one?
>>>>
>>>> regards,
>>>> Simon
>>>>
>>>> -- Forwarded message --
>>>> From: Simon Vocella
>>>> Date: Mon, Dec 17, 2012 at 1:50 AM
>>>> Subject: mahout-pmml
>>>> To: Grant Ingersoll
>>>> Cc: Marty Kube
>>>>
>>>> Hi Grant,
>>>>
>>>> I start with this is the project https://github.com/voxsim/mahout-pmml (I
>>>> pushed only the skeleton for now) with mahout and jpmml integration (
>>>> http://code.google.com/p/jpmml/)
>>>>
>>>> I read the wiki about weka convertor
>>>> https://cwiki.apache.org/MAHOUT/creating-vectors-from-wekas-arff-format.html
>>>> And I read the integration with Lucene
>>>> http://searchhub.org/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/
>>>>
>>>> In theory we need to do more similar to these parts, but different, we
>>>> don't transfrom vector but model, Do i understand correctly?
>>>>
>>>> I'll request directly to you because you have in mind this idea and for
>>>> now jpmml support this models
>>>> NeuralNetwork
>>>> RandomForest (implemented via Segmentation, which is a PMML version 4.0
>>>> feature)
>>>> RegressionModel
>>>> TreeModel
>>>> Are you interested in this models? Or Mahout focus on another one?
>>>>
>>>> Simon
>>>>
>>>> PS Marty before to start I need some answers sorry XD
>>>
>>> Grant Ingersoll
>>> http://www.lucidworks.com


Re: mahout-pmml

2012-12-26 Thread Ted Dunning
Marty,

That sounds like a reasonable idea.  If integrated, this would need to be a
separate module in any case, so for now it might be easiest for you to
simply develop this module independently so that you don't have to wait for
others to commit partial results.



On Wed, Dec 26, 2012 at 6:52 PM, Marty Kube <
martyk...@beavercreekconsulting.com> wrote:

> I took a look at JPMML...  At the bottom of it they have ran a JAXB
> compiler on the PMML V4 schema to generate Java bindings.  I didn't see a
> lot of value add in JPMML beyond that.
>
> I'd say just add the schema and bindings generation to Mahout.  The value
> add here is model mapping from the JAXB generated model into the Mahout
> models.
>
> On 12/20/2012 06:13 AM, Grant Ingersoll wrote:
>
>>  From looking at PMML (http://www.dmg.org/v4-1/GeneralStructure.html),
>> it seems that JPMML is not going to really get us there if it only supports
>> the 4 models listed below.  I would think we could go through the
>> structures supported in the link above and then map it to the Algorithms
>> that are supported.  To start, perhaps it would make sense to focus on a
>> few like: clustering, naive bayes and perhaps SGD will fit into the
>> regression models.  Perhaps try to get K-Means and Naive Bayes to work
>> first.
>>
>> FTR, I can only imagine how bloated these files are going to get since
>> they use XML.  Thankfully, they won't be used to power the internals, just
>> to support interoperability.
>>
>> -Grant
>>
>> On Dec 19, 2012, at 8:12 AM, Simon Vocella wrote:
>>
>>  Hi All,
>>>
>>> as Grant suggested, I forward the email about mahout-pmml.
>>> I already tried jpmml standalone and works fine for me, the next
>>> important point is to understand or maybe create some example for each
>>> model described before:
>>> NeuralNetwork
>>> RandomForest (implemented via Segmentation, which is a PMML version 4.0
>>> feature)
>>> RegressionModel
>>> TreeModel
>>> with only Mahout and next step create a convertor to create object from
>>> jpmml to Mahout. This is related only to import the object and for me the
>>> export object is more similar to these.
>>>
>>> Do you agree? Are you interested in this models? Or Mahout focus on
>>> another one?
>>>
>>> regards,
>>> Simon
>>>
>>> -- Forwarded message --
>>> From: Simon Vocella 
>>> Date: Mon, Dec 17, 2012 at 1:50 AM
>>> Subject: mahout-pmml
>>> To: Grant Ingersoll 
>>> Cc: Marty Kube 
>>>
>>>
>>> Hi Grant,
>>>
>>> I start with this is the project https://github.com/voxsim/mahout-pmml (I
>>> pushed only the skeleton for now) with mahout and jpmml integration (
>>> http://code.google.com/p/jpmml/)
>>>
>>> I read the wiki about weka convertor
>>> https://cwiki.apache.org/MAHOUT/creating-vectors-from-wekas-arff-format.html
>>> And I read the integration with Lucene
>>> http://searchhub.org/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/
>>>
>>> In theory we need to do more similar to these parts, but different, we
>>> don't transfrom vector but model, Do i understand correctly?
>>>
>>> I'll request directly to you because you have in mind this idea and for
>>> now jpmml support this models
>>> NeuralNetwork
>>> RandomForest (implemented via Segmentation, which is a PMML version 4.0
>>> feature)
>>> RegressionModel
>>> TreeModel
>>> Are you interested in this models? Or Mahout focus on another one?
>>>
>>> Simon
>>>
>>> PS Marty before to start I need some answers sorry XD
>>>
>>
>> Grant Ingersoll
>> http://www.lucidworks.com
>>
>>
>>
>>
>>
>>
>


Re: mahout-pmml

2012-12-26 Thread Marty Kube
I took a look at JPMML.  At bottom, they have run a JAXB 
compiler on the PMML V4 schema to generate Java bindings.  I didn't see 
a lot of added value in JPMML beyond that.


I'd say just add the schema and bindings generation to Mahout.  The 
added value here is the mapping from the JAXB-generated model into the 
Mahout models.
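
The mapping step described above might look roughly like the following sketch. It uses Python, with the standard-library XML parser standing in for JAXB-generated bindings and a plain dictionary standing in for a Mahout model. The names load_centroids, nearest_cluster and SAMPLE_PMML are hypothetical, and the sample is a deliberately trimmed fragment (no Header, DataDictionary or MiningSchema), not a schema-valid document.

```python
import xml.etree.ElementTree as ET

# Namespace of the PMML 4.1 schema, used to resolve the prefixed paths below.
NS = {"p": "http://www.dmg.org/PMML-4_1"}

# Trimmed, hand-written fragment for illustration only.
SAMPLE_PMML = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <ClusteringModel functionName="clustering" modelClass="centerBased" numberOfClusters="2">
    <Cluster name="a"><Array n="2" type="real">0.0 0.0</Array></Cluster>
    <Cluster name="b"><Array n="2" type="real">10.0 10.0</Array></Cluster>
  </ClusteringModel>
</PMML>"""


def load_centroids(pmml_text):
    """Map a parsed PMML ClusteringModel onto a plain {name: [coordinates]} dict."""
    root = ET.fromstring(pmml_text)
    centroids = {}
    for cluster in root.findall("p:ClusteringModel/p:Cluster", NS):
        values = cluster.find("p:Array", NS).text.split()
        centroids[cluster.get("name")] = [float(v) for v in values]
    return centroids


def nearest_cluster(centroids, point):
    """Return the name of the centroid closest to point (squared Euclidean distance)."""
    def squared_distance(center):
        return sum((c - x) ** 2 for c, x in zip(center, point))
    return min(centroids, key=lambda name: squared_distance(centroids[name]))
```

Once the generic PMML structure has been mapped into a native representation like this, scoring no longer touches the XML at all, which is the point of keeping PMML at the import/export boundary.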


On 12/20/2012 06:13 AM, Grant Ingersoll wrote:

> From looking at PMML (http://www.dmg.org/v4-1/GeneralStructure.html), it seems
> that JPMML is not going to really get us there if it only supports the 4 models
> listed below.  I would think we could go through the structures supported in
> the link above and then map it to the Algorithms that are supported.  To start,
> perhaps it would make sense to focus on a few like: clustering, naive bayes and
> perhaps SGD will fit into the regression models.  Perhaps try to get K-Means
> and Naive Bayes to work first.
>
> FTR, I can only imagine how bloated these files are going to get since they use
> XML.  Thankfully, they won't be used to power the internals, just to support
> interoperability.
>
> -Grant
>
> On Dec 19, 2012, at 8:12 AM, Simon Vocella wrote:
>
>> Hi All,
>>
>> as Grant suggested, I forward the email about mahout-pmml.
>> I already tried jpmml standalone and works fine for me, the next important
>> point is to understand or maybe create some example for each model described
>> before:
>> NeuralNetwork
>> RandomForest (implemented via Segmentation, which is a PMML version 4.0 feature)
>> RegressionModel
>> TreeModel
>> with only Mahout and next step create a convertor to create object from jpmml
>> to Mahout. This is related only to import the object and for me the export
>> object is more similar to these.
>>
>> Do you agree? Are you interested in this models? Or Mahout focus on another one?
>>
>> regards,
>> Simon
>>
>> -- Forwarded message --
>> From: Simon Vocella
>> Date: Mon, Dec 17, 2012 at 1:50 AM
>> Subject: mahout-pmml
>> To: Grant Ingersoll
>> Cc: Marty Kube
>>
>> Hi Grant,
>>
>> I start with this is the project https://github.com/voxsim/mahout-pmml (I
>> pushed only the skeleton for now) with mahout and jpmml integration
>> (http://code.google.com/p/jpmml/)
>>
>> I read the wiki about weka convertor
>> https://cwiki.apache.org/MAHOUT/creating-vectors-from-wekas-arff-format.html
>> And I read the integration with Lucene
>> http://searchhub.org/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/
>>
>> In theory we need to do more similar to these parts, but different, we don't
>> transfrom vector but model, Do i understand correctly?
>>
>> I'll request directly to you because you have in mind this idea and for now
>> jpmml support this models
>> NeuralNetwork
>> RandomForest (implemented via Segmentation, which is a PMML version 4.0 feature)
>> RegressionModel
>> TreeModel
>> Are you interested in this models? Or Mahout focus on another one?
>>
>> Simon
>>
>> PS Marty before to start I need some answers sorry XD
>
> Grant Ingersoll
> http://www.lucidworks.com


Re: mahout-pmml

2012-12-20 Thread Grant Ingersoll
From looking at PMML (http://www.dmg.org/v4-1/GeneralStructure.html), it seems 
that JPMML is not really going to get us there if it only supports the 4 models 
listed below.  I would think we could go through the structures supported in 
the link above and then map them to the algorithms that Mahout supports.  To 
start, perhaps it would make sense to focus on a few: clustering, Naive Bayes, 
and perhaps SGD, which would fit into the regression models.  Perhaps try to get 
K-Means and Naive Bayes to work first.
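
As a rough illustration of how a logistic regression trained with SGD could fit PMML's regression models, the sketch below (Python, standard library only, not Mahout code) writes coefficients into a RegressionModel fragment and scores against that same fragment. The function names are hypothetical, and the fragment omits the enclosing PMML document and the target field. The use of normalizationMethod="logit" with a trivial second RegressionTable reflects one reading of the PMML 4.1 specification and has not been checked against a PMML consumer.

```python
import math
import xml.etree.ElementTree as ET


def logistic_to_regression_model(intercept, coefficients):
    """Build a PMML RegressionModel element from {field_name: weight} coefficients.

    Hypothetical helper for illustration; it is not part of Mahout or JPMML.
    """
    model = ET.Element("RegressionModel", {
        "functionName": "classification",
        "normalizationMethod": "logit",
    })
    schema = ET.SubElement(model, "MiningSchema")
    for name in coefficients:
        ET.SubElement(schema, "MiningField", {"name": name})

    # Table for the positive class carries the learned weights.
    table = ET.SubElement(model, "RegressionTable",
                          {"intercept": repr(float(intercept)), "targetCategory": "1"})
    for name, weight in coefficients.items():
        ET.SubElement(table, "NumericPredictor",
                      {"name": name, "coefficient": repr(float(weight))})

    # Trivial table for the reference class.
    ET.SubElement(model, "RegressionTable", {"intercept": "0.0", "targetCategory": "0"})
    return model


def score(model, features):
    """Evaluate the first RegressionTable and apply the logistic function."""
    table = model.find("RegressionTable")
    linear = float(table.get("intercept"))
    for predictor in table.findall("NumericPredictor"):
        linear += float(predictor.get("coefficient")) * features[predictor.get("name")]
    return 1.0 / (1.0 + math.exp(-linear))
```

Scoring from the fragment rather than from the original weights is a cheap round-trip check: if the exported coefficients reproduce the model's probabilities, the mapping preserved what matters.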

FTR, I can only imagine how bloated these files are going to get since they use 
XML.  Thankfully, they won't be used to power the internals, just to support 
interoperability.

-Grant

On Dec 19, 2012, at 8:12 AM, Simon Vocella wrote:

> Hi All,
> 
> as Grant suggested, I forward the email about mahout-pmml. 
> I already tried jpmml standalone and works fine for me, the next important 
> point is to understand or maybe create some example for each model described 
> before:
> NeuralNetwork
> RandomForest (implemented via Segmentation, which is a PMML version 4.0 
> feature)
> RegressionModel
> TreeModel
> with only Mahout and next step create a convertor to create object from jpmml 
> to Mahout. This is related only to import the object and for me the export 
> object is more similar to these.
> 
> Do you agree? Are you interested in this models? Or Mahout focus on another 
> one?
> 
> regards,
> Simon
> 
> -- Forwarded message --
> From: Simon Vocella 
> Date: Mon, Dec 17, 2012 at 1:50 AM
> Subject: mahout-pmml
> To: Grant Ingersoll 
> Cc: Marty Kube 
> 
> 
> Hi Grant,
> 
> I start with this is the project https://github.com/voxsim/mahout-pmml (I 
> pushed only the skeleton for now) with mahout and jpmml integration 
> (http://code.google.com/p/jpmml/) 
> 
> I read the wiki about weka convertor 
> https://cwiki.apache.org/MAHOUT/creating-vectors-from-wekas-arff-format.html
> And I read the integration with Lucene 
> http://searchhub.org/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/
> 
> In theory we need to do more similar to these parts, but different, we don't 
> transfrom vector but model, Do i understand correctly?
> 
> I'll request directly to you because you have in mind this idea and for now 
> jpmml support this models
> NeuralNetwork
> RandomForest (implemented via Segmentation, which is a PMML version 4.0 
> feature)
> RegressionModel
> TreeModel
> Are you interested in this models? Or Mahout focus on another one?
> 
> Simon
> 
> PS Marty before to start I need some answers sorry XD
> 


Grant Ingersoll
http://www.lucidworks.com






Fwd: mahout-pmml

2012-12-19 Thread Simon Vocella
Hi All,

As Grant suggested, I am forwarding the email about mahout-pmml.
I have already tried jpmml standalone and it works fine for me. The next
important step is to understand, or maybe create, an example for each model
listed in the forwarded message:

   - NeuralNetwork
   - RandomForest (implemented via Segmentation, which is a PMML version
   4.0 feature)
   - RegressionModel
   - TreeModel

using only Mahout. The next step is to create a converter that turns jpmml
objects into Mahout objects. This covers only import; in my view, export
would work in much the same way.

Do you agree? Are you interested in these models, or is Mahout focused on
other ones?

regards,
Simon

-- Forwarded message --
From: Simon Vocella 
Date: Mon, Dec 17, 2012 at 1:50 AM
Subject: mahout-pmml
To: Grant Ingersoll 
Cc: Marty Kube 


Hi Grant,

I have started this project, https://github.com/voxsim/mahout-pmml (I have
pushed only the skeleton for now), to integrate Mahout and jpmml (
http://code.google.com/p/jpmml/).

I read the wiki page about the Weka converter:
https://cwiki.apache.org/MAHOUT/creating-vectors-from-wekas-arff-format.html
I also read about the integration with Lucene:
http://searchhub.org/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/

In theory we need to do something similar to these, except that we transform
models rather than vectors. Do I understand correctly?

I am asking you directly because you have this idea in mind. For now,
jpmml supports these models:

   - NeuralNetwork
   - RandomForest (implemented via Segmentation, which is a PMML version
   4.0 feature)
   - RegressionModel
   - TreeModel

Are you interested in these models, or is Mahout focused on other ones?

Simon
PS: Marty, before I start I need some answers, sorry XD


[jira] [Commented] (MAHOUT-1041) Support for PMML

2012-07-06 Thread Manuel Blechschmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407980#comment-13407980
 ] 

Manuel Blechschmidt commented on MAHOUT-1041:
-

This was already proposed about 4 years ago in MAHOUT-18. Currently nobody 
has both the time and the knowledge to implement such an exporter. It would 
be great if you could provide a patch; I would expect it to have a good 
chance of being integrated into Mahout as long as it follows these rules:
 * https://cwiki.apache.org/MAHOUT/how-to-contribute.html




> Support for PMML
> 
>
> Key: MAHOUT-1041
> URL: https://issues.apache.org/jira/browse/MAHOUT-1041
> Project: Mahout
>  Issue Type: Improvement
>  Components: Integration
>Affects Versions: 1.0
> Environment: Software Platform
>Reporter: Duraimurugan
>
> Would like to request a support for PMML. With that once the predictive 
> models are built and provided in PMML format, we should be able to import 
> into hadoop cluster for scoring. This way models built in external 
> (non-mahout) systems can be imported to Hadoop/Mahout for scalable 
> environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAHOUT-1041) Support for PMML

2012-07-06 Thread Duraimurugan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407972#comment-13407972
 ] 

Duraimurugan commented on MAHOUT-1041:
--

Sure, I can contribute the PMML parser and model code.

> Support for PMML
> 
>
> Key: MAHOUT-1041
> URL: https://issues.apache.org/jira/browse/MAHOUT-1041
> Project: Mahout
>  Issue Type: Improvement
>  Components: Integration
>Affects Versions: 1.0
> Environment: Software Platform
>Reporter: Duraimurugan
>
> Would like to request a support for PMML. With that once the predictive 
> models are built and provided in PMML format, we should be able to import 
> into hadoop cluster for scoring. This way models built in external 
> (non-mahout) systems can be imported to Hadoop/Mahout for scalable 
> environment.





[jira] [Commented] (MAHOUT-1041) Support for PMML

2012-07-05 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407744#comment-13407744
 ] 

Ted Dunning commented on MAHOUT-1041:
-

This has been proposed before and has withered when there wasn't much support 
in terms of code contributions.

What sort of contributions can you provide?  Problem specification?  PMML 
parser?  Model code?


> Support for PMML
> 
>
> Key: MAHOUT-1041
> URL: https://issues.apache.org/jira/browse/MAHOUT-1041
> Project: Mahout
>  Issue Type: Improvement
>  Components: Integration
>Affects Versions: 1.0
> Environment: Software Platform
>Reporter: Duraimurugan
>
> Would like to request a support for PMML. With that once the predictive 
> models are built and provided in PMML format, we should be able to import 
> into hadoop cluster for scoring. This way models built in external 
> (non-mahout) systems can be imported to Hadoop/Mahout for scalable 
> environment.





[jira] [Created] (MAHOUT-1041) Support for PMML

2012-07-05 Thread Duraimurugan (JIRA)
Duraimurugan created MAHOUT-1041:


 Summary: Support for PMML
 Key: MAHOUT-1041
 URL: https://issues.apache.org/jira/browse/MAHOUT-1041
 Project: Mahout
  Issue Type: Improvement
  Components: Integration
Affects Versions: 1.0
 Environment: Software Platform
Reporter: Duraimurugan


I would like to request support for PMML. With it, once predictive models 
are built and provided in PMML format, we should be able to import them into a 
Hadoop cluster for scoring. This way, models built in external (non-Mahout) 
systems can be imported into Hadoop/Mahout to run in a scalable environment.
