[jira] [Created] (MAHOUT-1937) Model should be able to import/export to PMML
Trevor Grant created MAHOUT-1937:

Summary: Model should be able to import/export to PMML
Key: MAHOUT-1937
URL: https://issues.apache.org/jira/browse/MAHOUT-1937
Project: Mahout
Issue Type: Improvement
Affects Versions: 0.13.1
Reporter: Trevor Grant
Priority: Trivial
Fix For: 0.14.0

The Predictive Model Markup Language is a generic format for specifying models in XML form.
https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
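For illustration only, and not Mahout code: a minimal Python sketch of what "import" could mean in the simplest case, reading a PMML regression model and scoring one row with it. The SAMPLE document, the field names x and y, and the helpers load_linear_model and score are all made up for this example. It also assumes a single RegressionTable and ignores the optional exponent attribute on NumericPredictor.

```python
import xml.etree.ElementTree as ET

# Namespace used by PMML 4.2 documents.
NS = {"p": "http://www.dmg.org/PMML-4_2"}

# Hypothetical, hand-written model: y = 1.0 + 2.0 * x
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="y" usageType="target"/>
      <MiningField name="x"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def load_linear_model(pmml_text):
    """Return (intercept, {field: coefficient}) from a PMML regression model."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    intercept = float(table.get("intercept"))
    weights = {
        pred.get("name"): float(pred.get("coefficient"))
        for pred in table.findall("p:NumericPredictor", NS)
    }
    return intercept, weights


def score(model, row):
    """Apply the linear model to one row given as {field: value}."""
    intercept, weights = model
    return intercept + sum(w * row[name] for name, w in weights.items())


if __name__ == "__main__":
    model = load_linear_model(SAMPLE)
    print(score(model, {"x": 3.0}))
```

A real importer would also have to validate the DataDictionary and MiningSchema and handle the many other model types in the spec; this only shows the shape of the problem.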
[jira] [Commented] (MAHOUT-1041) Support for PMML
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629099#comment-14629099 ] Andrew Palumbo commented on MAHOUT-1041: That is very cool. Please do keep us posted. FYI, as of Mahout v0.10 we also have a Spark backed implementation of Naive Bayes in our new engine neutral environment. > Support for PMML > > > Key: MAHOUT-1041 > URL: https://issues.apache.org/jira/browse/MAHOUT-1041 > Project: Mahout > Issue Type: Improvement > Components: Integration > Environment: Software Platform >Reporter: Duraimurugan > > Would like to request a support for PMML. With that once the predictive > models are built and provided in PMML format, we should be able to import > into hadoop cluster for scoring. This way models built in external > (non-mahout) systems can be imported to Hadoop/Mahout for scalable > environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [jira] [Commented] (MAHOUT-1041) Support for PMML
This is cool. Mahout is trending towards Scala and a Scala-based environment that is independent of distributed backends. See the blog post [1] for a summary. Perhaps we can do more in that direction? Add things in Scala?

[1] http://www.weatheringthroughtechdays.com/2015/04/mahout-010x-first-mahout-release-as.html

On Wed, Jul 15, 2015 at 11:25 AM, Chris A. Mattmann (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628501#comment-14628501 ]
>
> Chris A. Mattmann commented on MAHOUT-1041:
> ---
>
> BTW, we have integrated Mahout into Nutch in our Naive Bayes ParseFilter here:
>
> https://github.com/apache/nutch/blob/trunk/src/plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/NaiveBayesParseFilter.java
>
> Yay Mahout!
>
> > Support for PMML
> >
> > Key: MAHOUT-1041
> > URL: https://issues.apache.org/jira/browse/MAHOUT-1041
> > Project: Mahout
> > Issue Type: Improvement
> > Components: Integration
> > Environment: Software Platform
> > Reporter: Duraimurugan
> >
> > Would like to request a support for PMML. With that once the predictive models are built and provided in PMML format, we should be able to import into hadoop cluster for scoring. This way models built in external (non-mahout) systems can be imported to Hadoop/Mahout for scalable environment.
>
> --
> This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAHOUT-1041) Support for PMML
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628501#comment-14628501 ] Chris A. Mattmann commented on MAHOUT-1041: --- BTW, we have integrated Mahout into Nutch in our Naive Bayes ParseFilter here: https://github.com/apache/nutch/blob/trunk/src/plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/NaiveBayesParseFilter.java Yay Mahout! > Support for PMML > > > Key: MAHOUT-1041 > URL: https://issues.apache.org/jira/browse/MAHOUT-1041 > Project: Mahout > Issue Type: Improvement > Components: Integration > Environment: Software Platform >Reporter: Duraimurugan > > Would like to request a support for PMML. With that once the predictive > models are built and provided in PMML format, we should be able to import > into hadoop cluster for scoring. This way models built in external > (non-mahout) systems can be imported to Hadoop/Mahout for scalable > environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAHOUT-1041) Support for PMML
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628500#comment-14628500 ] Chris A. Mattmann commented on MAHOUT-1041: --- Hey folks, we have some interest on my team for DARPA memex in doing this. We'll take a look at jpmml and report back. > Support for PMML > > > Key: MAHOUT-1041 > URL: https://issues.apache.org/jira/browse/MAHOUT-1041 > Project: Mahout > Issue Type: Improvement > Components: Integration > Environment: Software Platform >Reporter: Duraimurugan > > Would like to request a support for PMML. With that once the predictive > models are built and provided in PMML format, we should be able to import > into hadoop cluster for scoring. This way models built in external > (non-mahout) systems can be imported to Hadoop/Mahout for scalable > environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: PMML
Yes, it makes sense having one for Naive Bayes and KMeans (when we have that !!). On Thu, Mar 5, 2015 at 11:49 AM, Pat Ferrel wrote: > PMML doesn’t make a lot of sense when the model is a potentially massive > matrix. One reason is that it will be pretty hard (impossible?) to > parallelize read/write with the engines we use. JSON has the same problem > and the only way SchemaRDD can read JSON is by bending the rules. > > Seems like a good thing to support for algos that can make good use of it. > Does that narrow it down to naive bayes today? > > On Mar 5, 2015, at 2:19 AM, Ted Dunning wrote: > > PMML is a machine-to-machine mechanism, not intended really for human > consumption or production. Based on XML, it is, of course, bloated, but > that doesn't really matter for readability since reading isn't the goal. > > The vision of making models easy to transfer from system to system is nice, > but the reality has fallen far short, unfortunately. The problem is that > systems often have special aspects that make it hard to replicate exact > actions from one system to another. Having a textual format for numerical > data doesn't help. > > Here, for instance, is a linear regression model that I created using R: > > [PMML listing lost in archiving. Surviving fragments: namespace http://www.dmg.org/PMML-4_2, schema location http://www.dmg.org/v4-2/pmml-4-2.xsd, timestamp 2015-03-05 09:46:32, functionName="regression", algorithmName="least squares", coefficient="-1.00362806356329", coefficient="0.998224481877296"] > > This looks pretty reasonable (if verbose). It takes 1.5kB to store a > model but this compresses to around 600 bytes. > > More involved models are a different story. I built a simple random forest > on the same data and simply converting it to PMML took several minutes. > Presumably the R package involved is kind of inefficient, but this still is > pretty daunting. 
Manipulating the resulting PMML representation is > actually quite difficult. > > Saving the random forest model ultimately resulted in a 50MB file. > Compression reduced that to about 6MB. This is pretty massive for a fairly > simple model. > > > > > On Thu, Mar 5, 2015 at 4:25 AM, Andrew Musselman < > andrew.mussel...@gmail.com > > wrote: > > > I think keeping it simple is best, try implementing one or two models in > > XML and then get fancy if it makes sense. > > > > On Wednesday, March 4, 2015, Saikat Kanjilal > wrote: > > > >> Next question: Is the audience for PMML programmers or could it be folks > >> that can script? I'm wondering how this intersects with a simple spark > >> like DSL , could Mahout implement an intersection between the two? If > >> there's interest I can go into examples. > >> > >> Sent from my iPhone > >> > >>> On Mar 4, 2015, at 4:17 PM, Andrew Musselman < > > andrew.mussel...@gmail.com > >> > wrote: > >>> > >>> Sure, those would be options. > >>> > >>>> On Wed, Mar 4, 2015 at 3:41 PM, Saikat Kanjilal >> > wrote: > >>>> > >>>> Question, is there a way to introduce PMML with using a more > > lightweight > >>>> format like yaml or json? > >>>> > >>>>> Date: Wed, 4 Mar 2015 13:25:29 -0800 > >>>>> Subject: Re: PMML > >>>>> From: andrew.mussel...@gmail.com > >>>>> To: dev@mahout.apache.org > >>>>> > >>>>> Yes, the limitations are often an issue for people doing things that > >>>> aren't > >>>>> in the PMML spec yet; there could be room for suggesting new features > >> in > >>>>> the spec by building them though, I suppose. > >>>>> > >>>>> Also agree that XML is a lousy/bloated way of representing stuff like > >>>> this, > >>>>> but in the end it's just a choice of representation so there may be > >>>> reason > >>>>> to use some other encoding and then provide an XML-export function. > >>>>> > >>>>>> On Wed, Mar 4, 2015 at 11:42 AM, Dmitriy Lyubimov < > > dlie...@gmail.com > >> > > >>>>> wrote: > >>>>> > >>>>>
Re: PMML
PMML doesn’t make a lot of sense when the model is a potentially massive matrix. One reason is that it will be pretty hard (impossible?) to parallelize read/write with the engines we use. JSON has the same problem and the only way SchemaRDD can read JSON is by bending the rules. Seems like a good thing to support for algos that can make good use of it. Does that narrow it down to naive bayes today? On Mar 5, 2015, at 2:19 AM, Ted Dunning wrote: PMML is a machine-to-machine mechanism, not intended really for human consumption or production. Based on XML, it is, of course, bloated, but that doesn't really matter for readability since reading isn't the goal. The vision of making models easy to transfer from system to system is nice, but the reality has fallen far short, unfortunately. The problem is that systems often have special aspects that make it hard to replicate exact actions from one system to another. Having a textual format for numerical data doesn't help. Here, for instance, is a linear regression model that I created using R: [PMML listing lost in archiving. Surviving fragments: namespace http://www.dmg.org/PMML-4_2, schema location http://www.dmg.org/v4-2/pmml-4-2.xsd, timestamp 2015-03-05 09:46:32] This looks pretty reasonable (if verbose). It takes 1.5kB to store a model but this compresses to around 600 bytes. More involved models are a different story. I built a simple random forest on the same data and simply converting it to PMML took several minutes. Presumably the R package involved is kind of inefficient, but this still is pretty daunting. Manipulating the resulting PMML representation is actually quite difficult. Saving the random forest model ultimately resulted in a 50MB file. Compression reduced that to about 6MB. This is pretty massive for a fairly simple model. 
On Thu, Mar 5, 2015 at 4:25 AM, Andrew Musselman wrote: > I think keeping it simple is best, try implementing one or two models in > XML and then get fancy if it makes sense. > > On Wednesday, March 4, 2015, Saikat Kanjilal wrote: > >> Next question: Is the audience for PMML programmers or could it be folks >> that can script? I'm wondering how this intersects with a simple spark >> like DSL , could Mahout implement an intersection between the two? If >> there's interest I can go into examples. >> >> Sent from my iPhone >> >>> On Mar 4, 2015, at 4:17 PM, Andrew Musselman < > andrew.mussel...@gmail.com >> > wrote: >>> >>> Sure, those would be options. >>> >>>> On Wed, Mar 4, 2015 at 3:41 PM, Saikat Kanjilal > > wrote: >>>> >>>> Question, is there a way to introduce PMML with using a more > lightweight >>>> format like yaml or json? >>>> >>>>> Date: Wed, 4 Mar 2015 13:25:29 -0800 >>>>> Subject: Re: PMML >>>>> From: andrew.mussel...@gmail.com >>>>> To: dev@mahout.apache.org >>>>> >>>>> Yes, the limitations are often an issue for people doing things that >>>> aren't >>>>> in the PMML spec yet; there could be room for suggesting new features >> in >>>>> the spec by building them though, I suppose. >>>>> >>>>> Also agree that XML is a lousy/bloated way of representing stuff like >>>> this, >>>>> but in the end it's just a choice of representation so there may be >>>> reason >>>>> to use some other encoding and then provide an XML-export function. >>>>> >>>>>> On Wed, Mar 4, 2015 at 11:42 AM, Dmitriy Lyubimov < > dlie...@gmail.com >> > >>>>> wrote: >>>>> >>>>>> I am willing to +1 any contribution at this point. >>>>>> >>>>>> my previous company used pmml to serialize simple stuff, but i don't >>>>>> have first hand experience. Its flexibility is ultimately pretty >>>>>> limited, isn't it? And xml is ultimately a media which is too ugly > and >>>>>> too verbose at the same time to represent models with any more or > less >>>>>> decent number of parameters? 
>>>>>> >>>>>> >>>>>> >>>>>> On Tue, Mar 3, 2015 at 8:19 PM, Suneel Marthi < >> suneel.mar...@gmail.com >>>>> >>>>>> wrote: >>>>>>> It makes sense to support PMML for classification and clustering >>>> tasks to >>>>>>> be
Re: PMML
PMML is a machine-to-machine mechanism, not intended really for human consumption or production. Based on XML, it is, of course, bloated, but that doesn't really matter for readability since reading isn't the goal. The vision of making models easy to transfer from system to system is nice, but the reality has fallen far short, unfortunately. The problem is that systems often have special aspects that make it hard to replicate exact actions from one system to another. Having a textual format for numerical data doesn't help. Here, for instance, is a linear regression model that I created using R: [PMML listing lost in archiving. Surviving fragments: namespace http://www.dmg.org/PMML-4_2, schema location http://www.dmg.org/v4-2/pmml-4-2.xsd, timestamp 2015-03-05 09:46:32] This looks pretty reasonable (if verbose). It takes 1.5kB to store a model but this compresses to around 600 bytes. More involved models are a different story. I built a simple random forest on the same data and simply converting it to PMML took several minutes. Presumably the R package involved is kind of inefficient, but this still is pretty daunting. Manipulating the resulting PMML representation is actually quite difficult. Saving the random forest model ultimately resulted in a 50MB file. Compression reduced that to about 6MB. This is pretty massive for a fairly simple model. On Thu, Mar 5, 2015 at 4:25 AM, Andrew Musselman wrote: > I think keeping it simple is best, try implementing one or two models in > XML and then get fancy if it makes sense. > > On Wednesday, March 4, 2015, Saikat Kanjilal wrote: > > > Next question: Is the audience for PMML programmers or could it be folks > > that can script? I'm wondering how this intersects with a simple spark > > like DSL , could Mahout implement an intersection between the two? If > > there's interest I can go into examples. 
> > > > Sent from my iPhone > > > > > On Mar 4, 2015, at 4:17 PM, Andrew Musselman < > andrew.mussel...@gmail.com > > > wrote: > > > > > > Sure, those would be options. > > > > > >> On Wed, Mar 4, 2015 at 3:41 PM, Saikat Kanjilal > > wrote: > > >> > > >> Question, is there a way to introduce PMML with using a more > lightweight > > >> format like yaml or json? > > >> > > >>> Date: Wed, 4 Mar 2015 13:25:29 -0800 > > >>> Subject: Re: PMML > > >>> From: andrew.mussel...@gmail.com > > >>> To: dev@mahout.apache.org > > >>> > > >>> Yes, the limitations are often an issue for people doing things that > > >> aren't > > >>> in the PMML spec yet; there could be room for suggesting new features > > in > > >>> the spec by building them though, I suppose. > > >>> > > >>> Also agree that XML is a lousy/bloated way of representing stuff like > > >> this, > > >>> but in the end it's just a choice of representation so there may be > > >> reason > > >>> to use some other encoding and then provide an XML-export function. > > >>> > > >>>> On Wed, Mar 4, 2015 at 11:42 AM, Dmitriy Lyubimov < > dlie...@gmail.com > > > > > >>> wrote: > > >>> > > >>>> I am willing to +1 any contribution at this point. > > >>>> > > >>>> my previous company used pmml to serialize simple stuff, but i don't > > >>>> have first hand experience. Its flexibility is ultimately pretty > > >>>> limited, isn't it? And xml is ultimately a media which is too ugly > and > > >>>> too verbose at the same time to represent models with any more or > less > > >>>> decent number of parameters? > > >>>> > > >>>> > > >>>> > > >>>> On Tue, Mar 3, 2015 at 8:19 PM, Suneel Marthi < > > suneel.mar...@gmail.com > > >>> > > >>>> wrote: > > >>>>> It makes sense to support PMML for classification and clustering > > >> tasks to > > >>>>> be able to share and distribute trained models. Sean, Pat, Dmitriy > > >> and > > >>>> Ted > > >>>>> please chime in. 
> > >>>>> > > >>>>> PMML support in Mahout was talked about for a long time now but > never > > >>>>> really got any traction to take off. > > >>>>> > > >>>>> +1 to build this. > > >>>>> > > >>>>> On Tue, Mar 3, 2015 at 11:14 PM, Andrew Musselman < > > >>>>> andrew.mussel...@gmail.com > wrote: > > >>>>> > > >>>>>> How much interest is there in a mahout-pmml module, with a > starting > > >>>> point > > >>>>>> to be able to export a few analytic/scoring jobs to PMML > > >> representation? > > >>>>>> > > >>>>>> I've seen a lot of interest at in being able to use PMML to > > >> translate > > >>>>>> analytic work into production(though I think people talk about it > > >> more > > >>>> than > > >>>>>> they do it), and it could be a benchmark as part of a "definition > of > > >>>> done" > > >>>>>> for any existing/new method we include since there's a spec to > > >> build to. > > >>>>>> > > >>>>>> Best > > >>>>>> Andrew > > >> > > >> > > >
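As a rough way to reproduce the size comparison from the linear regression example in this message, here is a small Python sketch that writes a PMML 4.2 RegressionModel and compares its raw and gzip-compressed sizes. This is an approximation under stated assumptions, not the R output from the thread: the two coefficients are the values that survive in a quoted copy of the listing, while the intercept (0.0), the field names x1, x2 and y, and the helper build_regression_pmml are placeholders invented here. The resulting byte counts will not match the 1.5kB and 600 bytes quoted above, since the original document had more content.

```python
import gzip
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_2"


def build_regression_pmml(intercept, coefficients, target="y"):
    """Serialize a linear model as a minimal PMML 4.2 document (UTF-8 bytes).

    coefficients maps a (hypothetical) input field name to its weight.
    """
    # Emit the PMML namespace as the default namespace instead of ns0:.
    ET.register_namespace("", PMML_NS)

    def q(tag):
        return "{%s}%s" % (PMML_NS, tag)

    root = ET.Element(q("PMML"), {"version": "4.2"})
    ET.SubElement(root, q("Header"), {"description": "linear regression sketch"})

    fields = [target, *coefficients]
    dictionary = ET.SubElement(
        root, q("DataDictionary"), {"numberOfFields": str(len(fields))}
    )
    for name in fields:
        ET.SubElement(
            dictionary,
            q("DataField"),
            {"name": name, "optype": "continuous", "dataType": "double"},
        )

    model = ET.SubElement(
        root,
        q("RegressionModel"),
        {"functionName": "regression", "algorithmName": "least squares"},
    )
    schema = ET.SubElement(model, q("MiningSchema"))
    ET.SubElement(schema, q("MiningField"), {"name": target, "usageType": "target"})
    for name in coefficients:
        ET.SubElement(schema, q("MiningField"), {"name": name})

    table = ET.SubElement(model, q("RegressionTable"), {"intercept": repr(intercept)})
    for name, weight in coefficients.items():
        ET.SubElement(
            table, q("NumericPredictor"), {"name": name, "coefficient": repr(weight)}
        )

    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Intercept and field names are placeholders; only the coefficients
    # come from the thread.
    doc = build_regression_pmml(
        0.0, {"x1": -1.00362806356329, "x2": 0.998224481877296}
    )
    packed = gzip.compress(doc)
    print("raw bytes:", len(doc), "gzip bytes:", len(packed))
```

Most of the bytes in a model this small are element and attribute names rather than numbers, which is why it compresses well. A random forest is dominated by per-node structure, so the same exercise there would grow with the number of tree nodes.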
Re: PMML
I think keeping it simple is best, try implementing one or two models in XML and then get fancy if it makes sense. On Wednesday, March 4, 2015, Saikat Kanjilal wrote: > Next question: Is the audience for PMML programmers or could it be folks > that can script? I'm wondering how this intersects with a simple spark > like DSL , could Mahout implement an intersection between the two? If > there's interest I can go into examples. > > Sent from my iPhone > > > On Mar 4, 2015, at 4:17 PM, Andrew Musselman > wrote: > > > > Sure, those would be options. > > > >> On Wed, Mar 4, 2015 at 3:41 PM, Saikat Kanjilal > wrote: > >> > >> Question, is there a way to introduce PMML with using a more lightweight > >> format like yaml or json? > >> > >>> Date: Wed, 4 Mar 2015 13:25:29 -0800 > >>> Subject: Re: PMML > >>> From: andrew.mussel...@gmail.com > >>> To: dev@mahout.apache.org > >>> > >>> Yes, the limitations are often an issue for people doing things that > >> aren't > >>> in the PMML spec yet; there could be room for suggesting new features > in > >>> the spec by building them though, I suppose. > >>> > >>> Also agree that XML is a lousy/bloated way of representing stuff like > >> this, > >>> but in the end it's just a choice of representation so there may be > >> reason > >>> to use some other encoding and then provide an XML-export function. > >>> > >>>> On Wed, Mar 4, 2015 at 11:42 AM, Dmitriy Lyubimov > > >>> wrote: > >>> > >>>> I am willing to +1 any contribution at this point. > >>>> > >>>> my previous company used pmml to serialize simple stuff, but i don't > >>>> have first hand experience. Its flexibility is ultimately pretty > >>>> limited, isn't it? And xml is ultimately a media which is too ugly and > >>>> too verbose at the same time to represent models with any more or less > >>>> decent number of parameters? 
> >>>> > >>>> > >>>> > >>>> On Tue, Mar 3, 2015 at 8:19 PM, Suneel Marthi < > suneel.mar...@gmail.com > >>> > >>>> wrote: > >>>>> It makes sense to support PMML for classification and clustering > >> tasks to > >>>>> be able to share and distribute trained models. Sean, Pat, Dmitriy > >> and > >>>> Ted > >>>>> please chime in. > >>>>> > >>>>> PMML support in Mahout was talked about for a long time now but never > >>>>> really got any traction to take off. > >>>>> > >>>>> +1 to build this. > >>>>> > >>>>> On Tue, Mar 3, 2015 at 11:14 PM, Andrew Musselman < > >>>>> andrew.mussel...@gmail.com > wrote: > >>>>> > >>>>>> How much interest is there in a mahout-pmml module, with a starting > >>>> point > >>>>>> to be able to export a few analytic/scoring jobs to PMML > >> representation? > >>>>>> > >>>>>> I've seen a lot of interest at in being able to use PMML to > >> translate > >>>>>> analytic work into production(though I think people talk about it > >> more > >>>> than > >>>>>> they do it), and it could be a benchmark as part of a "definition of > >>>> done" > >>>>>> for any existing/new method we include since there's a spec to > >> build to. > >>>>>> > >>>>>> Best > >>>>>> Andrew > >> > >> >
Re: PMML
Next question: Is the audience for PMML programmers or could it be folks that can script? I'm wondering how this intersects with a simple spark like DSL , could Mahout implement an intersection between the two? If there's interest I can go into examples. Sent from my iPhone > On Mar 4, 2015, at 4:17 PM, Andrew Musselman > wrote: > > Sure, those would be options. > >> On Wed, Mar 4, 2015 at 3:41 PM, Saikat Kanjilal wrote: >> >> Question, is there a way to introduce PMML with using a more lightweight >> format like yaml or json? >> >>> Date: Wed, 4 Mar 2015 13:25:29 -0800 >>> Subject: Re: PMML >>> From: andrew.mussel...@gmail.com >>> To: dev@mahout.apache.org >>> >>> Yes, the limitations are often an issue for people doing things that >> aren't >>> in the PMML spec yet; there could be room for suggesting new features in >>> the spec by building them though, I suppose. >>> >>> Also agree that XML is a lousy/bloated way of representing stuff like >> this, >>> but in the end it's just a choice of representation so there may be >> reason >>> to use some other encoding and then provide an XML-export function. >>> >>>> On Wed, Mar 4, 2015 at 11:42 AM, Dmitriy Lyubimov >>> wrote: >>> >>>> I am willing to +1 any contribution at this point. >>>> >>>> my previous company used pmml to serialize simple stuff, but i don't >>>> have first hand experience. Its flexibility is ultimately pretty >>>> limited, isn't it? And xml is ultimately a media which is too ugly and >>>> too verbose at the same time to represent models with any more or less >>>> decent number of parameters? >>>> >>>> >>>> >>>> On Tue, Mar 3, 2015 at 8:19 PM, Suneel Marthi >> >>>> wrote: >>>>> It makes sense to support PMML for classification and clustering >> tasks to >>>>> be able to share and distribute trained models. Sean, Pat, Dmitriy >> and >>>> Ted >>>>> please chime in. >>>>> >>>>> PMML support in Mahout was talked about for a long time now but never >>>>> really got any traction to take off. 
>>>>> >>>>> +1 to build this. >>>>> >>>>> On Tue, Mar 3, 2015 at 11:14 PM, Andrew Musselman < >>>>> andrew.mussel...@gmail.com> wrote: >>>>> >>>>>> How much interest is there in a mahout-pmml module, with a starting >>>> point >>>>>> to be able to export a few analytic/scoring jobs to PMML >> representation? >>>>>> >>>>>> I've seen a lot of interest at in being able to use PMML to >> translate >>>>>> analytic work into production(though I think people talk about it >> more >>>> than >>>>>> they do it), and it could be a benchmark as part of a "definition of >>>> done" >>>>>> for any existing/new method we include since there's a spec to >> build to. >>>>>> >>>>>> Best >>>>>> Andrew >> >>
Re: PMML
Sure, those would be options. On Wed, Mar 4, 2015 at 3:41 PM, Saikat Kanjilal wrote: > Question, is there a way to introduce PMML with using a more lightweight > format like yaml or json? > > > Date: Wed, 4 Mar 2015 13:25:29 -0800 > > Subject: Re: PMML > > From: andrew.mussel...@gmail.com > > To: dev@mahout.apache.org > > > > Yes, the limitations are often an issue for people doing things that > aren't > > in the PMML spec yet; there could be room for suggesting new features in > > the spec by building them though, I suppose. > > > > Also agree that XML is a lousy/bloated way of representing stuff like > this, > > but in the end it's just a choice of representation so there may be > reason > > to use some other encoding and then provide an XML-export function. > > > > On Wed, Mar 4, 2015 at 11:42 AM, Dmitriy Lyubimov > wrote: > > > > > I am willing to +1 any contribution at this point. > > > > > > my previous company used pmml to serialize simple stuff, but i don't > > > have first hand experience. Its flexibility is ultimately pretty > > > limited, isn't it? And xml is ultimately a media which is too ugly and > > > too verbose at the same time to represent models with any more or less > > > decent number of parameters? > > > > > > > > > > > > On Tue, Mar 3, 2015 at 8:19 PM, Suneel Marthi > > > > wrote: > > > > It makes sense to support PMML for classification and clustering > tasks to > > > > be able to share and distribute trained models. Sean, Pat, Dmitriy > and > > > Ted > > > > please chime in. > > > > > > > > PMML support in Mahout was talked about for a long time now but never > > > > really got any traction to take off. > > > > > > > > +1 to build this. > > > > > > > > On Tue, Mar 3, 2015 at 11:14 PM, Andrew Musselman < > > > > andrew.mussel...@gmail.com> wrote: > > > > > > > >> How much interest is there in a mahout-pmml module, with a starting > > > point > > > >> to be able to export a few analytic/scoring jobs to PMML > representation? 
> > > >> > > > >> I've seen a lot of interest at in being able to use PMML to > translate > > > >> analytic work into production(though I think people talk about it > more > > > than > > > >> they do it), and it could be a benchmark as part of a "definition of > > > done" > > > >> for any existing/new method we include since there's a spec to > build to. > > > >> > > > >> Best > > > >> Andrew > > > >> > > > > >
RE: PMML
Question, is there a way to introduce PMML with using a more lightweight format like yaml or json? > Date: Wed, 4 Mar 2015 13:25:29 -0800 > Subject: Re: PMML > From: andrew.mussel...@gmail.com > To: dev@mahout.apache.org > > Yes, the limitations are often an issue for people doing things that aren't > in the PMML spec yet; there could be room for suggesting new features in > the spec by building them though, I suppose. > > Also agree that XML is a lousy/bloated way of representing stuff like this, > but in the end it's just a choice of representation so there may be reason > to use some other encoding and then provide an XML-export function. > > On Wed, Mar 4, 2015 at 11:42 AM, Dmitriy Lyubimov wrote: > > > I am willing to +1 any contribution at this point. > > > > my previous company used pmml to serialize simple stuff, but i don't > > have first hand experience. Its flexibility is ultimately pretty > > limited, isn't it? And xml is ultimately a media which is too ugly and > > too verbose at the same time to represent models with any more or less > > decent number of parameters? > > > > > > > > On Tue, Mar 3, 2015 at 8:19 PM, Suneel Marthi > > wrote: > > > It makes sense to support PMML for classification and clustering tasks to > > > be able to share and distribute trained models. Sean, Pat, Dmitriy and > > Ted > > > please chime in. > > > > > > PMML support in Mahout was talked about for a long time now but never > > > really got any traction to take off. > > > > > > +1 to build this. > > > > > > On Tue, Mar 3, 2015 at 11:14 PM, Andrew Musselman < > > > andrew.mussel...@gmail.com> wrote: > > > > > >> How much interest is there in a mahout-pmml module, with a starting > > point > > >> to be able to export a few analytic/scoring jobs to PMML representation? 
> > >> > > >> I've seen a lot of interest at in being able to use PMML to translate > > >> analytic work into production(though I think people talk about it more > > than > > >> they do it), and it could be a benchmark as part of a "definition of > > done" > > >> for any existing/new method we include since there's a spec to build to. > > >> > > >> Best > > >> Andrew > > >> > >
Re: PMML
Yes, the limitations are often an issue for people doing things that aren't in the PMML spec yet; there could be room for suggesting new features in the spec by building them though, I suppose. Also agree that XML is a lousy/bloated way of representing stuff like this, but in the end it's just a choice of representation so there may be reason to use some other encoding and then provide an XML-export function. On Wed, Mar 4, 2015 at 11:42 AM, Dmitriy Lyubimov wrote: > I am willing to +1 any contribution at this point. > > my previous company used pmml to serialize simple stuff, but i don't > have first hand experience. Its flexibility is ultimately pretty > limited, isn't it? And xml is ultimately a media which is too ugly and > too verbose at the same time to represent models with any more or less > decent number of parameters? > > > > On Tue, Mar 3, 2015 at 8:19 PM, Suneel Marthi > wrote: > > It makes sense to support PMML for classification and clustering tasks to > > be able to share and distribute trained models. Sean, Pat, Dmitriy and > Ted > > please chime in. > > > > PMML support in Mahout was talked about for a long time now but never > > really got any traction to take off. > > > > +1 to build this. > > > > On Tue, Mar 3, 2015 at 11:14 PM, Andrew Musselman < > > andrew.mussel...@gmail.com> wrote: > > > >> How much interest is there in a mahout-pmml module, with a starting > point > >> to be able to export a few analytic/scoring jobs to PMML representation? > >> > >> I've seen a lot of interest at in being able to use PMML to translate > >> analytic work into production(though I think people talk about it more > than > >> they do it), and it could be a benchmark as part of a "definition of > done" > >> for any existing/new method we include since there's a spec to build to. > >> > >> Best > >> Andrew > >> >
Re: PMML
I am willing to +1 any contribution at this point. my previous company used pmml to serialize simple stuff, but i don't have first hand experience. Its flexibility is ultimately pretty limited, isn't it? And xml is ultimately a media which is too ugly and too verbose at the same time to represent models with any more or less decent number of parameters? On Tue, Mar 3, 2015 at 8:19 PM, Suneel Marthi wrote: > It makes sense to support PMML for classification and clustering tasks to > be able to share and distribute trained models. Sean, Pat, Dmitriy and Ted > please chime in. > > PMML support in Mahout was talked about for a long time now but never > really got any traction to take off. > > +1 to build this. > > On Tue, Mar 3, 2015 at 11:14 PM, Andrew Musselman < > andrew.mussel...@gmail.com> wrote: > >> How much interest is there in a mahout-pmml module, with a starting point >> to be able to export a few analytic/scoring jobs to PMML representation? >> >> I've seen a lot of interest at in being able to use PMML to translate >> analytic work into production(though I think people talk about it more than >> they do it), and it could be a benchmark as part of a "definition of done" >> for any existing/new method we include since there's a spec to build to. >> >> Best >> Andrew >>
Re: PMML
It makes sense to support PMML for classification and clustering tasks to be able to share and distribute trained models. Sean, Pat, Dmitriy and Ted please chime in. PMML support in Mahout was talked about for a long time now but never really got any traction to take off. +1 to build this. On Tue, Mar 3, 2015 at 11:14 PM, Andrew Musselman < andrew.mussel...@gmail.com> wrote: > How much interest is there in a mahout-pmml module, with a starting point > to be able to export a few analytic/scoring jobs to PMML representation? > > I've seen a lot of interest at in being able to use PMML to translate > analytic work into production(though I think people talk about it more than > they do it), and it could be a benchmark as part of a "definition of done" > for any existing/new method we include since there's a spec to build to. > > Best > Andrew >
PMML
How much interest is there in a mahout-pmml module, with a starting point to be able to export a few analytic/scoring jobs to PMML representation? I've seen a lot of interest in being able to use PMML to translate analytic work into production (though I think people talk about it more than they do it), and it could be a benchmark as part of a "definition of done" for any existing/new method we include since there's a spec to build to. Best Andrew
[jira] [Comment Edited] (MAHOUT-1041) Support for PMML
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787197#comment-13787197 ] Thomas Darimont edited comment on MAHOUT-1041 at 10/5/13 12:56 PM: --- As this wasn't mentioned yet: one could use something like cascading:pattern (http://www.cascading.org/pattern/) or jpmml-cascading (https://github.com/jpmml/jpmml-cascading) to execute PMML models in Hadoop. was (Author: thomasd): In the meantime one could use something like cascading:pattern (http://www.cascading.org/pattern/) or jpmml-cascading (https://github.com/jpmml/jpmml-cascading) to execute PMML models in Hadoop. > Support for PMML > > > Key: MAHOUT-1041 > URL: https://issues.apache.org/jira/browse/MAHOUT-1041 > Project: Mahout > Issue Type: Improvement > Components: Integration > Environment: Software Platform >Reporter: Duraimurugan > Fix For: Backlog > > > Would like to request a support for PMML. With that once the predictive > models are built and provided in PMML format, we should be able to import > into hadoop cluster for scoring. This way models built in external > (non-mahout) systems can be imported to Hadoop/Mahout for scalable > environment. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (MAHOUT-1041) Support for PMML
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787197#comment-13787197 ] Thomas Darimont edited comment on MAHOUT-1041 at 10/5/13 12:55 PM: --- In the meantime one could use something like cascading:pattern (http://www.cascading.org/pattern/) or jpmml-cascading (https://github.com/jpmml/jpmml-cascading) to execute PMML models in Hadoop. was (Author: thomasd): In the meantime one could use something like cascading:pattern (http://www.cascading.org/pattern/) or jpmml-cascading (https://github.com/jpmml/jpmml-cascading) to execute PMML Models in Hadoop. > Support for PMML > > > Key: MAHOUT-1041 > URL: https://issues.apache.org/jira/browse/MAHOUT-1041 > Project: Mahout > Issue Type: Improvement > Components: Integration > Environment: Software Platform >Reporter: Duraimurugan > Fix For: Backlog > > > Would like to request a support for PMML. With that once the predictive > models are built and provided in PMML format, we should be able to import > into hadoop cluster for scoring. This way models built in external > (non-mahout) systems can be imported to Hadoop/Mahout for scalable > environment. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (MAHOUT-1041) Support for PMML
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787197#comment-13787197 ] Thomas Darimont commented on MAHOUT-1041: - In the meantime one could use something like cascading:pattern (http://www.cascading.org/pattern/) or jpmml-cascading (https://github.com/jpmml/jpmml-cascading) to execute PMML Models in Hadoop. > Support for PMML > > > Key: MAHOUT-1041 > URL: https://issues.apache.org/jira/browse/MAHOUT-1041 > Project: Mahout > Issue Type: Improvement > Components: Integration > Environment: Software Platform >Reporter: Duraimurugan > Fix For: Backlog > > > Would like to request a support for PMML. With that once the predictive > models are built and provided in PMML format, we should be able to import > into hadoop cluster for scoring. This way models built in external > (non-mahout) systems can be imported to Hadoop/Mahout for scalable > environment. -- This message was sent by Atlassian JIRA (v6.1#6144)
Re: Mahout and PMML
Hi Pranay, as Ted already said, PMML support has been requested several times before. I recommend reading all the JIRA issues about PMML; they contain a lot of information about what has happened so far: https://issues.apache.org/jira/browse/MAHOUT-1041 Support for PMML https://issues.apache.org/jira/browse/MAHOUT-18 Embrace interoperability with other softwares If you want to implement it, go ahead and do so. The following might speed up the integration of your contribution: https://cwiki.apache.org/MAHOUT/how-to-contribute.html Have a great week Manuel On 02.09.2013 at 17:27, Ted Dunning wrote: > The ability to export PMML for streaming k-means, Naive Bayes and the > logistic regression classifiers would be useful. > > Nobody has worked on this much yet, but demand, on the other hand, is > pretty sporadic. > > > On Mon, Sep 2, 2013 at 6:06 AM, Pranay Tonpay wrote: > >> Hi, >> It would really help if i can get some information on this to be able to >> plan accordingly... >> >> thx >> pranay >> >> -- >> *From:* Pranay Tonpay >> *To:* "dev@mahout.apache.org" >> *Sent:* Monday, August 26, 2013 12:06 PM >> *Subject:* Mahout and PMML >> >> Hi, >> >> I work as Sr Solutions Architect for Impetus technologies and had been >> using Mahout quite extensively for my work... >> Off-late, i am focusing on PMML related stuff and realized that Mahout, at >> present doesn't seem to have support for that. >> I was keen to know if its there in the road-map and if not, i would like >> to contribute to it ... I am sure, it would be a good "add on" to have for >> Mahout. >> Even in case there is some work going on for PMML, i would like to be a >> part of it and contribute ( if you think, that's feasible) >> >> Pls let me know if any of my assumptions are incorrect. >> Hope to hear from you soon. >> >> thx >> pranay >> >> >> -- Manuel Blechschmidt M.Sc. IT Systems Engineering Dortustr. 57 14467 Potsdam Mobile: 0173/6322621 Twitter: http://twitter.com/Manuel_B
Re: Mahout and PMML
The ability to export PMML for streaming k-means, Naive Bayes and the logistic regression classifiers would be useful. Nobody has worked on this much yet, but demand, on the other hand, is pretty sporadic. On Mon, Sep 2, 2013 at 6:06 AM, Pranay Tonpay wrote: > Hi, > It would really help if i can get some information on this to be able to > plan accordingly... > > thx > pranay > > -- > *From:* Pranay Tonpay > *To:* "dev@mahout.apache.org" > *Sent:* Monday, August 26, 2013 12:06 PM > *Subject:* Mahout and PMML > > Hi, > > I work as Sr Solutions Architect for Impetus technologies and had been > using Mahout quite extensively for my work... > Off-late, i am focusing on PMML related stuff and realized that Mahout, at > present doesn't seem to have support for that. > I was keen to know if its there in the road-map and if not, i would like > to contribute to it ... I am sure, it would be a good "add on" to have for > Mahout. > Even in case there is some work going on for PMML, i would like to be a > part of it and contribute ( if you think, that's feasible) > > Pls let me know if any of my assumptions are incorrect. > Hope to hear from you soon. > > thx > pranay > > >
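To make the export idea above concrete, here is a minimal sketch of what writing k-means centroids as a PMML ClusteringModel could look like. It is an illustration under assumptions, not anything from Mahout: the function name `kmeans_to_pmml`, the model name and the choice of PMML 4.1 are hypothetical, the centroids are plain Python lists rather than Mahout vectors, and the output has not been validated against the DMG schema.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_1"  # assumption: target PMML 4.1


def kmeans_to_pmml(field_names, centroids, model_name="kmeans-sketch"):
    """Serialize k-means centroids as a PMML ClusteringModel string.

    Hypothetical helper for illustration; not a Mahout API.
    """
    root = ET.Element("PMML", {"version": "4.1", "xmlns": PMML_NS})
    ET.SubElement(root, "Header", {"description": "hypothetical export sketch"})

    # Declare every input field once in the data dictionary.
    dictionary = ET.SubElement(
        root, "DataDictionary", {"numberOfFields": str(len(field_names))}
    )
    for name in field_names:
        ET.SubElement(
            dictionary,
            "DataField",
            {"name": name, "optype": "continuous", "dataType": "double"},
        )

    model = ET.SubElement(
        root,
        "ClusteringModel",
        {
            "modelName": model_name,
            "functionName": "clustering",
            "modelClass": "centerBased",
            "numberOfClusters": str(len(centroids)),
        },
    )
    schema = ET.SubElement(model, "MiningSchema")
    for name in field_names:
        ET.SubElement(schema, "MiningField", {"name": name})

    # Squared Euclidean distance, the usual k-means objective.
    measure = ET.SubElement(model, "ComparisonMeasure", {"kind": "distance"})
    ET.SubElement(measure, "squaredEuclidean")
    for name in field_names:
        ET.SubElement(model, "ClusteringField", {"field": name})

    # One <Cluster> per centroid, coordinates as a space-separated array.
    for index, centroid in enumerate(centroids):
        cluster = ET.SubElement(model, "Cluster", {"name": "cluster-%d" % index})
        array = ET.SubElement(
            cluster, "Array", {"n": str(len(centroid)), "type": "real"}
        )
        array.text = " ".join(repr(float(value)) for value in centroid)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(kmeans_to_pmml(["x", "y"], [[0.0, 1.0], [2.5, 3.5]]))
```

A real exporter would also have to carry over whatever preprocessing the model expects, which this sketch ignores.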
[jira] [Resolved] (MAHOUT-1041) Support for PMML
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-1041. - Resolution: Won't Fix Without a patch, I don't see putting this in. Also, I don't see the benefit of storing largish models in XML. I could see a specific issue that can do I/O of PMML into Mahout's, but I don't see anything running natively off of PMML. > Support for PMML > > > Key: MAHOUT-1041 > URL: https://issues.apache.org/jira/browse/MAHOUT-1041 > Project: Mahout > Issue Type: Improvement > Components: Integration > Environment: Software Platform >Reporter: Duraimurugan > Fix For: Backlog > > > Would like to request a support for PMML. With that once the predictive > models are built and provided in PMML format, we should be able to import > into hadoop cluster for scoring. This way models built in external > (non-mahout) systems can be imported to Hadoop/Mahout for scalable > environment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
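The concern about storing largish models in XML can be put into rough numbers. The sketch below compares the size of a parameter vector written as a PMML-style `<Array>` element with the same values packed as raw 8-byte doubles. The helper names and the 1000 made-up weights are hypothetical, and raw packed doubles are only an assumed baseline, not Mahout's actual serialization format.

```python
import struct


def xml_array(values):
    """Encode values as a PMML-style <Array> element (UTF-8 bytes)."""
    body = " ".join(repr(float(value)) for value in values)
    return ('<Array n="%d" type="real">%s</Array>' % (len(values), body)).encode("utf-8")


def packed_doubles(values):
    """Encode values as little-endian IEEE 754 doubles, 8 bytes each."""
    return struct.pack("<%dd" % len(values), *values)


# Made-up weights; most need 17+ decimal digits to round-trip as text.
weights = [i / 7.0 for i in range(1000)]
xml_bytes = xml_array(weights)
bin_bytes = packed_doubles(weights)

if __name__ == "__main__":
    print("XML bytes:   ", len(xml_bytes))
    print("binary bytes:", len(bin_bytes))
    print("ratio:        %.2f" % (len(xml_bytes) / len(bin_bytes)))
```

The text form loses on size, although compression narrows the gap; the trade is made for interoperability, which matches the view elsewhere in this thread that PMML would serve exchange rather than internal storage.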
[jira] [Updated] (MAHOUT-1041) Support for PMML
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1041: --- Affects Version/s: (was: Backlog) > Support for PMML > > > Key: MAHOUT-1041 > URL: https://issues.apache.org/jira/browse/MAHOUT-1041 > Project: Mahout > Issue Type: Improvement > Components: Integration > Environment: Software Platform >Reporter: Duraimurugan > > Would like to request a support for PMML. With that once the predictive > models are built and provided in PMML format, we should be able to import > into hadoop cluster for scoring. This way models built in external > (non-mahout) systems can be imported to Hadoop/Mahout for scalable > environment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAHOUT-1041) Support for PMML
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1041: --- Fix Version/s: Backlog > Support for PMML > > > Key: MAHOUT-1041 > URL: https://issues.apache.org/jira/browse/MAHOUT-1041 > Project: Mahout > Issue Type: Improvement > Components: Integration > Environment: Software Platform >Reporter: Duraimurugan > Fix For: Backlog > > > Would like to request a support for PMML. With that once the predictive > models are built and provided in PMML format, we should be able to import > into hadoop cluster for scoring. This way models built in external > (non-mahout) systems can be imported to Hadoop/Mahout for scalable > environment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAHOUT-1041) Support for PMML
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Schelter updated MAHOUT-1041: --- Affects Version/s: (was: 1.0) Backlog > Support for PMML > > > Key: MAHOUT-1041 > URL: https://issues.apache.org/jira/browse/MAHOUT-1041 > Project: Mahout > Issue Type: Improvement > Components: Integration >Affects Versions: Backlog > Environment: Software Platform >Reporter: Duraimurugan > > Would like to request a support for PMML. With that once the predictive > models are built and provided in PMML format, we should be able to import > into hadoop cluster for scoring. This way models built in external > (non-mahout) systems can be imported to Hadoop/Mahout for scalable > environment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: mahout-pmml
Hi Ted, That makes some sense. I'll probably take a crack at it. Marty On 12/27/2012 12:14 AM, Ted Dunning wrote: Marty, That sounds like a reasonable idea. IF integrated, this would need to be a separate module in any case so for now, it might be easiest for you to simply develop this module independently so that you don't have to wait for others to commit partial results. On Wed, Dec 26, 2012 at 6:52 PM, Marty Kube < martyk...@beavercreekconsulting.com> wrote: I took a look at JPMML... At the bottom of it they have ran a JAXB compiler on the PMML V4 schema to generate Java bindings. I didn't see a lot of value add in JPMML beyond that. I'd say just add the schema and bindings generation to Mahout. The value add here is model mapping from the JAXB generated model into the Mahout models. On 12/20/2012 06:13 AM, Grant Ingersoll wrote: From looking at PMML (http://www.dmg.org/v4-1/GeneralStructure.html), it seems that JPMML is not going to really get us there if it only supports the 4 models listed below. I would think we could go through the structures supported in the link above and then map it to the Algorithms that are supported. To start, perhaps it would make sense to focus on a few like: clustering, naive bayes and perhaps SGD will fit into the regression models. Perhaps try to get K-Means and Naive Bayes to work first. FTR, I can only imagine how bloated these files are going to get since they use XML. Thankfully, they won't be used to power the internals, just to support interoperability. -Grant On Dec 19, 2012, at 8:12 AM, Simon Vocella wrote: Hi All, as Grant suggested, I forward the email about mahout-pmml.
I already tried jpmml standalone and works fine for me, the next important point is to understand or maybe create some example for each model described before: NeuralNetwork RandomForest (implemented via Segmentation, which is a PMML version 4.0 feature) RegressionModel TreeModel with only Mahout and next step create a convertor to create object from jpmml to Mahout. This is related only to import the object and for me the export object is more similar to these. Do you agree? Are you interested in this models? Or Mahout focus on another one? regards, Simon -- Forwarded message -- From: Simon Vocella Date: Mon, Dec 17, 2012 at 1:50 AM Subject: mahout-pmml To: Grant Ingersoll Cc: Marty Kube Hi Grant, I start with this is the project https://github.com/voxsim/mahout-pmml (I pushed only the skeleton for now) with mahout and jpmml integration (http://code.google.com/p/jpmml/) I read the wiki about weka convertor https://cwiki.apache.org/MAHOUT/creating-vectors-from-wekas-arff-format.html And I read the integration with Lucene http://searchhub.org/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/ In theory we need to do more similar to these parts, but different, we don't transfrom vector but model, Do i understand correctly? I'll request directly to you because you have in mind this idea and for now jpmml support this models NeuralNetwork RandomForest (implemented via Segmentation, which is a PMML version 4.0 feature) RegressionModel TreeModel Are you interested in this models? Or Mahout focus on another one? Simon PS Marty before to start I need some answers sorry XD -- Grant Ingersoll http://www.lucidworks.com
Re: mahout-pmml
Marty, That sounds like a reasonable idea. IF integrated, this would need to be a separate module in any case so for now, it might be easiest for you to simply develop this module independently so that you don't have to wait for others to commit partial results. On Wed, Dec 26, 2012 at 6:52 PM, Marty Kube < martyk...@beavercreekconsulting.com> wrote: > I took a look at JPMML... At the bottom of it they have ran a JAXB > compiler on the PMML V4 schema to generate Java bindings. I didn't see a > lot of value add in JPMML beyond that. > > I'd say just add the schema and bindings generation to Mahout. The value > add here is model mapping from the JAXB generated model into the Mahout > models. > > On 12/20/2012 06:13 AM, Grant Ingersoll wrote: > >> From looking at PMML >> (http://www.dmg.org/v4-1/GeneralStructure.html), >> it seems that JPMML is not going to really get us there if it only supports >> the 4 models listed below. I would think we could go through the >> structures supported in the link above and then map it to the Algorithms >> that are supported. To start, perhaps it would make sense to focus on a >> few like: clustering, naive bayes and perhaps SGD will fit into the >> regression models. Perhaps try to get K-Means and Naive Bayes to work >> first. >> >> FTR, I can only imagine how bloated these files are going to get since >> they use XML. Thankfully, they won't be used to power the internals, just >> to support interoperability. >> >> -Grant >> >> On Dec 19, 2012, at 8:12 AM, Simon Vocella wrote: >> >> Hi All, >>> >>> as Grant suggested, I forward the email about mahout-pmml.
>>> I already tried jpmml standalone and works fine for me, the next >>> important point is to understand or maybe create some example for each >>> model described before: >>> NeuralNetwork >>> RandomForest (implemented via Segmentation, which is a PMML version 4.0 >>> feature) >>> RegressionModel >>> TreeModel >>> with only Mahout and next step create a convertor to create object from >>> jpmml to Mahout. This is related only to import the object and for me the >>> export object is more similar to these. >>> >>> Do you agree? Are you interested in this models? Or Mahout focus on >>> another one? >>> >>> regards, >>> Simon >>> >>> -- Forwarded message -- >>> From: Simon Vocella >>> Date: Mon, Dec 17, 2012 at 1:50 AM >>> Subject: mahout-pmml >>> To: Grant Ingersoll >>> Cc: Marty Kube >>> >>> > >>> >>> >>> Hi Grant, >>> >>> I start with this is the project >>> https://github.com/voxsim/mahout-pmml (I >>> pushed only the skeleton for now) with mahout and jpmml integration >>> (http://code.google.com/p/jpmml/) >>> >>> I read the wiki about weka convertor >>> https://cwiki.apache.org/MAHOUT/creating-vectors-from-wekas-arff-format.html >>> And I read the integration with Lucene >>> http://searchhub.org/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/ >>> >>> In theory we need to do more similar to these parts, but different, we >>> don't transfrom vector but model, Do i understand correctly?
>>> >>> I'll request directly to you because you have in mind this idea and for >>> now jpmml support this models >>> NeuralNetwork >>> RandomForest (implemented via Segmentation, which is a PMML version 4.0 >>> feature) >>> RegressionModel >>> TreeModel >>> Are you interested in this models? Or Mahout focus on another one? >>> >>> Simon >>> >>> PS Marty before to start I need some answers sorry XD >>> >>> -- >> Grant Ingersoll >> http://www.lucidworks.com >> >> >> >> >> >> >
Re: mahout-pmml
I took a look at JPMML... At the bottom of it they have run a JAXB compiler on the PMML V4 schema to generate Java bindings. I didn't see a lot of value add in JPMML beyond that. I'd say just add the schema and bindings generation to Mahout. The value add here is model mapping from the JAXB generated model into the Mahout models. On 12/20/2012 06:13 AM, Grant Ingersoll wrote: From looking at PMML (http://www.dmg.org/v4-1/GeneralStructure.html), it seems that JPMML is not going to really get us there if it only supports the 4 models listed below. I would think we could go through the structures supported in the link above and then map it to the Algorithms that are supported. To start, perhaps it would make sense to focus on a few like: clustering, naive bayes and perhaps SGD will fit into the regression models. Perhaps try to get K-Means and Naive Bayes to work first. FTR, I can only imagine how bloated these files are going to get since they use XML. Thankfully, they won't be used to power the internals, just to support interoperability. -Grant On Dec 19, 2012, at 8:12 AM, Simon Vocella wrote: Hi All, as Grant suggested, I forward the email about mahout-pmml. I already tried jpmml standalone and works fine for me, the next important point is to understand or maybe create some example for each model described before: NeuralNetwork RandomForest (implemented via Segmentation, which is a PMML version 4.0 feature) RegressionModel TreeModel with only Mahout and next step create a convertor to create object from jpmml to Mahout. This is related only to import the object and for me the export object is more similar to these. Do you agree? Are you interested in this models? Or Mahout focus on another one?
regards, Simon -- Forwarded message -- From: Simon Vocella Date: Mon, Dec 17, 2012 at 1:50 AM Subject: mahout-pmml To: Grant Ingersoll Cc: Marty Kube Hi Grant, I start with this is the project https://github.com/voxsim/mahout-pmml (I pushed only the skeleton for now) with mahout and jpmml integration (http://code.google.com/p/jpmml/) I read the wiki about weka convertor https://cwiki.apache.org/MAHOUT/creating-vectors-from-wekas-arff-format.html And I read the integration with Lucene http://searchhub.org/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/ In theory we need to do more similar to these parts, but different, we don't transfrom vector but model, Do i understand correctly? I'll request directly to you because you have in mind this idea and for now jpmml support this models NeuralNetwork RandomForest (implemented via Segmentation, which is a PMML version 4.0 feature) RegressionModel TreeModel Are you interested in this models? Or Mahout focus on another one? Simon PS Marty before to start I need some answers sorry XD Grant Ingersoll http://www.lucidworks.com
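The "model mapping" step described in this message, going from a parsed PMML document to an in-memory model that can score, might look roughly like the sketch below. It is only an illustration under assumptions: Python's ElementTree stands in for the JAXB-generated bindings, a plain intercept plus coefficient dictionary stands in for a Mahout model class, the names `load_linear_model` and `score` are hypothetical, and the embedded sample document is made up and not schema-validated.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-4_1"}  # assumption: PMML 4.1 namespace

# Made-up sample document for illustration only.
SAMPLE_PMML = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def load_linear_model(pmml_text):
    """Map a PMML RegressionModel onto (intercept, {field: coefficient})."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("p:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(intercept, coefficients, row):
    """Evaluate the linear model on one record given as a dict."""
    return intercept + sum(
        weight * row[name] for name, weight in coefficients.items()
    )


if __name__ == "__main__":
    intercept, coefficients = load_linear_model(SAMPLE_PMML)
    print(score(intercept, coefficients, {"x1": 1.0, "x2": 2.0}))
```

The parsing half is mechanical, which fits the observation that generated bindings add little by themselves; the work is in deciding which target model each PMML element maps to.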
Re: mahout-pmml
From looking at PMML (http://www.dmg.org/v4-1/GeneralStructure.html), it seems that JPMML is not going to really get us there if it only supports the 4 models listed below. I would think we could go through the structures supported in the link above and then map it to the Algorithms that are supported. To start, perhaps it would make sense to focus on a few like: clustering, naive bayes and perhaps SGD will fit into the regression models. Perhaps try to get K-Means and Naive Bayes to work first. FTR, I can only imagine how bloated these files are going to get since they use XML. Thankfully, they won't be used to power the internals, just to support interoperability. -Grant On Dec 19, 2012, at 8:12 AM, Simon Vocella wrote: > Hi All, > > as Grant suggested, I forward the email about mahout-pmml. > I already tried jpmml standalone and works fine for me, the next important > point is to understand or maybe create some example for each model described > before: > NeuralNetwork > RandomForest (implemented via Segmentation, which is a PMML version 4.0 > feature) > RegressionModel > TreeModel > with only Mahout and next step create a convertor to create object from jpmml > to Mahout. This is related only to import the object and for me the export > object is more similar to these. > > Do you agree? Are you interested in this models? Or Mahout focus on another > one? 
> > regards, > Simon > > -- Forwarded message -- > From: Simon Vocella > Date: Mon, Dec 17, 2012 at 1:50 AM > Subject: mahout-pmml > To: Grant Ingersoll > Cc: Marty Kube > > > Hi Grant, > > I start with this is the project https://github.com/voxsim/mahout-pmml (I > pushed only the skeleton for now) with mahout and jpmml integration > (http://code.google.com/p/jpmml/) > > I read the wiki about weka convertor > https://cwiki.apache.org/MAHOUT/creating-vectors-from-wekas-arff-format.html > And I read the integration with Lucene > http://searchhub.org/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/ > > In theory we need to do more similar to these parts, but different, we don't > transfrom vector but model, Do i understand correctly? > > I'll request directly to you because you have in mind this idea and for now > jpmml support this models > NeuralNetwork > RandomForest (implemented via Segmentation, which is a PMML version 4.0 > feature) > RegressionModel > TreeModel > Are you interested in this models? Or Mahout focus on another one? > > Simon > > PS Marty before to start I need some answers sorry XD > Grant Ingersoll http://www.lucidworks.com
Fwd: mahout-pmml
Hi All, as Grant suggested, I am forwarding the email about mahout-pmml. I have already tried jpmml standalone and it works fine for me. The next important step is to understand, or maybe create, an example for each of the models listed below: - NeuralNetwork - RandomForest (implemented via Segmentation, which is a PMML version 4.0 feature) - RegressionModel - TreeModel using only Mahout, and then to write a converter that creates Mahout objects from jpmml objects. This covers only import; I expect export to work in a similar way. Do you agree? Are you interested in these models, or is Mahout focused on other ones? regards, Simon -- Forwarded message -- From: Simon Vocella Date: Mon, Dec 17, 2012 at 1:50 AM Subject: mahout-pmml To: Grant Ingersoll Cc: Marty Kube Hi Grant, I have started this project, https://github.com/voxsim/mahout-pmml (I have pushed only the skeleton for now), to integrate Mahout and jpmml (http://code.google.com/p/jpmml/). I read the wiki page about the Weka converter: https://cwiki.apache.org/MAHOUT/creating-vectors-from-wekas-arff-format.html I also read about the integration with Lucene: http://searchhub.org/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/ In theory we need to do something similar to these, with the difference that we transform models rather than vectors. Do I understand correctly? I am asking you directly because you had this idea in mind. For now jpmml supports these models: - NeuralNetwork - RandomForest (implemented via Segmentation, which is a PMML version 4.0 feature) - RegressionModel - TreeModel Are you interested in these models, or is Mahout focused on other ones? Simon PS Marty, before starting I need some answers, sorry XD
[jira] [Commented] (MAHOUT-1041) Support for PMML
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407980#comment-13407980 ] Manuel Blechschmidt commented on MAHOUT-1041: - This was already proposed about 4 years ago in MAHOUT-18. Currently nobody has both the time and the knowledge to implement such an exporter. It would be great if you could provide a patch, and I would expect it to have a good chance of being integrated into Mahout as long as it follows these rules: * https://cwiki.apache.org/MAHOUT/how-to-contribute.html > Support for PMML > > > Key: MAHOUT-1041 > URL: https://issues.apache.org/jira/browse/MAHOUT-1041 > Project: Mahout > Issue Type: Improvement > Components: Integration >Affects Versions: 1.0 > Environment: Software Platform >Reporter: Duraimurugan > > Would like to request a support for PMML. With that once the predictive > models are built and provided in PMML format, we should be able to import > into hadoop cluster for scoring. This way models built in external > (non-mahout) systems can be imported to Hadoop/Mahout for scalable > environment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAHOUT-1041) Support for PMML
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407972#comment-13407972 ] Duraimurugan commented on MAHOUT-1041: -- Sure, I can contribute on PMML parser and model code. > Support for PMML > > > Key: MAHOUT-1041 > URL: https://issues.apache.org/jira/browse/MAHOUT-1041 > Project: Mahout > Issue Type: Improvement > Components: Integration >Affects Versions: 1.0 > Environment: Software Platform >Reporter: Duraimurugan > > Would like to request a support for PMML. With that once the predictive > models are built and provided in PMML format, we should be able to import > into hadoop cluster for scoring. This way models built in external > (non-mahout) systems can be imported to Hadoop/Mahout for scalable > environment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAHOUT-1041) Support for PMML
[ https://issues.apache.org/jira/browse/MAHOUT-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407744#comment-13407744 ] Ted Dunning commented on MAHOUT-1041: - This has been proposed before and has withered when there wasn't much support in terms of code contributions. What sort of contributions can you provide? Problem specification? PMML parser? Model code? > Support for PMML > > > Key: MAHOUT-1041 > URL: https://issues.apache.org/jira/browse/MAHOUT-1041 > Project: Mahout > Issue Type: Improvement > Components: Integration >Affects Versions: 1.0 > Environment: Software Platform >Reporter: Duraimurugan > > Would like to request a support for PMML. With that once the predictive > models are built and provided in PMML format, we should be able to import > into hadoop cluster for scoring. This way models built in external > (non-mahout) systems can be imported to Hadoop/Mahout for scalable > environment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAHOUT-1041) Support for PMML
Duraimurugan created MAHOUT-1041: Summary: Support for PMML Key: MAHOUT-1041 URL: https://issues.apache.org/jira/browse/MAHOUT-1041 Project: Mahout Issue Type: Improvement Components: Integration Affects Versions: 1.0 Environment: Software Platform Reporter: Duraimurugan I would like to request support for PMML. With it, once predictive models are built and provided in PMML format, we should be able to import them into a Hadoop cluster for scoring. This way, models built in external (non-Mahout) systems can be imported into Hadoop/Mahout to run in a scalable environment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira