Re: Release Manager

2016-09-08 Thread Pat Ferrel
Thanks Donald! We still need to clean up 2 things: 1) the LICENSE.txt and NOTICE.txt now have to reflect only the sources included. I’ll update the license Jira and will do the editing (easy now). 2) the install.sh *needs testing*; anyone who can, please try it on different distros (Mac, Red

Re: Setup PredictionIO for large events

2016-09-06 Thread Pat Ferrel
lps. Tom On Sep 5, 2016 9:05 PM, "Digambar Bhat" <digambarbha...@gmail.com <mailto:digambarbha...@gmail.com>> wrote: Update please.. On 30-Aug-2016 8:06 pm, "Digambar Bhat" <digambarbha...@gmail.com <mailto:digambarbha...@gmail.com>> wrot

Re: Binary or Source release

2016-09-06 Thread Pat Ferrel
it and the install.sh can be made to point to it. Not sure what the ASF rules are regarding this so maybe the mentors can comment—specifically do we have to use the Apache mirror system? On Sep 5, 2016, at 4:34 PM, Pat Ferrel <p...@occamsmachete.com> wrote: On Sep 5, 2016, at 1:25 PM, Alex M

Binary or Source release

2016-09-05 Thread Pat Ferrel
This weekend I tracked down all our deps, which required a few scripts to process sbt output. This yielded 166 deps, so this implies we need to include 166 licenses and copyright notices in LICENSE.txt. As I read the Apache guidelines this should be the license that goes with the version we
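
A minimal sketch of the kind of script described above, assuming the sbt output contains dependency coordinates of the form `group:artifact:version`. The function name `unique_deps`, the regular expression, and the sample lines are hypothetical; the scripts actually used are not shown in the thread.

```python
import re

# Matches Maven-style coordinates such as "org.example:lib-a:1.0".
# The exact sbt output format is an assumption; adjust the pattern as needed.
# Each part must start with a letter or digit so tree-drawing characters
# like "+-" are not captured as part of the group name.
PART = r"[A-Za-z0-9][A-Za-z0-9_.\-]*"
COORD = re.compile(rf"({PART}):({PART}):({PART})")


def unique_deps(lines):
    """Return the sorted, de-duplicated (group, artifact, version) tuples."""
    found = set()
    for line in lines:
        for match in COORD.finditer(line):
            found.add(match.groups())
    return sorted(found)


# Hypothetical sample of sbt output, for illustration only.
sample = [
    "[info]   +-org.example:lib-a:1.0",
    "[info]   | +-org.example:lib-b:2.1",
    "[info]   +-org.example:lib-b:2.1",
]
deps = unique_deps(sample)
```

The length of the returned list is the number of distinct dependencies whose licenses and notices would need to be checked.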

[jira] [Commented] (PIO-27) Check release artifacts for licenses and the LICENSE.txt file

2016-09-03 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15461979#comment-15461979 ] Pat Ferrel commented on PIO-27: --- The current install script does a build from source. The old PIO one did

[jira] [Commented] (PIO-27) Check release artifacts for licenses and the LICENSE.txt file

2016-09-03 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15461706#comment-15461706 ] Pat Ferrel commented on PIO-27: --- just to document for the next person that does it, for transitive deps we need

[jira] [Commented] (PIO-27) Check release artifacts for licenses and the LICENSE.txt file

2016-09-03 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15461664#comment-15461664 ] Pat Ferrel commented on PIO-27: --- This seems to require us to go back to the source of the specific version

[jira] [Commented] (PIO-27) Check release artifacts for licenses and the LICENSE.txt file

2016-09-01 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456370#comment-15456370 ] Pat Ferrel commented on PIO-27: --- [~dszeto] Will start work on the text license part of this Friday Sept 2

Re: Apache PIO v0.10.0 release

2016-09-01 Thread Pat Ferrel
later this week. I also hope to start the release candidate process in the 1st week of September. The community has been asking and we'd like to ship it to them. On Tue, Aug 30, 2016 at 10:05 AM, Pat Ferrel <p...@occamsmachete.com> wrote: > It’s been over a week waiting for the template don

Re: Regarding PredictionIO templates

2016-09-01 Thread Pat Ferrel
ioned, the best way to do this is via pull request to the template >>>> gallery. Below are the links to template gallery page and instructions. For >>>> both namespace updates and your new template, you can just specify the >>>> pio_min_version as 0.10.0 and it will work with

[jira] [Comment Edited] (PIO-30) Cross build for different versions of scala and spark

2016-08-30 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449714#comment-15449714 ] Pat Ferrel edited comment on PIO-30 at 8/30/16 6:18 PM: How does Spark do

[jira] [Comment Edited] (PIO-30) Cross build for different versions of scala and spark

2016-08-30 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449714#comment-15449714 ] Pat Ferrel edited comment on PIO-30 at 8/30/16 6:16 PM: How does Spark do

[jira] [Commented] (PIO-30) Cross build for different versions of scala and spark

2016-08-30 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449705#comment-15449705 ] Pat Ferrel commented on PIO-30: --- Fine with me if we can do with cross-build but it's not just different

[jira] [Commented] (PIO-32) Create component upgrade release branch

2016-08-30 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449615#comment-15449615 ] Pat Ferrel commented on PIO-32: --- Might as well add new ES 2.x compatibility since we will break backwards binary

[jira] [Created] (PIO-33) Add support for Elasticsearch 2.x to upgrade release branch

2016-08-30 Thread Pat Ferrel (JIRA)
Pat Ferrel created PIO-33: - Summary: Add support for Elasticsearch 2.x to upgrade release branch Key: PIO-33 URL: https://issues.apache.org/jira/browse/PIO-33 Project: PredictionIO Issue Type: New

[jira] [Commented] (PIO-32) Create component upgrade release branch

2016-08-30 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449604#comment-15449604 ] Pat Ferrel commented on PIO-32: --- may be other things to add to this upgrade release branch > Create compon

Apache PIO v0.10.0 release

2016-08-30 Thread Pat Ferrel
It’s been over a week waiting for the template donation paperwork, which I imagine will take a few weeks to go through repo creation, license validation and PR merging before they can be released. Since they can be on a separate release schedule I’d like to remove them from being blocking

Re: Regarding PredictionIO templates

2016-08-30 Thread Pat Ferrel
6 at 2:21 AM, Pat Ferrel <p...@occamsmachete.com> wrote: > Hi Bansari, > > Chan Lee has done this too, and added tests from the new integration test > framework. We are awaiting a donation from Salesforce and new repo creation > before anything can be pushed. > > If you wo

[jira] [Commented] (PIO-24) move templates to apache git repos

2016-08-22 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431865#comment-15431865 ] Pat Ferrel commented on PIO-24: --- Sorry [~chanlee514] my mistake. you don't need an ICLA until you are doing

[jira] [Commented] (PIO-25) Don't attempt to start PostgreSQL when it's not being used

2016-08-22 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431258#comment-15431258 ] Pat Ferrel commented on PIO-25: --- A pull request or patch is welcome. > Don't attempt to start PostgreSQL w

Re: Apache PIO v0.10.0 release

2016-08-21 Thread Pat Ferrel
gardless it would be good if someone reviewed the release artifacts now > and validates the License and Notices as opposed to pushing a release and > getting -1 vote from IPMC. > > > >> On Sat, Aug 20, 2016 at 2:21 PM, Pat Ferrel <p...@occamsmachete.com> wrote: >>

[jira] [Commented] (PIO-27) Check release artifacts for licenses and the LICENSE.txt file

2016-08-21 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429798#comment-15429798 ] Pat Ferrel commented on PIO-27: --- Further @dev comments It took Gearpump six release candidates before

[jira] [Updated] (PIO-27) Check release artifacts for licenses and the LICENSE.txt file

2016-08-21 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-27?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pat Ferrel updated PIO-27: -- Assignee: (was: Pat Ferrel) > Check release artifacts for licenses and the LICENSE.txt f

[jira] [Commented] (MAHOUT-1878) implement quartile type thresholds for indicator matrix downsampling

2016-08-20 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429553#comment-15429553 ] Pat Ferrel commented on MAHOUT-1878: see discussion here https://issues.apache.org/jira/browse/MAHOUT

[jira] [Issue Comment Deleted] (MAHOUT-1679) example script run-item-sim should work on hdfs as well as local

2016-08-20 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pat Ferrel updated MAHOUT-1679: --- Comment: was deleted (was: see discussion https://issues.apache.org/jira/browse/MAHOUT-1853

[jira] [Commented] (MAHOUT-1679) example script run-item-sim should work on hdfs as well as local

2016-08-20 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429552#comment-15429552 ] Pat Ferrel commented on MAHOUT-1679: see discussion https://issues.apache.org/jira/browse/MAHOUT-1853

[jira] [Created] (MAHOUT-1878) implement quartile type thresholds for indicator matrix downsampling

2016-08-20 Thread Pat Ferrel (JIRA)
Pat Ferrel created MAHOUT-1878: -- Summary: implement quartile type thresholds for indicator matrix downsampling Key: MAHOUT-1878 URL: https://issues.apache.org/jira/browse/MAHOUT-1878 Project: Mahout

[jira] [Commented] (MAHOUT-1853) Improvements to CCO (Correlated Cross-Occurrence)

2016-08-20 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429550#comment-15429550 ] Pat Ferrel commented on MAHOUT-1853: ok first part implemented. Not sure Ted's suggestion will get

Re: Apache PIO v0.10.0 release

2016-08-20 Thread Pat Ferrel
ensure that all third party jars have been accounted for and the License and Notice files are included in the appropriate project release artifacts. On Sat, Aug 20, 2016 at 2:00 PM, Pat Ferrel <p...@occamsmachete.com> wrote: > What do people think remains for release? > > 1) templ

Apache PIO v0.10.0 release

2016-08-20 Thread Pat Ferrel
What do people think remains for release? 1) template donation and mods. Chan Lee has done work on this but we can’t review until the donation and repos are set up. 2) install.sh. There are some suggestions on how to deal with the one-line install here:

[jira] [Updated] (PIO-18) Documentation for setting up the project for developers

2016-08-20 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pat Ferrel updated PIO-18: -- Priority: Minor (was: Major) > Documentation for setting up the project for develop

[jira] [Commented] (PIO-18) Documentation for setting up the project for developers

2016-08-20 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429466#comment-15429466 ] Pat Ferrel commented on PIO-18: --- I think this works, but you need to create the correct class for the entry

[jira] [Commented] (PIO-13) Update Gemnasium dependency status

2016-08-20 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429462#comment-15429462 ] Pat Ferrel commented on PIO-13: --- This seems to have been fixed. [~xusen] I'll close but can you check to see

[jira] [Commented] (PIO-11) Current version (v0.9.8) of python sdk requires python3

2016-08-20 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429460#comment-15429460 ] Pat Ferrel commented on PIO-11: --- Can you describe this. Did you mean python3? I use it with python 2.7

[jira] [Updated] (PIO-9) Clean up examples under examples/

2016-08-20 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pat Ferrel updated PIO-9: - Priority: Minor (was: Major) > Clean up examples under examp

Re: templates

2016-08-20 Thread Pat Ferrel
t 9:54 AM, Pat Ferrel <p...@occamsmachete.com> wrote: > > "a week or so” is the first I’ve heard of an order of magnitude and would > agree if that’s true. I’ll remove the suggestion but I think there is a > bigger issue here. > > Understood about what would be gran

Re: templates

2016-08-20 Thread Pat Ferrel
e templates > have not yet been donated and imported, I think you should wait. If you > post a release of PIO that depends on templates in this uncertain condition > I'm afraid I would need to vote -1 (binding) until this is resolved or we > have clarification it's ok. > >> On A

Old Google Group and lost users

2016-08-20 Thread Pat Ferrel
The old Google Group for PIO was fairly active but it seems all of those users have vanished. I suggest that this is bad for the project as well as the user base. One possible solution is to retract the no-posting rule and forward all posts to the user@ list address. Then questions will get

[jira] [Commented] (PIO-24) move templates to apache git repos

2016-08-20 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429432#comment-15429432 ] Pat Ferrel commented on PIO-24: --- [~chanlee514] are you willing to put your modified templates in the new

Re: templates

2016-08-19 Thread Pat Ferrel
space and update minimum PIO version as 0.10. I will leave out the tests for now. But personally, I think it would be better if we wait for the legal grant issue to be resolved, so that it is clearer how template code should be managed. On Thu, Aug 18, 2016 at 1:13 PM, Pat Ferrel <p..

templates

2016-08-18 Thread Pat Ferrel
I think we are not waiting for the official template donation to release PIO. Can you point me to the templates you have working? I’ll make sure they get added to the new gallery. We can push them to Apache once the grant is done. Thanks for the help.

[jira] [Created] (PIO-24) move templates to apache git repos

2016-08-16 Thread Pat Ferrel (JIRA)
Pat Ferrel created PIO-24: - Summary: move templates to apache git repos Key: PIO-24 URL: https://issues.apache.org/jira/browse/PIO-24 Project: PredictionIO Issue Type: Task Affects Versions

Re: 0.10.0 release

2016-08-16 Thread Pat Ferrel
BTW the infra Jiras to get repos created are probably on the critical path so can someone coach me on what to ask for? On Aug 16, 2016, at 11:16 AM, Pat Ferrel <p...@occamsmachete.com> wrote: I’ve been talking about this issue since before we were in the incubator, probably because I mainta

Re: 0.10.0 release

2016-08-16 Thread Pat Ferrel
pinion in mandating one engine one repo, but I think otherwise it would be hard for customized engines to merge upstream official changes. On Mon, Aug 15, 2016 at 3:05 PM, Pat Ferrel <p...@occamsmachete.com> wrote: > Possible but we have template in examples/ for testing and that’s ok imo

Re: 0.10.0 release

2016-08-15 Thread Pat Ferrel
important. Also, as Marcin mentioned, we can expand tests to keep track of performance changes as well. Also, I think we should consider merging the current examples/ directory with the templates/ directory. Thanks, Chan On Sun, Aug 14, 2016 at 8:30 AM, Pat Ferrel <p...@occamsmachete.com>

[jira] [Created] (PIO-23) links in release.md need to be removed or updated

2016-08-15 Thread Pat Ferrel (JIRA)
Pat Ferrel created PIO-23: - Summary: links in release.md need to be removed or updated Key: PIO-23 URL: https://issues.apache.org/jira/browse/PIO-23 Project: PredictionIO Issue Type: Task

[jira] [Commented] (PIO-21) Checks in Travis build not the same as local build

2016-08-14 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420548#comment-15420548 ] Pat Ferrel commented on PIO-21: --- I think it is my local problem, can't run the integration test due to no way

[jira] [Resolved] (PIO-21) Checks in Travis build not the same as local build

2016-08-14 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-21?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pat Ferrel resolved PIO-21. --- Resolution: Invalid > Checks in Travis build not the same as local bu

Re: 0.10.0 release

2016-08-14 Thread Pat Ferrel
actice. > On Aug 14, 2016, at 8:30 AM, Pat Ferrel <p...@occamsmachete.com> wrote: > > Important things unresolved for release: > > 1) What to do with Apache templates—my suggestion is separate repos for the 3 > reasons below. > 2) We need a Gallery page listing a

0.10.0 release

2016-08-14 Thread Pat Ferrel
ther than releasing with templates in a templates/ directory. Anything else? How are people feeling about release? On Aug 12, 2016, at 10:18 AM, Pat Ferrel <p...@occamsmachete.com> wrote: Independent of Apache I’d suggest each template have their own git repo as they do now. Is this p

Re: PIO-20 problems

2016-08-14 Thread Pat Ferrel
avis now. On Fri, Aug 12, 2016 at 3:04 PM, Pat Ferrel <p...@occamsmachete.com <mailto:p...@occamsmachete.com>> wrote: Alex, can you look at these unit test failures on the PR, they seem to be in JDBCPEvents https://travis-ci.org/apache/incubator-predictionio/builds/151905196 <https

Re: PIO-20 problems

2016-08-12 Thread Pat Ferrel
Alex, can you look at these unit test failures on the PR, they seem to be in JDBCPEvents https://travis-ci.org/apache/incubator-predictionio/builds/151905196 On Aug 12, 2016, at 1:18 PM, Pat Ferrel <p...@occamsmachete.com> wrote: Can't install unittest with pip or pip3 even though th

Re: PIO-20 problems

2016-08-12 Thread Pat Ferrel
distribution found for unittest Tried forcing the version to 0.0 but again no luck Ideas? In the meantime waiting for Travis to do it—sigh On Aug 11, 2016, at 2:31 PM, Pat Ferrel <p...@occamsmachete.com> wrote: With the keystore my template-based integration test passes and I’ve put the keystor
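
A note on the pip error above: `unittest` is part of the Python standard library, so there is no separate distribution for pip to install, which is one likely reason for the "no distribution found" message. A minimal sketch showing that it can be imported and run without installing anything; the test class `SmokeTest` is hypothetical and unrelated to the PIO integration tests.

```python
import unittest


class SmokeTest(unittest.TestCase):
    """Trivial test case; exists only to show unittest needs no pip install."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)


# Build and run the suite programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SmokeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If this runs, any `unittest` entry in a requirements list can probably be dropped rather than pinned to a version.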

Re: transition from "official" PredictionIO templates to Apache

2016-08-12 Thread Pat Ferrel
of every committer's effort. > Just my 2c. > > On Sun, Aug 7, 2016 at 9:47 AM, Pat Ferrel <p...@occamsmachete.com > <mailto:p...@occamsmachete.com>> wrote: > > > If this sounds ok, I propose the process be: > > 1) inclusion in the gallery is a PR to the yaml

Re: PIO-20 problems

2016-08-11 Thread Pat Ferrel
the > README in the same directory, but maybe this was not clear enough. If you > have any additional questions or run into issues executing the tests, > please let me know. > > Thanks, > Chan > > On Thu, Aug 11, 2016 at 6:07 AM, Pat Ferrel <p...@occamsmachete.com>

[jira] [Commented] (PIO-1) Make SSL and authKey param authentication optional

2016-08-10 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15416212#comment-15416212 ] Pat Ferrel commented on PIO-1: -- Templates still fail on `pio deploy` when SSL is not configured

[jira] [Created] (PIO-21) Checks in Travis build not the same as local build

2016-08-10 Thread Pat Ferrel (JIRA)
Pat Ferrel created PIO-21: - Summary: Checks in Travis build not the same as local build Key: PIO-21 URL: https://issues.apache.org/jira/browse/PIO-21 Project: PredictionIO Issue Type: Bug

Re: Trying to merge PR

2016-08-10 Thread Pat Ferrel
) ... 18 more On Aug 10, 2016, at 1:31 PM, Pat Ferrel <p...@actionml.com> wrote: I’m trying to build a template against the aml/apache merged pio. Obviously there are a few template code changes needed so I started with a completely clean EventServer meaning HBase and Elasticsearch were comp

Trying to merge PR

2016-08-10 Thread Pat Ferrel
I’m trying to build a template against the aml/apache merged pio. Obviously there are a few template code changes needed so I started with a completely clean EventServer meaning HBase and Elasticsearch were completely wiped and restarted. I deleted manifest.json and was able to build, and

[jira] [Created] (PIO-20) Merge ActionML fork

2016-08-08 Thread Pat Ferrel (JIRA)
Pat Ferrel created PIO-20: - Summary: Merge ActionML fork Key: PIO-20 URL: https://issues.apache.org/jira/browse/PIO-20 Project: PredictionIO Issue Type: Bug Reporter: Pat Ferrel

[jira] [Commented] (MAHOUT-1853) Improvements to CCO (Correlated Cross-Occurrence)

2016-08-05 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409595#comment-15409595 ] Pat Ferrel commented on MAHOUT-1853: Great, that's what I wanted to hear. Normal in principle

Re: transition from "official" PredictionIO templates to Apache

2016-08-04 Thread Pat Ferrel
Actually this is mostly a fine idea but I think the bigger question is how do we treat templates in general. IMO the maintaining author can decide to contribute them or not and the committers can decide to accept or not. For example in the case of the UR I may decide to support and maintain it

[jira] [Commented] (MAHOUT-1853) Improvements to CCO (Correlated Cross-Occurrence)

2016-08-04 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408326#comment-15408326 ] Pat Ferrel commented on MAHOUT-1853: If t-digest is more tolerant of "not having enough data"

[jira] [Commented] (MAHOUT-1853) Improvements to CCO (Correlated Cross-Occurrence)

2016-08-04 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408256#comment-15408256 ] Pat Ferrel commented on MAHOUT-1853: is rootLLR normally distributed (the positive half)? If so we'd

[jira] [Updated] (MAHOUT-1853) Improvements to CCO (Correlated Cross-Occurrence)

2016-08-04 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pat Ferrel updated MAHOUT-1853: --- Sprint: Jan/Feb-2016 > Improvements to CCO (Correlated Cross-Occurre

Re: Inquery about the stroage and other file formats on HDFS and local FS.

2016-08-03 Thread Pat Ferrel
This would work for some templates but not all. The Events as a collection need to support PEvents and LEvents APIs and files would make those types of queries rather difficult. I believe the current philosophy for PIO is that to include something in the core it would need to support all

Re: [Proposal] Integration tests

2016-07-30 Thread Pat Ferrel
new/delete/...), and template(s). Maybe the directory structure can be > something like this? > root/ > test/ >integration/ > EventserverSuite.scala > AppSuite.scala > SimilarProductTemplateSuite.scala > > seed/ > json data f

[jira] [Commented] (MAHOUT-1853) Improvements to CCO (Correlated Cross-Occurrence)

2016-07-24 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391126#comment-15391126 ] Pat Ferrel commented on MAHOUT-1853: To reword this issue... The CCO analysis code currently only

Re: HBase warn on startup

2016-07-13 Thread Pat Ferrel
5.mbox/%3ccaep_jwbf1kghmsl272hymtodskts68tp6b3lh2ecsq-nwe9...@mail.gmail.com%3E <http://mail-archives.apache.org/mod_mbox/hbase-user/201505.mbox/%3ccaep_jwbf1kghmsl272hymtodskts68tp6b3lh2ecsq-nwe9...@mail.gmail.com%3E> On Wed, Jul 13, 2016 at 2:42 PM, Pat Ferrel <p...@occamsmachete.com <mailto:p...

HBase warn on startup

2016-07-13 Thread Pat Ferrel
I saw a question Tom asked about this error when doing `pio status`. Was there ever an answer for what causes it? [INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)... [INFO] [Storage$] Test writing to Event Store (App Id 0)... [WARN] [MetaReader] No serialized HRegionInfo in

Re: All Github PR closed

2016-07-04 Thread Pat Ferrel
everal projects I work on. > On Jul 4, 2016, at 10:35 AM, Pat Ferrel <p...@occamsmachete.com> wrote: > > Creating a Jira for a PR is all well and fine but using patches + Jira is a > pretty archaic and cumbersome way to handle PRs. If we simply reopen the PR > against the

Re: The future of PIO

2016-07-03 Thread Pat Ferrel
On Sat, Jul 2, 2016 at 8:58 PM, Pat Ferrel <p...@occamsmachete.com <mailto:p...@occamsmachete.com>> wrote: For the last year some of us have had the experience of creating several applications with PIO for users and it is still our go-to platform for ML apps. However it has led

Re: All Github PR closed

2016-06-30 Thread Pat Ferrel
org> wrote: But not all PRs are closed, so it left me wondering if there is a set of conditions that were triggered when GitHub integration was turned on. On Thursday, June 30, 2016, Pat Ferrel <p...@occamsmachete.com> wrote: > Maybe I missed the explanation but why are all the gi

Re: Scaling up spark Iitem similarity on big data data sets

2016-06-23 Thread Pat Ferrel
In addition to increasing downsampling there are some other things to note. The original OOM was caused by the use of BiMaps to store your row and column ids. Their memory use grows with the number of ids, since each id type needs 2 hashmaps. With only 16g you may have very little else
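
To illustrate the memory point, here is a minimal sketch of a bidirectional ID dictionary, assuming (as the message says) that it is backed by two hash maps per id type. The class name `BiDict` is hypothetical and this is not Mahout's implementation; it only shows why every distinct row or column id ends up stored twice.

```python
class BiDict:
    """Bidirectional map between external string ids and dense integer indexes."""

    def __init__(self):
        self._to_index = {}  # external id -> integer index
        self._to_id = {}     # integer index -> external id

    def add(self, external_id):
        """Return the index for external_id, assigning the next one if it is new."""
        if external_id not in self._to_index:
            index = len(self._to_index)
            self._to_index[external_id] = index
            self._to_id[index] = external_id
        return self._to_index[external_id]

    def id_of(self, index):
        """Look up the external id for a previously assigned index."""
        return self._to_id[index]

    def __len__(self):
        return len(self._to_index)


# Repeated ids reuse their index; each distinct id costs one entry in both maps.
rows = BiDict()
for user in ["u1", "u2", "u1", "u3"]:
    rows.add(user)
```

With one such structure for rows and one for columns, memory held by the dictionaries scales with the number of distinct ids rather than with the number of interactions.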

Re: Rename tables or swap alias

2016-06-06 Thread Pat Ferrel
en in step #1. See http://hbase.apache.org/book.html#ops.snapshots.clone <http://hbase.apache.org/book.html#ops.snapshots.clone> Cheers On Tue, Feb 16, 2016 at 6:35 AM, Pat Ferrel <p...@occamsmachete.com> wrote: > I think I can work out the algorithm if I knew precisely what a “

[jira] [Comment Edited] (MAHOUT-1853) Improvements to CCO (Correlated Cross-Occurrence)

2016-05-26 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302371#comment-15302371 ] Pat Ferrel edited comment on MAHOUT-1853 at 5/26/16 5:04 PM: - Steps: 1

[jira] [Comment Edited] (MAHOUT-1853) Improvements to CCO (Correlated Cross-Occurrence)

2016-05-26 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302371#comment-15302371 ] Pat Ferrel edited comment on MAHOUT-1853 at 5/26/16 5:03 PM: - Steps: 1

[jira] [Commented] (MAHOUT-1853) Improvements to CCO (Correlated Cross-Occurrence)

2016-05-26 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302371#comment-15302371 ] Pat Ferrel commented on MAHOUT-1853: Steps: 1) allow an array of absolute LLR value thresholds

Re: Clustering options

2016-05-24 Thread Pat Ferrel
Mahout Samsara is more about rolling your own algo, though it has already implemented several as examples. If you want to build your own clustering you will find a lot of what you need in the R-like DSL. But if you want something already built you may want to look at Spark’s MLlib kmeans.

Re: Welcome Trevor Grant as a new Mahout Committer

2016-05-24 Thread Pat Ferrel
Kokanee too? Welcome indeed! On May 24, 2016, at 6:34 AM, Shannon Quinn wrote: Welcome Trevor! On 5/24/16 7:14 AM, Stevo Slavić wrote: > Congratulations Trevor, well deserved, welcome to the team! > > On Tue, May 24, 2016 at 12:32 PM, Suneel Marthi

Re: [Hello] from NASa

2016-05-21 Thread Pat Ferrel
ving components. Users can follow some conventions to feed data to Mahout. Steven NASa 2016/05/21 2016-05-21 22:06 GMT+08:00 Pat Ferrel <p...@occamsmachete.com>: > Hi Stephen, > > We have implemented SVD, ALS, and CCO for recommender, but these are only > core algorithms

Re: [DISCUSS] PredictionIO incubation proposal

2016-05-21 Thread Pat Ferrel
h a gallery.​" I agree with the proposers that tracking down a large set of contributors to get their ok for a consolidated grant would be onerous. On Fri, May 20, 2016 at 9:14 AM, Pat Ferrel <p...@occamsmachete.com> wrote: > +1 for the current committer list, but please, anyone inte

Re: [Hello] from NASa

2016-05-21 Thread Pat Ferrel
Hi Stephen, We have implemented SVD, ALS, and CCO for recommender, but these are only core algorithms, not really recommenders as Mahout has done in the past. The reason for this is that there are data prep, data ingestion, and serving components that, in a modern system, must be supplied

Re: Future Mahout - Zeppelin work

2016-05-20 Thread Pat Ferrel
2016 at 10:22 AM, Pat Ferrel <p...@occamsmachete.com> wrote: > Great job Trevor, we’ll need this detail to smooth out the sharp edges and > any guidance from you or the Zeppelin community will be a big help. > > > On May 20, 2016, at 8:13 AM, Shannon Quinn <squ...@gatech.edu>

Re: [DISCUSS] PredictionIO incubation proposal

2016-05-20 Thread Pat Ferrel
+1 for the current committer list, but please, anyone interested get familiar, we will need more help soon! Also I’d like to bring up the template gallery again. Plugins may be problematic in other projects but pio does nothing of interest *without* a template. There are some examples in the

Re: Future Mahout - Zeppelin work

2016-05-20 Thread Pat Ferrel
in Python. >>> >>> The basic deal here is we are: >>> 1) Setting up a standard Zeppelin Spark Interpreter to act like a Mahout >>> interpreter >>> - This is taken care of by setting some env. variables, adding some >>> dependencies, and impor

Re: [DISCUSS] PredictionIO incubation proposal

2016-05-20 Thread Pat Ferrel
It’s great to see such interest and I’m sure the rest of the podling would agree that the more the better. I also agree with Suneel, people who know PIO should be given a short bit of time to get organized before we do the desired expansion. There will be lots of room to contribute, in any

Re: [DISCUSS] PredictionIO incubation proposal

2016-05-17 Thread Pat Ferrel
I’d like to see Apache find a way to sponsor the template gallery. The current site collects data and inclusion is controlled by Salesforce I believe. I guess there is nothing wrong with that but it would be great to have a free open collection of templates as the Apache blessed method of

Re: Future Mahout - Zeppelin work

2016-05-17 Thread Pat Ferrel
intrevo http://stackexchange.com/users/3002022/rawkintrevo http://trevorgrant.org *"Fortunate is he, who is able to know the causes of things." -Virgil* On Mon, May 16, 2016 at 6:42 PM, Pat Ferrel <p...@occamsmachete.com> wrote: > Creating an mc used to do some Kryo setup, like

Re: Future Mahout - Zeppelin work

2016-05-16 Thread Pat Ferrel
Sounds great. Thank you. From: Trevor Grant <trevor.d.gr...@gmail.com> Sent: Monday, May 16, 2016 5:49:17 PM To: Dmitriy Lyubimov Cc: Andrew Palumbo; Pat Ferrel; Suneel Marthi Subject: Re: Intro - Future Mahout - Zeppelin work I just signed up for dev, should I just reply all and

Re: Read output of sparkrowsimilairty in scala

2016-05-12 Thread Pat Ferrel
There are several ways to do this. The design was meant to be extended by a trait that would do the actual read/write. Check out TDIndexedDatasetReader. You can create a similar trait called MySQLIndexedDatasetReader. There are other examples in that file for reading and writing. Also check the

Re: RowSimilakrity : NotSerializableException

2016-05-07 Thread Pat Ferrel
I think you have to create a SparkDistributedContext, which has Mahout-specific Kryo serialization and adds Mahout jars. If you let Mahout create the Spark context it’s simpler: `implicit val mc = mahoutSparkContext(masterUrl = "local", appName = "SparkExample")`. As I recall the sc will then

Re: Mahout rowSimilarity

2016-05-04 Thread Pat Ferrel
n-app.html> >>> >>> Let me know if you need more help. >>> >>> Thank you, >>> Nikaash Puri >>>> On 03-May-2016, at 9:49 PM, Rohit Jain <rohitkjai...@gmail.com> wrote: >>>> >>>> Hello Pat, >>>>

Re: Mahout rowSimilarity

2016-05-03 Thread Pat Ferrel
Sure, but at least some would be Scala. There are examples in Mahout that take PairRDDs as input but anything that constructs an IndexedDataset would be fine. I use this code in a system that creates an RDD from HBase. Think of the task as one of how to create a Spark RDD from your DB content.

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-29 Thread Pat Ferrel
a fair amount) and the model was built in about 20 minutes, > which is pretty amazing. This was using a pretty decent sized cluster, > though. > > Thank you, > Nikaash Puri > > On 29-Apr-2016, at 10:18 PM, Pat Ferrel <p...@occamsmachete.com> wrote: > > There ar

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-29 Thread Pat Ferrel
: yes -- i would do it as an optional option -- just like par does -- do nothing; try auto, or try exact number of splits On Fri, Apr 29, 2016 at 9:15 AM, Pat Ferrel <p...@occamsmachete.com <mailto:p...@occamsmachete.com>> wrote: It’s certainly easy to put this in the driver, taking

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-29 Thread Pat Ferrel
It’s certainly easy to put this in the driver, taking it out of the algo. Dmitriy, is it a candidate for an Option param to the algo? That would catch cases where people rely on it now (like my old DStream example) but easily allow it to be overridden to None to imitate pre 0.11, or passed in

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-28 Thread Pat Ferrel
Hmm, can’t get images through the Apache mail servers. The image is here: https://drive.google.com/file/d/0B4cAk1SMC1ChWFZiRG9DSEpkdzg/view?usp=sharing On Apr 28, 2016, at 11:55 AM, Pat Ferrel <p...@occamsmachete.com> wrote: Actually on your advice Dmitriy I think these changes went in

Re: spark-itemsimilarity runs orders of times slower from Mahout 0.11 onwards

2016-04-27 Thread Pat Ferrel
I have been using the same function through all those versions of Mahout. I’m running on newer versions of Spark 1.4-1.6.2. Using my datasets there has been no slowdown. I assume that you are only changing the Mahout version—leaving data, Spark, HDFS, and all config the same. In which case I

Re: Congratulations to our new Chair

2016-04-21 Thread Pat Ferrel
Congratulations Andy, well deserved. On Apr 21, 2016, at 6:01 AM, Shannon Quinn wrote: Thanks Suneel for your excellent leadership. Congratulations Andrew! On 4/21/16 3:38 AM, Alessandro Negro wrote: > Congratulations! > > On 21 Apr 2016, at 02:36,
