Re: Unsubscribe

2023-12-05 Thread Pat Ferrel
> -- Forwarded message -- > From: Pat Ferrel <p...@occamsmachete.com> > Date: Tue, Dec 5, 2023 at 11:58 AM > Subject: Unsubscribe > To: <issues@mahout.apache.org> > > > Unsubscribe

Unsubscribe

2023-12-05 Thread Pat Ferrel
Unsubscribe

[jira] [Commented] (MAHOUT-2023) Drivers broken, scopt classes not found

2020-10-20 Thread Pat Ferrel (Jira)
[ https://issues.apache.org/jira/browse/MAHOUT-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217820#comment-17217820 ] Pat Ferrel commented on MAHOUT-2023: I don't install Mahout as a shell process. This only occurs

Re: [DISCUSS] Release 14.1, RC7

2020-09-30 Thread Pat Ferrel
Still haven’t had a chance to test since it will take some experimentation to figure out the jars needed, etc. My test is to replace 0.13 with 0.14.1. Still, I see no reason to delay the release for my slow testing. +1 From: Andrew Musselman Reply: dev@mahout.apache.org Date: September 28, 2020 at

Re: [DISCUSS] Dissolve Apache PredictionIO PMC and move project to the Attic

2020-08-31 Thread Pat Ferrel
To try to keep this on-subject I’ll say that I’ve been working on what I once saw as a next-gen PIO. It is ASL 2, and has 2 engines that ran in PIO — most notably the Universal Recommender. We offered to make the Harness project part of PIO a couple years back but didn’t get much interest. It

Re: [ANNOUNCE] Mahout Con 2020 (A sub-track of ApacheCon @ Home)

2020-08-12 Thread Pat Ferrel
Big fun. Thanks for putting this together. I’ll abuse my few Twitter followers with the announcement. From: Trevor Grant Reply: user@mahout.apache.org Date: August 12, 2020 at 5:59:45 AM To: Mahout Dev List , user@mahout.apache.org Subject:  [ANNOUNCE] Mahout Con 2020 (A sub-track of

Memory allocation

2020-04-17 Thread Pat Ferrel
I have used Spark for several years and realize from recent chatter on this list that I don’t really understand how it uses memory. Specifically, are spark.executor.memory and spark.driver.memory taken from the JVM heap? When does Spark take memory from the JVM heap, and when does it use off-heap memory?

Re: IDE suitable for Spark

2020-04-07 Thread Pat Ferrel
IntelliJ Scala works well when debugging master=local. Has anyone used it for remote/cluster debugging? I’ve heard it is possible... From: Luiz Camargo Reply: Luiz Camargo Date: April 7, 2020 at 10:26:35 AM To: Dennis Suhari Cc: yeikel valdes , zahidr1...@gmail.com , user@spark.apache.org

Re: PredictionIO ASF Board Report for Mar 2020

2020-03-19 Thread Pat Ferrel
PredictionIO is scalable BY SCALING ITS SUB-SERVICES. Running on a single machine sounds like no scaling has been executed or even planned. How do you scale ANY system? 1) vertical scaling: make the instance larger with more cores, more disk, and most importantly more memory. Increase whatever

Re: Livy on Kubernetes support

2020-01-14 Thread Pat Ferrel
+1 from another user fwiw. We also have livy containers and helm charts. The real problem is deploying a Spark Cluster in k8s. We know of no working images for this. The Spark team seems focused on deploying Jobs with k8s, which is fine but is not enough. We need to deploy Spark itself.  We

Re: Possible missing mentor(s)

2019-09-01 Thread Pat Ferrel
Seems like some action should be taken before 2 years, even if it is to close the PR because it is not appropriate. Isn’t this the responsibility of the chair to guard against committer changes where the contributor is still willing? Or if a mentor is guiding the PR they should help it get

Re: k8s orchestrating Spark service

2019-07-03 Thread Pat Ferrel
today's question. From: Matt Cheah Reply: Matt Cheah Date: July 1, 2019 at 5:14:05 PM To: Pat Ferrel , user@spark.apache.org Subject: Re: k8s orchestrating Spark service > We’d like to deploy Spark Workers/Executors and Master (whatever master is easiest to talk about since we really do

Re: JAVA_HOME is not set

2019-07-03 Thread Pat Ferrel
Oops, should have said: "I may have missed something but I don’t recall PIO being released by Apache as an ASF maintained container/image release artifact." From: Pat Ferrel Reply: user@predictionio.apache.org Date: July 3, 2019 at 11:16:43 AM To: Wei Chen , d...@predictionio.

Re: JAVA_HOME is not set

2019-07-03 Thread Pat Ferrel
BTW the container you use is supported by the container author, if at all. I may have missed something but I don’t recall PIO being released by Apache as an ASF maintained release artifact. I wish ASF projects would publish Docker Images made for real system integration, but IIRC PIO does not.

Re: k8s orchestrating Spark service

2019-07-01 Thread Pat Ferrel
run our Driver and Executors considering that the Driver is part of the Server process? Maybe we are talking past each other with some mistaken assumptions (on my part perhaps). From: Pat Ferrel Reply: Pat Ferrel Date: July 1, 2019 at 4:57:20 PM To: user@spark.apache.org , Matt Cheah

Re: k8s orchestrating Spark service

2019-07-01 Thread Pat Ferrel
anyone have something they like? From: Matt Cheah Reply: Matt Cheah Date: July 1, 2019 at 4:45:55 PM To: Pat Ferrel , user@spark.apache.org Subject: Re: k8s orchestrating Spark service Sorry, I don’t quite follow – why use the Spark standalone cluster as an in-between layer when one can just

Re: k8s orchestrating Spark service

2019-07-01 Thread Pat Ferrel
of services including Spark. The rest work, we are asking if anyone has seen a good starting point for adding Spark as a k8s managed service. From: Matt Cheah Reply: Matt Cheah Date: July 1, 2019 at 3:26:20 PM To: Pat Ferrel , user@spark.apache.org Subject: Re: k8s orchestrating Spark service

k8s orchestrating Spark service

2019-06-30 Thread Pat Ferrel
We're trying to set up a system that includes Spark. The rest of the services have good Docker containers and Helm charts to start from. Spark, on the other hand, is proving difficult. We forked a container and have tried to create our own chart but are having several problems with this. So back to

Re: run new spark version on old spark cluster ?

2019-05-20 Thread Pat Ferrel
It is always dangerous to run a NEWER version of code on an OLDER cluster. The danger increases with the semver change, and this one is not just a build #. In other words, 2.4 is considered to be a fairly major change from 2.3. Not much else can be said. From: Nicolas Paris Reply:

Fwd: Spark Architecture, Drivers, & Executors

2019-05-17 Thread Pat Ferrel
In order to create an application that executes code on Spark we have a long lived process. It periodically runs jobs programmatically on a Spark cluster, meaning it does not use spark-submit. The Jobs it executes have varying requirements for memory so we want to have the Spark Driver run in the

Re: Spark structured streaming watermarks on nested attributes

2019-05-06 Thread Pat Ferrel
Streams have no end until watermarked or closed. Joins need bounded datasets, et voila. Something tells me you should consider the streaming nature of your data and whether your joins need to use increments/snippets of infinite streams or to re-join the entire contents of the streams accumulated

Re: Deep Learning with Spark, what is your experience?

2019-05-04 Thread Pat Ferrel
@Riccardo Spark does not do the DL learning part of the pipeline (afaik) so it is limited to data ingestion and transforms (ETL). It therefore is optional and other ETL options might be better for you. Most of the technologies @Gourav mentions have their own scaling based on their own compute

Livy with Standalone Spark Master

2019-04-20 Thread Pat Ferrel
Does Livy work with a Standalone Spark Master?

Re: new install help

2019-04-15 Thread Pat Ferrel
Most people running on a Windows machine use a VM running Linux. You will run into constant issues if you go down another road with something like cygwin, so avoid the headache. From: Steve Pruitt Reply: user@predictionio.apache.org Date: April 15, 2019 at 10:59:09 AM To:

Why not a Top Level Project?

2019-04-08 Thread Pat Ferrel
To oversimplify slightly, all it takes to be a TLP for Apache is: 1) clear community support 2) a couple of Apache members to sponsor (Incubator members help) 3) demonstrated processes that follow the Apache way 4) the will of committers and PMC to move to TLP. What is missing in Livy? I am

Re: spark.submit.deployMode: cluster

2019-03-28 Thread Pat Ferrel
Thanks, are you referring to https://github.com/spark-jobserver/spark-jobserver or the undocumented REST job server included in Spark? From: Jason Nerothin Reply: Jason Nerothin Date: March 28, 2019 at 2:53:05 PM To: Pat Ferrel Cc: Felix Cheung , Marcelo Vanzin , user Subject: Re

Re: spark.submit.deployMode: cluster

2019-03-28 Thread Pat Ferrel
;-) Great idea. Can you suggest a project? Apache PredictionIO uses spark-submit (very ugly) and Apache Mahout only launches trivially in test apps since most uses are as a lib. From: Felix Cheung Reply: Felix Cheung Date: March 28, 2019 at 9:42:31 AM To: Pat Ferrel , Marcelo Vanzin Cc

Re: Where does the Driver run?

2019-03-28 Thread Pat Ferrel
e mode you might be able to use this: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/Client.scala Lastly, you can always check where Spark processes run by executing ps on the machine, i.e. `ps aux | grep java`. Best, Jianneng *From:* Pat Ferrel *Dat

Re: spark.submit.deployMode: cluster

2019-03-26 Thread Pat Ferrel
Reply: Marcelo Vanzin Date: March 26, 2019 at 1:59:36 PM To: Pat Ferrel Cc: user Subject: Re: spark.submit.deployMode: cluster If you're not using spark-submit, then that option does nothing. If by "context creation API" you mean "new SparkContext()" or an equ

spark.submit.deployMode: cluster

2019-03-26 Thread Pat Ferrel
I have a server that starts a Spark job using the context creation API. It DOES NOT use spark-submit. I set spark.submit.deployMode = “cluster”. In the GUI I see 2 workers with 2 executors. The link for running application “name” goes back to my server, the machine that launched the job. This is

Re: Where does the Driver run?

2019-03-25 Thread Pat Ferrel
:07 AM To: Pat Ferrel Cc: Akhil Das , user Subject: Re: Where does the Driver run? Hi Pat, Indeed, I don't think that it's possible to use cluster mode w/o spark-submit. All the docs I see appear to always describe needing to use spark-submit for cluster mode -- it's not even compatible

Re: Where does the Driver run?

2019-03-25 Thread Pat Ferrel
only guessing at that). Further, if we don’t use spark-submit, we can’t use deployMode = cluster??? From: Akhil Das Reply: Akhil Das Date: March 24, 2019 at 7:45:07 PM To: Pat Ferrel Cc: user Subject: Re: Where does the Driver run? There's also a driver UI (usually available on port 4040

Re: Where does the Driver run?

2019-03-24 Thread Pat Ferrel
60g BTW I would expect this to create one Executor, one Driver, and the Master on 2 Workers. From: Andrew Melo Reply: Andrew Melo Date: March 24, 2019 at 12:46:35 PM To: Pat Ferrel Cc: Akhil Das , user Subject: Re: Where does the Driver run? Hi Pat, On Sun, Mar 24, 2019 at 1:03 PM

Re: Where does the Driver run?

2019-03-24 Thread Pat Ferrel
60g From: Andrew Melo Reply: Andrew Melo Date: March 24, 2019 at 12:46:35 PM To: Pat Ferrel Cc: Akhil Das , user Subject: Re: Where does the Driver run? Hi Pat, On Sun, Mar 24, 2019 at 1:03 PM Pat Ferrel wrote: > Thanks, I have seen this many times in my research. Paraphrasi

Re: Where does the Driver run?

2019-03-24 Thread Pat Ferrel
: Akhil Das Date: March 23, 2019 at 9:26:50 PM To: Pat Ferrel Cc: user Subject: Re: Where does the Driver run? If you are starting your "my-app" on your local machine, that's where the driver is running. Hope this helps. <https://spark.apache.org/docs/l

Where does the Driver run?

2019-03-23 Thread Pat Ferrel
I have researched this for a significant amount of time and find answers that seem to be for a slightly different question than mine. The Spark 2.3.3 cluster is running fine. I see the GUI on http://master-address:8080, and there are 2 idle workers, as configured. I have a Scala application that

Re: Spark with Kubernetes connecting to pod ID, not address

2019-02-13 Thread Pat Ferrel
Executor.java:163) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) at io.netty.util.concurren

Spark with Kubernetes connecting to pod id, not address

2019-02-12 Thread Pat Ferrel
From: Pat Ferrel Reply: Pat Ferrel Date: February 12, 2019 at 5:40:41 PM To: user@spark.apache.org Subject:  Spark with Kubernetes connecting to pod id, not address We have a k8s deployment of several services including Apache Spark. All services seem to be operational. Our application

Re: [NOTICE] Mandatory migration of git repositories to gitbox.apache.org

2019-01-03 Thread Pat Ferrel
+1 From: Apache Mahout Reply: dev@mahout.apache.org Date: January 3, 2019 at 11:53:02 AM To: dev Subject:  Re: [NOTICE] Mandatory migration of git repositories to gitbox.apache.org  On Thu, 3 Jan 2019 13:51:40 -0600, dev wrote: Cool, just making sure we needed it. On Thu, Jan 3,

Re: universal recommender version

2018-11-27 Thread Pat Ferrel
There is a tag v0.7.3 and yes it is in master: https://github.com/actionml/universal-recommender/tree/v0.7.3 From: Marco Goldin Reply: user@predictionio.apache.org Date: November 20, 2018 at 6:56:39 AM To: user@predictionio.apache.org , gyar...@griddynamics.com Subject:  Re: universal

[jira] [Commented] (PIO-31) Move from spray to akka-http in servers

2018-09-19 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/PIO-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621051#comment-16621051 ] Pat Ferrel commented on PIO-31: --- I assume we are talking about the Event Server and the query server both

Re: PIO train issue

2018-08-29 Thread Pat Ferrel
Assuming you are using the UR… I don’t know how many times this has been caused by a misspelling of eventNames in engine.json, but assume you have checked that. The fail-safe way to check is to `pio export` your data and check it against your engine.json. BTW `pio status` does not even try to

Re: Distinct recommendation from "random" backfill?

2018-08-28 Thread Pat Ferrel
The random ranking is assigned after every `pio train` so if you have not trained in-between, they will be the same. Random is not really meant to do what you are using it for, it is meant to surface items with no data—no primary events. This will allow some to get real events and be recommended
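
A minimal Python sketch of the behavior described above, assuming a random rank is drawn once per training run and stored with the model, so queries only read it. `train_random_ranks`, `random_backfill`, and the per-run `seed` are hypothetical names for illustration, not the Universal Recommender's code.

```python
import random


def train_random_ranks(item_ids, seed):
    """Hypothetical training step: draw one random rank per item and keep it."""
    rng = random.Random(seed)  # a fresh RNG for this training run
    shuffled = list(item_ids)
    rng.shuffle(shuffled)
    return {item: rank for rank, item in enumerate(shuffled)}


def random_backfill(ranks, k):
    """Query time: read the stored ranks; no new randomness is drawn."""
    return sorted(ranks, key=ranks.get)[:k]


model = train_random_ranks(["a", "b", "c", "d", "e"], seed=1)
first = random_backfill(model, 3)
second = random_backfill(model, 3)
print(first == second)  # True: same model, same "random" items
```

Because queries only read the stored ranks, the order changes only after a new training run (here, a new seed).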

Why are these going to the incubator address?

2018-08-24 Thread Pat Ferrel
Is it necessary that these commits go to the incubator list? Are notifications set up wrong? From: git-site-r...@apache.org Reply: dev@predictionio.apache.org Date: August 24, 2018 at 10:33:34 AM To: comm...@predictionio.incubator.apache.org Subject: [7/7] predictionio-site git commit:

Re: PredictionIO spark deployment in Production

2018-08-07 Thread Pat Ferrel
Oh and no it does not need a new context for every query, only for the deploy. From: Pat Ferrel Date: August 7, 2018 at 10:00:49 AM To: Ulavapalle Meghamala Cc: user@predictionio.apache.org , actionml-user Subject: Re: PredictionIO spark deployment in Production The answers to your

Re: PredictionIO spark deployment in Production

2018-08-07 Thread Pat Ferrel
into Elasticsearch for serving independently scalable queries. I always advise you keep Spark out of serving for the reasons mentioned above. From: Ulavapalle Meghamala Date: August 7, 2018 at 9:27:46 AM To: Pat Ferrel Cc: user@predictionio.apache.org , actionml-user Subject: Re

Re: PredictionIO spark deployment in Production

2018-08-07 Thread Pat Ferrel
PIO is designed to use Spark in train and deploy. But the Universal Recommender removes the need for Spark to make predictions. This IMO is a key to using Spark well: remove it from serving results. PIO creates a Spark context to launch the `pio deploy` driver but Spark is never used and the context

Re: Straw poll: deprecating Scala 2.10 and Spark 1.x support

2018-08-02 Thread Pat Ferrel
+1 From: takako shimamoto Reply: user@predictionio.apache.org Date: August 2, 2018 at 2:55:49 AM To: d...@predictionio.apache.org , user@predictionio.apache.org Subject: Straw poll: deprecating Scala 2.10 and Spark 1.x support Hi all, We're considering deprecating Scala 2.10 and Spark

Re: 2 pio servers with 1 event server

2018-08-02 Thread Pat Ferrel
What template? From: Sami Serbey Reply: user@predictionio.apache.org Date: August 2, 2018 at 9:08:05 AM To: user@predictionio.apache.org Subject: 2 pio servers with 1 event server Greetings, I am trying to run 2 pio servers on different ports where each server has its own app. When I

Re: [actionml/universal-recommender] Boosting categories only shows one category type (#55)

2018-07-06 Thread Pat Ferrel
Please read the docs. There is no need to $set users since they are attached to usage events and can be detected automatically. In fact, "$set"ting them is ignored. There are no properties of users that are not calculated based on named “indicators”, which can be profile-type things. For this

Re: Digging into UR algorithm

2018-07-02 Thread Pat Ferrel
-id, "searched-for”, search-term) This as a secondary event has proven to be quite useful in at least one dataset I’ve seen. From: Pat Ferrel Reply: Pat Ferrel Date: July 2, 2018 at 12:18:16 PM To: user@predictionio.apache.org , Sami Serbey Cc: actionml-user Subject: Re: Digging in

Re: Digging into UR algorithm

2018-07-02 Thread Pat Ferrel
The only requirement is that someone performed the primary event on A and the secondary event is correlated to that primary event. The UR can recommend to a user who has only performed the secondary event on B as long as that is in the model. It makes no difference what subset of events the user has
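
The point that a user with only secondary events can still get recommendations can be illustrated with a simplified cross-occurrence sketch in Python. It assumes events arrive as per-user item sets and scores each (primary item, secondary item) pair with Dunning's log-likelihood ratio, the test Mahout uses for correlated cross-occurrence. All function names are hypothetical, and this is not the UR's implementation.

```python
import math
from collections import defaultdict


def xlogx(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # N times the Shannon entropy (in nats) of the counts, N = sum(counts).
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb floating-point noise for independent counts.
    return max(0.0, 2.0 * (row + col - mat))


def cross_occurrence(primary, secondary):
    """primary, secondary: dict of user -> set of items for each event type.
    Returns {(primary_item, secondary_item): llr_score} for pairs seen together."""
    n = len(set(primary) | set(secondary))
    users_by_a, users_by_b = defaultdict(set), defaultdict(set)
    for user, items in primary.items():
        for a in items:
            users_by_a[a].add(user)
    for user, items in secondary.items():
        for b in items:
            users_by_b[b].add(user)
    scores = {}
    for a, ua in users_by_a.items():
        for b, ub in users_by_b.items():
            k11 = len(ua & ub)  # primary on a AND secondary on b
            if k11 == 0:
                continue
            k12 = len(ua - ub)  # primary on a only
            k21 = len(ub - ua)  # secondary on b only
            k22 = n - k11 - k12 - k21  # neither
            scores[(a, b)] = llr(k11, k12, k21, k22)
    return scores


def recommend_from_secondary(scores, secondary_history, k=3):
    """Rank primary items for a user who has only secondary events."""
    totals = defaultdict(float)
    for (a, b), score in scores.items():
        if b in secondary_history:
            totals[a] += score
    return sorted(totals, key=totals.get, reverse=True)[:k]


primary = {"u1": {"A"}, "u2": {"A"}, "u3": {"C"}, "u4": {"C"}}
secondary = {"u1": {"B"}, "u2": {"B"}, "u3": {"D"}, "u4": {"D"}}
model = cross_occurrence(primary, secondary)
print(recommend_from_secondary(model, {"B"}))  # ['A']
```

A new user who has only performed the secondary event on B gets A, because the users who performed the primary event on A also performed the secondary event on B.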

[jira] [Updated] (MAHOUT-2048) There are duplicate content pages which need redirects instead

2018-06-27 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pat Ferrel updated MAHOUT-2048: --- Sprint: 0.14.0 Release > There are duplicate content pages which need redirects inst

[jira] [Created] (MAHOUT-2048) There are duplicate content pages which need redirects instead

2018-06-27 Thread Pat Ferrel (JIRA)
Pat Ferrel created MAHOUT-2048: -- Summary: There are duplicate content pages which need redirects instead Key: MAHOUT-2048 URL: https://issues.apache.org/jira/browse/MAHOUT-2048 Project: Mahout

Re: a question about a high availability of Elasticsearch cluster

2018-06-22 Thread Pat Ferrel
This should work with any node down; Elasticsearch should elect a new master. What version of PIO are you using? PIO and the UR changed the client from the transport client to the REST client in 0.12.0, which is why you are using port 9200. Do all PIO functions work correctly, like: - pio app

Re: UR trending ranking as separate process

2018-06-20 Thread Pat Ferrel
 user@predictionio.apache.org Date: June 20, 2018 at 10:25:53 AM To: user@predictionio.apache.org , Pat Ferrel Cc: user@predictionio.apache.org Subject:  Re: UR trending ranking as separate process Hi George, I didn't get your question but I think I am missing something. So you're using the Univ

Re: UR trending ranking as separate process

2018-06-20 Thread Pat Ferrel
No, the trending algorithm is meant to look at something like trends over 2 days. This is because it looks at 2 buckets of conversion frequencies, and if you cut them smaller than a day you will have so much bias due to daily variations that the trends will be invalid. In other words, the ups and
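
A hypothetical Python sketch of the two-bucket idea, assuming a trend score is simply the conversion count in the newer half of the window minus the count in the older half. The exact UR formula is not given here; `trending` and `window_hours` are illustrative names.

```python
from collections import Counter


def trending(events, now, window_hours=48.0):
    """events: iterable of (item_id, timestamp_in_hours).
    Hypothetical score: conversions in the newer half of the window
    minus conversions in the older half."""
    half = window_hours / 2
    newer, older = Counter(), Counter()
    for item, ts in events:
        age = now - ts
        if 0 <= age < half:
            newer[item] += 1
        elif half <= age < window_hours:
            older[item] += 1
        # events older than the window (or in the future) are ignored
    return {item: newer[item] - older[item] for item in set(newer) | set(older)}


events = [("x", 47.0), ("x", 46.0), ("x", 40.0), ("y", 10.0), ("y", 20.0)]
print(trending(events, now=48.0))  # x rising (+3), y falling (-2)
```

With a 48-hour window each bucket covers a full day, so both see the same daily cycle. With, say, a 12-hour window the newer bucket could be daytime and the older one night, and the difference would measure time of day rather than a trend.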

Re: java.util.NoSuchElementException: head of empty list when running train

2018-06-19 Thread Pat Ferrel
PIO_STORAGE_SOURCES_HBASE_HOME=/usr/local/hbase Thanks, Anuj Kumar On Tue, Jun 19, 2018 at 9:16 PM Pat Ferrel wrote: > Can you show me where on the AML site it says to store models in HDFS, it > should not say that? I think that may be from the PIO site so you should > ignore it. > > Can

Re: java.util.NoSuchElementException: head of empty list when running train

2018-06-19 Thread Pat Ferrel
; based backfill, must add eventsNames", > > "name": "ur", > > "params": { > > "appName": "np", > > "indexName": "np", > > "typeName": "items", > > "blacklistEvents": [],

Re: Few Queries Regarding the Recommendation Template

2018-06-13 Thread Pat Ferrel
te gets it wrong. From: KRISH MEHTA Reply: KRISH MEHTA Date: June 13, 2018 at 2:19:17 PM To: Pat Ferrel Subject: Re: Few Queries Regarding the Recommendation Template I Understand but if I just want the likes, dislikes and views then I can combine the algorithms right? Given in

Re: True Negative - ROC Curve

2018-06-12 Thread Pat Ferrel
We do not use these for recommenders. The precision rate is low when the lift in your KPI like sales is relatively high. This is not like classification. We use MAP@k with increasing values of k. This should yield a diminishing mean average precision chart with increasing k. This tells you 2
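
One common formulation of MAP@k, sketched in Python for illustration. It assumes each user has a ranked recommendation list and a set of held-out items they actually converted on, and it normalizes each user's average precision by min(k, number of relevant items); other tools may compute it slightly differently.

```python
def average_precision_at_k(recommended, relevant, k):
    """AP@k for one user: precision at each hit, averaged over min(k, |relevant|)."""
    hits, total = 0, 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i  # precision at cutoff i
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def map_at_k(recs_by_user, relevant_by_user, k):
    """Mean of AP@k over users that have at least one held-out relevant item."""
    users = [u for u, rel in relevant_by_user.items() if rel]
    if not users:
        return 0.0
    return sum(
        average_precision_at_k(recs_by_user.get(u, []), relevant_by_user[u], k)
        for u in users
    ) / len(users)


recs = {"u1": ["a", "b", "c"]}
held_out = {"u1": {"a", "c"}}
for k in (1, 2, 3):
    print(k, map_at_k(recs, held_out, k))
```

Evaluating `map_at_k` for k = 1, 2, 3, … over real held-out data produces the chart described above.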

Re: Regarding Real-Time Prediction

2018-06-11 Thread Pat Ferrel
Actually if you are using the Universal Recommender you only need to deploy once as long as the engine.json does not change. The hot swap happens as @Digambar says and there is literally no downtime. If you are using any of the other recommenders you do have to re-deploy after every train but

Re: UR template minimum event number to recommend

2018-06-04 Thread Pat Ferrel
No, but we have 2 ways to handle this situation automatically, and you can tell if recommendations are not from personal user history. 1. When there is not enough user history to recommend, we fill in the lower-ranking recommendations with popular, trending, or hot items. Not completely
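
A minimal Python sketch of filling the lower-ranked slots, assuming the personalized results and a popularity-ranked list (popular, trending, or hot) are already computed. Tagging each item with its source is an illustrative choice that shows how a caller could tell backfilled items from personalized ones; `with_backfill` is a hypothetical name, and how the UR itself signals this is not shown here.

```python
def with_backfill(personal, popular, num):
    """Keep personalized results first, then fill the remaining slots from a
    popularity-ranked list, skipping duplicates. Returns (item, source) pairs."""
    results = [(item, "personal") for item in personal[:num]]
    seen = {item for item, _ in results}
    for item in popular:
        if len(results) >= num:
            break
        if item not in seen:
            results.append((item, "backfill"))
            seen.add(item)
    return results


# A user with only one personalized result gets two backfilled items.
print(with_backfill(["a"], ["b", "a", "c"], 3))
```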

Re: PIO 0.12.1 with HDP Spark on YARN

2018-05-29 Thread Pat Ferrel
Yarn has to be started explicitly. Usually it is part of Hadoop and is started with Hadoop. Spark only contains the client for Yarn (afaik). From: Miller, Clifford Reply: user@predictionio.apache.org Date: May 29, 2018 at 6:45:43 PM To: user@predictionio.apache.org Subject: Re: PIO

Re: Spark cluster error

2018-05-29 Thread Pat Ferrel
Sorry, what I meant was the actual spark-submit command that PIO was using. It should be in the log. What Spark version was that? I recall classpath issues with certain versions of Spark. On Thu, May 24, 2018 at 4:52 PM, Pat Ferrel wrote: > Thanks Donald, > > We have: > >

Re: pio app new failed in hbase

2018-05-29 Thread Pat Ferrel
No, this is as expected. When you run pseudo-distributed everything internally is configured as if the services were on separate machines. See clustered instructions here: http://actionml.com/docs/small_ha_cluster This is to setup 3 machines running different parts and is not really the best

Re: PIO not using HBase cluster

2018-05-25 Thread Pat Ferrel
rd.mil...@phoenix-opsgroup.com> Reply: Miller, Clifford <clifford.mil...@phoenix-opsgroup.com> Date: May 25, 2018 at 10:16:01 AM To: Pat Ferrel <p...@occamsmachete.com> Cc: user@predictionio.apache.org

Re: PIO not using HBase cluster

2018-05-25 Thread Pat Ferrel
No, you need to have HBase installed, or at least its config installed, on the PIO machine. The servers defined in pio-env.sh will be configured for cluster operation and will be started separately from PIO. PIO will then not start HBase; it will only try to communicate with it. But PIO still needs

Re: Spark2 with YARN

2018-05-24 Thread Pat Ferrel
I’m having a java.lang.NoClassDefFoundError in a different context and different class. Have you tried this without Yarn? Sorry I can’t find the rest of this thread. From: Miller, Clifford Reply:

Re: Spark cluster error

2018-05-24 Thread Pat Ferrel
doop2 in the storage driver assembly. Looking at Git history it has not changed in a while. Do you have the exact classpath that has gone into your Spark cluster? On Wed, May 23, 2018 at 1:30 PM, Pat Ferrel <p...@actionml.com> wrote: > A source build did not fix the problem, has anyone r

Re: Spark cluster error

2018-05-23 Thread Pat Ferrel
ster=local but not with a remote Spark master. I’ve passed in the hbase-client in the --jars part of spark-submit and it still fails; what am I missing? From: Pat Ferrel <p...@actionml.com> Reply: Pat Ferrel <p...@actionml.com> Date: May 23, 2018 at 8:57:32 A

Spark cluster error

2018-05-23 Thread Pat Ferrel
Same CLI works using local Spark master, but fails using remote master for a cluster due to a missing class def for protobuf used in hbase. We are using the binary dist 0.12.1. Is this known? Is there a work around? We are now trying a source build in hope the class will be put in the assembly

RE: Problem with training in yarn cluster

2018-05-23 Thread Pat Ferrel
e case where YARN is trying to find the pio.log file on the HDFS cluster. You can try "--master yarn --deploy-mode client". You need to pass this configuration with pio train, e.g., pio train -- --master yarn --deploy-mode client Thanks and Regards Ambuj Sharma Sunrise may late, B

RE: Problem with training in yarn cluster

2018-05-22 Thread Pat Ferrel
arbitrary Spark params on the pio command line exactly as you would pass them to spark-submit. The double dash separates PIO and Spark params. From: Pat Ferrel <p...@occamsmachete.com> Reply: user@predictionio.apache.org
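
The double-dash convention can be sketched in Python by building the argument list explicitly, which also sidesteps shell quoting problems. `pio_train_argv` is a hypothetical helper; the Spark options in the example are the `--master yarn --deploy-mode client` ones suggested in this thread. `shlex.join` needs Python 3.8 or later.

```python
import shlex


def pio_train_argv(pio_args=(), spark_args=()):
    """Hypothetical helper: PIO options first, then `--`, then options that
    are handed to spark-submit unchanged."""
    argv = ["pio", "train", *pio_args]
    if spark_args:
        argv += ["--", *spark_args]
    return argv


argv = pio_train_argv(spark_args=["--master", "yarn", "--deploy-mode", "client"])
print(shlex.join(argv))  # pio train -- --master yarn --deploy-mode client
```

Passing the list to `subprocess.run(argv, check=True)` would run it without a shell, so no quoting is involved at all.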

RE: Problem with training in yarn cluster

2018-05-22 Thread Pat Ferrel
What is the command line for `pio train …`? Specifically, are you using yarn-cluster mode? This causes the driver code, which is a PIO process, to be executed on an executor. Special setup is required for this. From: Wojciech Kowalski Reply: user@predictionio.apache.org

Re: UR: build/train/deploy once & querying for 3 use cases

2018-05-11 Thread Pat Ferrel
BTW the Universal Recommender has its own community support group here: https://groups.google.com/forum/#!forum/actionml-user From: Pat Ferrel <p...@occamsmachete.com> Reply: user@predictionio.apache.org

Re: UR: build/train/deploy once & querying for 3 use cases

2018-05-11 Thread Pat Ferrel
and “ItemBias” on the query > do not have any effect on the result. > > 5.Is it feasible to build/train/deploy only once, and query for > all 3 use cases? > > > 6. How to make queries towards the different Apps because there is > no any obvious way in the query para

Re: UR evaluation

2018-05-10 Thread Pat Ferrel
Exactly, ranking is the only task of a recommender. Precision is not automatically good at that, but something like MAP@k is. From: Marco Goldin <markomar...@gmail.com> Date: May 10, 2018 at 10:09:22 PM To: Pat Ferrel <p...@occamsmachete.com>

Re: UR evaluation

2018-05-10 Thread Pat Ferrel
3 AM To: Pat Ferrel <p...@occamsmachete.com> Cc: user@predictionio.apache.org <user@predictionio.apache.org> Subject:  Re: UR evaluation thank you very much, i didn't see this tool, i'll definitely try it. Clearly better to have such a specific instrument. 2018-05-10 18:36

Re: UR evaluation

2018-05-10 Thread Pat Ferrel
You can if you want but we have external tools for the UR that are much more flexible. The UR has tuning that can’t really be covered by the built in API. https://github.com/actionml/ur-analysis-tools They do MAP@k as well as creating a bunch of other metrics and comparing different types of input

Re: UR: build/train/deploy once & querying for 3 use cases

2018-05-09 Thread Pat Ferrel
Why do you want to throw away user behavior in making recommendations? The lift you get in purchases will be less. There is a use case for this when you are making recommendations basically inside a session where the user is browsing/viewing things on a hunt for something. In this case you would

Users of Scala 2.11

2018-04-24 Thread Pat Ferrel
Hi all, Mahout has hit a bit of a bump in releasing a Scala 2.11 version. I was able to build 0.13.0 for Scala 2.11 and have published it on github as a Maven compatible repo. I’m also using it from SBT. If anyone wants access let me know.

Re: Info / resources for scaling PIO?

2018-04-24 Thread Pat Ferrel
PIO is based on the architecture of Spark, which uses HDFS. HBase also uses HDFS. Scaling these is quite well documented on the web. Scaling PIO is the same as scaling all its services. It is unlikely you’ll need it, but you can also have more than one PIO server behind a load balancer. Don’t

Re: pio deploy without spark context

2018-04-14 Thread Pat Ferrel
The need for Spark at query time depends on the engine. Which are you using? The Universal Recommender, which I maintain, does not require Spark for queries but uses PIO. We simply don’t use the Spark context so it is ignored. To make PIO work you need to have the Spark code accessible but that

Re: Hbase issue

2018-04-13 Thread Pat Ferrel
This may seem unhelpful now but for others it might be useful to mention some minimum PIO in production best practices: 1) PIO should IMO never be run in production on a single node. When all services share the same memory, cpu, and disk, it is very difficult to find the root cause to a

Re: how to set engine-variant in intellij idea

2018-04-10 Thread Pat Ferrel
There are instructions for using IntelliJ, but although I wrote the last version, I apologize that I can’t make them work anymore. If you get them to work, you would be doing the community a great service by telling us how or editing the instructions. http://predictionio.apache.org/resources/intellij/

Re: Unclear problem with using S3 as a storage data source

2018-03-29 Thread Pat Ferrel
: user@predictionio.apache.org <user@predictionio.apache.org> Date: March 29, 2018 at 6:19:58 AM To: Pat Ferrel <p...@occamsmachete.com> Cc: user@predictionio.apache.org <user@predictionio.apache.org> Subject:  Re: Unclear problem with using S3 as a storage data source Sorry

Re: Unclear problem with using S3 as a storage data source

2018-03-28 Thread Pat Ferrel
: Dave Novelli <d...@ultravioletanalytics.com> Date: March 28, 2018 at 12:13:12 PM To: Pat Ferrel <p...@occamsmachete.com> Cc: user@predictionio.apache.org

Re: Error when training The Universal Recommender 0.7.0 with PredictionIO 0.12.0-incubating

2018-03-27 Thread Pat Ferrel
The PIO build requires that ES hosts are known to Spark, which writes the model to ES. You can pass these in on the `pio train` command line: `pio train … -- --conf spark.es.nodes="node1,node2,node3"` Notice there are no spaces in the quoted list of hosts; also notice the double dash, which separates pio

Re: UR 0.7.0 - problem with training

2018-03-08 Thread Pat Ferrel
BTW I think you may have to pass settings on the CLI by adding “spark.” to the beginning of the key name: `pio train -- --conf spark.es.nodes="localhost" --driver-memory 8g --executor-memory 8g` From: Pat Ferrel <p...@occamsmachete.com>

Re: UR 0.7.0 - problem with training

2018-03-08 Thread Pat Ferrel
es.nodes is supposed to be a string with hostnames separated by commas. Depending on how your containers are set to communicate with the outside world (Docker networking or port mapping) you may also need to set the port, which is 9200 by default. If your container is using port mapping and maps
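
A small Python helper, hypothetical and for illustration only, that formats the host list for `es.nodes` as a comma-separated string with no spaces and optionally appends a port to each host. elasticsearch-hadoop accepts `host:port` entries in `es.nodes`, and 9200 is its default port; the `spark.` prefix follows the CLI advice in this thread.

```python
def es_nodes(hosts, port=None):
    """Hypothetical helper: comma-separated host list with no spaces,
    optionally with an explicit port on every host (e.g. a mapped Docker port)."""
    entries = [h.strip() for h in hosts if h.strip()]
    if port is not None:
        entries = [f"{h}:{port}" for h in entries]
    return ",".join(entries)


setting = "spark.es.nodes=" + es_nodes(["node1", " node2", "node3"], port=9200)
print(setting)  # spark.es.nodes=node1:9200,node2:9200,node3:9200
```

The resulting string would go after `pio train -- --conf`, with the host-side port replaced by whatever your container's port mapping exposes.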

Re: Spark 2.x/scala 2.11.x release

2018-03-03 Thread Pat Ferrel
evor Grant <trevor.d.gr...@gmail.com> > Sent: Friday, March 2, 2018 5:15:35 PM > To: Mahout Dev List > Subject: Re: Spark 2.x/scala 2.11.x release > > The only "mess" is in the cli spark drivers, namely scopt. > > Get rid of the drivers/fix the scopt issue- we

Re: Spark 2.x/scala 2.11.x release

2018-03-02 Thread Pat Ferrel
e cli spark drivers, namely scopt. > > Get rid of the drivers/fix the scopt issue- we have no mess. > > > > On Mar 2, 2018 4:09 PM, "Pat Ferrel" <p...@occamsmachete.com> wrote: > > > BTW the mess master is in is why git flow was invented and why I asked &

Re: Spark 2.x/scala 2.11.x release

2018-03-02 Thread Pat Ferrel
> > - Cherrypick any commits that we'd like to release (E.g.: SparseSpeedup) > onto `develop` (along with a PR and a ticket). > > > - Merge `develop` to `master`, run through Smoke tests, tag master @ > `mahout-0.13.1`(automatically), and release. > > > This will also ge

Re: Spark 2.x/scala 2.11.x release

2018-03-02 Thread Pat Ferrel
r`, run through Smoke tests, tag master @ > `mahout-0.13.1`(automatically), and release. > > > This will also get us to more of a git-flow workflow, as we've discussed > moving towards. > > > Thoughts @all? > > > --andy > > > > > > > _
