Re: Eventserver API in an Engine?

2017-09-23 Thread Pat Ferrel
And glad you did.

The needs of Heroku are just as important as those of any other user of an
Apache project *but no more so*, since one extremely important measure of TLP
eligibility is demonstrating freedom from corporate dominance.

So let me chime in with my own reasons to look at a major refactoring of PIO:
- Simplified deployment: one server with integrated Engine(s), all behind a
  single REST API in a single JVM process (perhaps identical to what Mars is
  asking for).
- No need to “train” or “deploy” on different machines, but full access to
  clustered compute and storage services (also something Mars mentions).
- Kappa and non-Spark-based Engines; a pure, clean REST API that allows GUIs
  to be plugged in; optional true security (SSL + Auth).
- The ML/AI community is moving on from Hadoop MapReduce, to Spark, to
  TensorFlow and streaming online learners (Kappa), and this requires
  independence from any specific compute backend.
- Multi-tenant, with multiple instances and types of Engines.
- Secure: TLS + authentication + authorization, but optional so there is no
  overhead when it isn’t needed.
- The CLI is just another client communicating with the server’s REST API and
  can be replaced with custom admin GUIs, for example.

We now have an MVP that delivers the above requirements, but as a replacement
for PIO. We first saw this as PIO-Kappa, and early code was named that, but
things have changed: it requires some major re-thinking, so it now has its own
name, Harness. Getting these features into PIO would require the same
re-thinking of its codebase along with a *lot* of implementation work, so we
chose to start from scratch as the easier route. The server runs as one JVM
process with REST endpoints for all input and queries, and even methods to
trigger training for Lambda Engines. We have benchmarked our scaffold Template
(a minimal operational Engine) at 6 ms/request for one user (connection) in
one thread on a 2013 MacBook Pro in localhost mode; add 1 ms for SSL + Auth.
Since it uses akka-http it will also handle a self-tuning number of parallel
requests (no benchmarks yet). Suffice it to say, it is fast.
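
To make the input and query side concrete, here is a rough client sketch in
Scala. The port, endpoint paths, engine id, and JSON fields are my own
illustrative assumptions, not the actual spec; the authoritative endpoints are
in the rest_spec.md linked below.

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Illustrative only: base URL, engine id, and endpoint paths are assumptions,
// not the real Harness REST spec (see rest_spec.md for the actual contract).
object RestClientSketch {
  private val base     = "http://localhost:9090"
  private val engineId = "some_engine" // hypothetical Engine instance id
  private val client   = HttpClient.newHttpClient()

  private def post(path: String, json: String): String = {
    val req = HttpRequest.newBuilder(URI.create(base + path))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(json))
      .build()
    client.send(req, HttpResponse.BodyHandlers.ofString()).body()
  }

  def main(args: Array[String]): Unit = {
    // A PIO EventServer-style event, sent unchanged to the Engine's input endpoint
    val event =
      """{"event":"purchase","entityType":"user","entityId":"u-1","targetEntityType":"item","targetEntityId":"i-1"}"""
    post(s"/engines/$engineId/events", event)

    // A query against the same Engine instance, in the same JVM process
    println(post(s"/engines/$engineId/queries", """{"user":"u-1"}"""))
  }
}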

Templates for this server are quite a bit different, because they now include
their own robust validation of input, queries, and engine.json, but also
because Templates must now do some of what PIO does. With this responsibility
comes great freedom. Freedom to use any compute backend. Freedom to use any
storage mechanism for the model or input. Freedom to be Kappa, Lambda, or any
hybrid in between. And Engines get new functionality from the server, as
listed in the requirements above.
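
To give a feel for the shape of that contract, here is a rough hypothetical
sketch; the trait and method names are mine, for illustration only, and the
real contract is in the Template package linked further down.

import scala.util.Try

// Hypothetical sketch of a Template/Engine contract; names and signatures are
// illustrative assumptions, not the actual com.actionml.core.template API.
trait EngineSketch {
  // Validate and apply engine.json-style config before serving anything.
  def init(engineJson: String): Try[Unit]

  // Validate and absorb a single input event. A Kappa Engine may update its
  // model right here; a Lambda Engine typically just persists the event.
  def input(eventJson: String): Try[Unit]

  // Optional batch training hook for Lambda Engines; a no-op for pure Kappa.
  def train(): Try[Unit] = Try(())

  // Validate the query and answer it as JSON from whatever model store the
  // Template chose to use.
  def query(queryJson: String): Try[String]
}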

Even though there are structural Template differences, they remain JSON-input
compatible with PIO. We took a PIO Template we had created in 2016 that uses
Vowpal Wabbit as a compute backend and re-implemented it in this new ML Server
as a clean Kappa Template, so we can talk about the differences with some
evidence to back up statements. There was zero change to input, so backups of
the PIO engine were moved to the new server quite easily via the CLI, with no
change to the data.

There are long, tedious discussions that could be had about how to get what
Mars and I are asking for from PIO, but Apache is a do-ocracy. All of our asks
can be done incrementally, with incremental disruption, or they can be done
all at once (and have been). There are so many trade-offs that the discussion
will, in all likelihood, never end.

I therefore suggest that Mars *do* what he thinks is needed, or alternatively,
I am willing to donate what we have running. I’m planning to make the UR a
Kappa algorithm soon, requiring no `pio train` (and no Spark). This must, of
necessity, be done on the new server framework, so whether the new framework
becomes part of PIO 2 or not is a choice for the team. I suppose I could just
push it to an “experimental” branch, but this is something I’m not willing to
*do* without some indication that it is welcome.

https://github.com/actionml/harness 
https://github.com/actionml/harness/blob/develop/commands.md 

https://github.com/actionml/harness/blob/develop/rest_spec.md 

Template contract: 
https://github.com/actionml/harness/tree/develop/rest-server/core/src/main/scala/com/actionml/core/template

The major downside I will volunteer is that Templates will require a fair bit
of work to port, and we have no Spark-based ones to use as examples yet. Also,
we have not integrated the PIO-Stores as the lead-in diagram implies. Remember,
it is an MVP running a Template in a production environment, but it makes no
effort to replicate all PIO features.

 
On Sep 22, 2017, at 6:35 PM, Mars Hall  wrote:

I'm bringing this thread back to life!

There is another thread here this week:
How to training and deploy on different machine?

In it, Pat replies:

You will have to spread the pio “workflow

Re: Unable to connect to all storage backends successfully

2017-09-23 Thread Jim Miller
Hi Donald,

Tried just now and received the following error:

vagrant:~/ $ pio status                                              [13:34:52]
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.12.0-incubating is installed at 
/home/vagrant/pio/PredictionIO-0.12.0-incubating
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at 
/home/vagrant/pio/PredictionIO-0.12.0-incubating/vendors/spark-2.1.1-bin-hadoop2.7
[INFO] [Management$] Apache Spark 2.1.1 detected (meets minimum requirement of 
1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[ERROR] [Management$] Unable to connect to all storage backends successfully.
The following shows the error message from the storage backend.

error while performing request (java.lang.RuntimeException)

Dumping configuration of initialized storage backend sources.
Please make sure they are correct.

Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: HOME ->
/home/vagrant/pio/PredictionIO-0.12.0-incubating/vendors/elasticsearch-5.5.2,
HOSTS -> localhost, PORTS -> 9300, CLUSTERNAME -> firstCluster, TYPE ->
elasticsearch


HERE IS MY PIO-ENV.SH
# PredictionIO Main Configuration
#
# This section controls core behavior of PredictionIO. It is very likely that
# you need to change these to fit your site.

# SPARK_HOME: Apache Spark is a hard dependency and must be configured.
# SPARK_HOME=$PIO_HOME/vendors/spark-2.0.2-bin-hadoop2.7
SPARK_HOME=$PIO_HOME/vendors/spark-2.1.1-bin-hadoop2.7

POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.0.0.jar
MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar

# ES_CONF_DIR: You must configure this if you have advanced configuration for
#              your Elasticsearch setup.
ES_CONF_DIR=$PIO_HOME/vendors/elasticsearch-5.5.2
# HADOOP_CONF_DIR=/opt/hadoop

# HBASE_CONF_DIR: You must configure this if you intend to run PredictionIO
#                 with HBase on a remote cluster.
HBASE_CONF_DIR=$PIO_HOME/hbase-1.3.1/conf

# Filesystem paths where PredictionIO uses as block storage.
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp

# PredictionIO Storage Configuration
#
# This section controls programs that make use of PredictionIO's built-in
# storage facilities. Default values are shown below.
#
# For more information on storage configuration please refer to
# http://predictionio.incubator.apache.org/system/anotherdatastore/

# Storage Repositories

# Default is to use PostgreSQL
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS

# Storage Data Sources

# PostgreSQL Default Settings
# Please change "pio" to your database name in PIO_STORAGE_SOURCES_PGSQL_URL
# Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and
# PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly
# PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
# PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
# PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
# PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio

# MySQL Example
# PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
# PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio
# PIO_STORAGE_SOURCES_MYSQL_USERNAME=pio
# PIO_STORAGE_SOURCES_MYSQL_PASSWORD=pio

# Elasticsearch Example
# PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
# PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
# PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.5.2
# Optional basic HTTP auth
# PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
# PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
# Elasticsearch 1.x Example
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=firstCluster
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.5.2

# Local File System Example
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models

# HBase Example
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.3.1

# AWS S3 Example
# PIO_STORAGE_SOURCES_S3_TYPE=s3
# PIO_STORAGE_SOURCES_S3_BUCKET_NAME=pio_bucket
# PIO_STORAGE_SOURCES_S3_BASE_PATH=pio_model

ELASTICSEARCH.YML
# Elasticsearch Configuration
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make