Tjones has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/347058 )

Change subject: Setup tox for running flake8 and pytest
......................................................................


Setup tox for running flake8 and pytest

More plumbing for general Python project setup. This allows the tox
command to run all the tests, syntax checks, etc. that we want to run on
every commit. Getting this running as part of the Jenkins CI pipeline is
follow-up work. One constraint there will be getting the cdh5.10.0 spark
packages installed, but that shouldn't be too difficult.

* Move the virtualenv for running tox in the VM to /vagrant/venv to keep
the mess in one place. Tried to avoid needing an extra virtualenv, as
tox builds venvs anyway, but tox+pip weren't playing nice and errored
out with the .[test] dep otherwise.
* Switch to Debian jessie. Prod is moving in that direction, and there is
no harm in switching now before anything complex is set up.
* Replace requirements.txt with setup.py.
* Add a LICENSE file; it's MIT.
* Adjust the Vagrantfile to use an NFS share. With the default share
tox/virtualenv were unable to create hardlinks.
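
For illustration only (this is not part of the change): the hardlink
limitation mentioned in the last bullet can be probed with a short Python
sketch like the one below. The helper name supports_hardlinks is
hypothetical, and the check assumes the directory is writable.

```python
import os
import sys
import tempfile


def supports_hardlinks(directory):
    """Return True if a hard link can be created inside `directory`.

    Hypothetical helper: creates a scratch file, tries to hard-link it,
    and cleans up both files afterwards.
    """
    fd, src = tempfile.mkstemp(dir=directory)
    os.close(fd)
    dst = src + ".link"
    try:
        os.link(src, dst)
    except (OSError, AttributeError):
        # OSError: the filesystem refused the link.
        # AttributeError: os.link is unavailable on this platform.
        return False
    else:
        os.remove(dst)
        return True
    finally:
        os.remove(src)


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "."
    status = "ok" if supports_hardlinks(path) else "unsupported"
    print("%s: hardlinks %s" % (path, status))
```

Running this against /vagrant inside the VM would be one way to confirm
whether a given share type allows hard links before invoking tox.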

Change-Id: Id57bd5fd0476fc061d4b0a1cd93a1b2f639b7ed4
---
M .gitignore
A LICENSE
A MANIFEST.in
D README
A README.rst
M Vagrantfile
M bootstrap-vm.sh
M mjolnir/test/conftest.py
D requirements.txt
A setup.py
A tox.ini
11 files changed, 159 insertions(+), 54 deletions(-)

Approvals:
  Tjones: Verified; Looks good to me, approved



diff --git a/.gitignore b/.gitignore
index c476043..4b7c536 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,23 +4,8 @@
 *~
 
 # Distribution / packaging
-.Python
-env/
-bin/
-build/
-develop-eggs/
-dist/
-eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-local/
-include/
-share/
+venv/
 *.egg-info/
-.installed.cfg
 *.egg
 *.log
 
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..d13cc4b
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,19 @@
+The MIT License (MIT)
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/MANIFEST.in b/MANIFEST.in
new file mode 100644
index 0000000..7623449
--- /dev/null
+++ b/MANIFEST.in
@@ -0,0 +1 @@
+include LICENSE README.rst
diff --git a/README b/README
deleted file mode 100644
index e0b5a60..0000000
--- a/README
+++ /dev/null
@@ -1,15 +0,0 @@
-== MjoLniR - Machine Learned Ranking for Wikimedia
-
-MjoLniR is a library for handling the backend data processing
-for s Machine Learned Ranking at Wikimedia. It is specialized
-to how click logs are stored at wikimedia and provides functionality
-to transform the source click logs into machine ML models for ranking.
-
-== Requirements
-
-Targets pyspark 1.6.0 running on python 2.7
-
-== Other
-
-Documentation follows the numpy documentation guidelines:
-    https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt
diff --git a/README.rst b/README.rst
new file mode 100644
index 0000000..e399627
--- /dev/null
+++ b/README.rst
@@ -0,0 +1,41 @@
+MjoLniR - Machine Learned Ranking for Wikimedia
+===============================================
+
+MjoLniR is a library for handling the backend data processing for Machine
+Learned Ranking at Wikimedia. It is specialized to how click logs are stored at
+Wikimedia and provides functionality to transform the source click logs into ML
+models for ranking in elasticsearch.
+
+Requirements
+============
+
+Targets pyspark from cdh5.10.0. This is mostly pyspark 1.6.0, but has various
+backports integrated. Requires python 2.7, as some dependencies (clickmodels)
+do not support python 3 yet.
+
+Running tests
+=============
+
+Tests can be run from within the provided Vagrant configuration. Use the
+following from the root of this repository to build a vagrant box, ssh into it,
+and run the tests::
+
+    vagrant up
+    vagrant ssh
+    cd /vagrant
+    venv/bin/tox
+
+The test suite includes both flake8 (linter) and pytest (unit) tests. These
+can be run independently with the -e option for tox::
+
+    venv/bin/tox -e flake8
+
+Individual pytest tests can be run by specifying the path on the command line::
+
+    venv/bin/tox -e pytest mjolnir/test/test_sampling.py
+
+Other
+=====
+
+Documentation follows the numpy documentation guidelines:
+    https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt
diff --git a/Vagrantfile b/Vagrantfile
index 2dbb89e..ea4ae99 100644
--- a/Vagrantfile
+++ b/Vagrantfile
@@ -1,14 +1,18 @@
 Vagrant.configure("2") do |config|
 
     config.vm.provider :virtualbox do |vb, override|
-        override.vm.box = "trusty-cloud"
-        override.vm.box_url = 'https://cloud-images.ubuntu.com/vagrant/trusty/current/trusty-server-cloudimg-amd64-vagrant-disk1.box'
-        override.vm.box_download_insecure = true
-        override.vm.synced_folder ".", "/vagrant", :mount_options => ["dmode=777"]
+        override.vm.box = 'debian/contrib-jessie64'
         vb.customize ['modifyvm', :id, '--memory', '2048']
     end
 
-    config.vm.hostname = "MjoLniR"
+    root_share_options = { id: 'vagrant-root' }
+    root_share_options[:type] = :nfs
+    root_share_options[:mount_options] = ['noatime', 'rsize=32767', 'wsize=32767', 'async']
+    config.nfs.map_uid = Process.uid
+    config.nfs.map_gid = Process.gid
+    config.vm.synced_folder ".", "/vagrant", root_share_options
 
+    config.vm.hostname = "MjoLniR"
+    config.vm.network "private_network", type: "dhcp"
     config.vm.provision "shell", path: "bootstrap-vm.sh"
 end
diff --git a/bootstrap-vm.sh b/bootstrap-vm.sh
index 24e720a..dd8f202 100644
--- a/bootstrap-vm.sh
+++ b/bootstrap-vm.sh
@@ -3,9 +3,9 @@
 set -e
 
 cat >/etc/apt/sources.list.d/cloudera.list <<EOD
-# Packages for Cloudera's Distribution for Hadoop, Version 5.10.0, on Ubuntu 14.04 amd64
-deb [arch=amd64] http://archive.cloudera.com/cdh5/ubuntu/trusty/amd64/cdh trusty-cdh5.10.0 contrib
-deb-src http://archive.cloudera.com/cdh5/ubuntu/trusty/amd64/cdh trusty-cdh5.10.0 contrib
+# Packages for Cloudera's Distribution for Hadoop, Version 5.10.0, on Debian jessie amd64
+deb [arch=amd64] http://archive.cloudera.com/cdh5/debian/jessie/amd64/cdh jessie-cdh5.10.0 contrib
+deb-src http://archive.cloudera.com/cdh5/debian/jessie/amd64/cdh jessie-cdh5.10.0 contrib
 EOD
 
 cat >/etc/apt/preferences.d/cloudera.pref <<EOD
@@ -23,11 +23,31 @@
     openjdk-7-jre-headless \
     python-virtualenv
 
+# findspark needs a SPARK_HOME to setup pyspark
 cat >/etc/profile.d/spark.sh <<EOD
 SPARK_HOME=/usr/lib/spark
 export SPARK_HOME
 EOD
 
-cd /vagrant
-virtualenv .
-bin/pip install -r requirements.txt
+# pyspark wants to put a metastore_db directory in /vagrant, put it somewhere else
+cat >/etc/spark/conf/hive-site.xml <<EOD
+<configuration>
+   <property>
+      <name>hive.metastore.warehouse.dir</name>
+      <value>/tmp/</value>
+      <description>location of default database for the warehouse</description>
+   </property>
+</configuration>
+EOD
+
+# pyspark wants to put a derby.log in /vagrant as well, put it elsewhere
+cat >> /etc/spark/conf/spark-defaults.conf <<EOD
+spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp/derby.log
+EOD
+
+if [ ! -d /vagrant/venv ]; then
+    cd /vagrant
+    mkdir venv
+    virtualenv venv
+    venv/bin/pip install tox
+fi
diff --git a/mjolnir/test/conftest.py b/mjolnir/test/conftest.py
index aa93ae6..4074c53 100644
--- a/mjolnir/test/conftest.py
+++ b/mjolnir/test/conftest.py
@@ -1,10 +1,10 @@
 import findspark
-findspark.init()
+findspark.init()  # must happen before importing pyspark
 
-import pytest
-import logging
-from pyspark import SparkContext, SparkConf
-from pyspark.sql import HiveContext
+import pytest  # noqa: E402
+import logging  # noqa: E402
+from pyspark import SparkContext, SparkConf  # noqa: E402
+from pyspark.sql import HiveContext  # noqa: E402
 
 
 def quiet_log4j():
diff --git a/requirements.txt b/requirements.txt
deleted file mode 100644
index a809f40..0000000
--- a/requirements.txt
+++ /dev/null
@@ -1,7 +0,0 @@
-argparse==1.2.1
-clickmodels==1.0.2
-findspark==1.1.0
-py==1.4.33
-py4j==0.10.4
-pytest==3.0.7
-wsgiref==0.1.2
diff --git a/setup.py b/setup.py
new file mode 100644
index 0000000..9b30466
--- /dev/null
+++ b/setup.py
@@ -0,0 +1,41 @@
+import os
+from setuptools import find_packages, setup
+
+
+requirements = [
+    'clickmodels',
+    'py4j',
+]
+
+test_requirements = [
+    'pytest',
+    'findspark',
+    'flake8',
+    'tox',
+]
+
+setup(
+    name='MjoLniR',
+    version='0.0.1',
+    author='Wikimedia Search Team',
+    author_email='discov...@lists.wikimedia.org',
+    description='A plumbing library for Machine Learned Ranking at Wikimedia',
+    license='MIT',
+    packages=find_packages(),
+    include_package_data=True,
+    data_files=['README.rst'],
+    install_requires=requirements,
+    tests_require=test_requirements,
+    extras_require={
+        "test": test_requirements
+    },
+    classifiers=[
+        "Development Status :: 3 - Alpha",
+        "Programming Language :: Python",
+        "Programming Language :: Python :: 2",
+        "Environment :: Other Environment",
+        "Intended Audience :: Developers",
+        "License :: OSI Approved :: MIT License",
+        "Operating System :: OS Independent"
+    ],
+)
diff --git a/tox.ini b/tox.ini
new file mode 100644
index 0000000..61d24ee
--- /dev/null
+++ b/tox.ini
@@ -0,0 +1,16 @@
+[tox]
+minversion = 1.6
+envlist = flake8,pytest
+
+[flake8]
+max-line-length = 120
+
+[testenv:flake8]
+basepython = python2.7
+commands = flake8 mjolnir/
+deps = flake8
+
+[testenv:pytest]
+commands = pytest {posargs:--pyargs mjolnir}
+deps = .[test]
+passenv = SPARK_HOME

-- 
To view, visit https://gerrit.wikimedia.org/r/347058
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id57bd5fd0476fc061d4b0a1cd93a1b2f639b7ed4
Gerrit-PatchSet: 8
Gerrit-Project: search/MjoLniR
Gerrit-Branch: master
Gerrit-Owner: EBernhardson <ebernhard...@wikimedia.org>
Gerrit-Reviewer: DCausse <dcau...@wikimedia.org>
Gerrit-Reviewer: EBernhardson <ebernhard...@wikimedia.org>
Gerrit-Reviewer: Smalyshev <smalys...@wikimedia.org>
Gerrit-Reviewer: Tjones <tjo...@wikimedia.org>
Gerrit-Reviewer: Volans <rcocci...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
