This is an automated email from the ASF dual-hosted git repository.
baunsgaard pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/systemds.git
The following commit(s) were added to refs/heads/main by this push:
new 2c90f9e [SYSTEMDS-3217] PythonAPI build release cleanup
2c90f9e is described below
commit 2c90f9e7009d766ed0cb887d4f319c54c322e410
Author: baunsgaard <[email protected]>
AuthorDate: Mon Nov 15 18:46:02 2021 +0100
[SYSTEMDS-3217] PythonAPI build release cleanup
This commit cleans up the Python release artifact so that it
does not include:
the generator scripts, test files, compiled pyc files, and tutorial files.
does include:
the Java dependencies, limited to the jar files of the BIN release, for
consistency with our Java BIN release.
The development version string is also changed to follow the pip version
specification: instead of being called SNAPSHOT it is now dev.
---
src/main/python/MANIFEST.in | 12 +++++++-
src/main/python/PUBLISH_INSTRUCTIONS.md | 53 --------------------------------
src/main/python/README.md | 35 +++++++++++++++++++++
src/main/python/create_python_dist.py | 6 ++--
src/main/python/post_setup.py | 46 ---------------------------
src/main/python/pre_setup.py | 53 ++++++++++++++++++++++----------
src/main/python/setup.py | 23 +++++++++++---
src/main/python/systemds/project_info.py | 2 +-
8 files changed, 104 insertions(+), 126 deletions(-)
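
As a rough illustration of the version rename in this commit: pip follows PEP 440, under which `2.3.0-SNAPSHOT` is not a valid version, while `2.3.0-dev` is accepted and normalized to `2.3.0.dev0`. The sketch below maps a Maven-style version to the pip-style string. The helper name `maven_to_pip_version` is hypothetical and is not part of the SystemDS sources; the commit itself only changes the literal in `project_info.py`.

```python
import re


def maven_to_pip_version(maven_version: str) -> str:
    """Map a Maven-style version to the pip-style form used in this commit.

    Hypothetical helper for illustration only.
    """
    # Replace only a trailing "-SNAPSHOT" qualifier; release versions pass through.
    return re.sub(r"-SNAPSHOT$", "-dev", maven_version)


print(maven_to_pip_version("2.3.0-SNAPSHOT"))  # 2.3.0-dev
print(maven_to_pip_version("2.3.0"))           # 2.3.0
```

Anchoring the pattern at the end of the string keeps the helper from touching a version that merely contains the word elsewhere.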
diff --git a/src/main/python/MANIFEST.in b/src/main/python/MANIFEST.in
index 87dfdbd..ba8b013 100644
--- a/src/main/python/MANIFEST.in
+++ b/src/main/python/MANIFEST.in
@@ -21,4 +21,14 @@
include LICENSE
include NOTICE
-recursive-include systemds/systemds-java *
\ No newline at end of file
+
+exclude setup.py
+exclude pre_setup.py
+exclude MANIFEST.in
+exclude README.md
+
+recursive-include systemds *
+
+recursive-exclude **/__pycache__/** *
+recursive-exclude systemds/examples/tutorials/adult *
+recursive-exclude systemds/examples/tutorials/mnist *
diff --git a/src/main/python/PUBLISH_INSTRUCTIONS.md b/src/main/python/PUBLISH_INSTRUCTIONS.md
deleted file mode 100644
index bc5ce32..0000000
--- a/src/main/python/PUBLISH_INSTRUCTIONS.md
+++ /dev/null
@@ -1,53 +0,0 @@
-<!--
-{% comment %}
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to you under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-{% end comment %}
--->
-
-# Publishing Instructions
-
-## Building SystemDS jar (with dependency jars)
-
-The following steps have to be done for both the cases
-
-- Build SystemDS with maven first `mvn package -P distribution`, with the working
- directory being `SYSTEMDS_ROOT` (Root directory of SystemDS)
-- `cd` to this folder (basically `SYSTEMDS_ROOT/src/main/python`)
-
-## Building python package
-
-- Run `create_python_dist.py`
-
-```bash
-python3 create_python_dist.py
-```
-
-- now in the `./dist` directory there will exist the source distribution `systemds-VERSION.tar.gz`
- and the wheel distribution `systemds-VERSION-py3-none-any.whl`, with `VERSION` being the current version number
-
-## Publishing package
-
-If we want to build the package for uploading to the repository via `python3 -m twine upload dist/*`
- (will be automated in the future)
-
-- Install twine with `pip install --upgrade twine`
-
-- Follow the instructions from the [Guide](https://packaging.python.org/tutorials/packaging-projects/)
- 1. Create an API-Token in the account (leave the page open or copy the token, it will only be shown once)
- 2. Execute the command `python3 -m twine upload dist/*`
- - Optional: `pip install keyrings.alt`(use with caution!) if you get `UserWarning: No recommended backend was available.`
- 3. Username is `__token__`
- 4. Password is the created API-Token **with** `pypi-` prefix
diff --git a/src/main/python/README.md b/src/main/python/README.md
index 6f4ba13..3ab4b2a 100644
--- a/src/main/python/README.md
+++ b/src/main/python/README.md
@@ -36,3 +36,38 @@ as well as distributed operations on Apache Spark. In contrast to existing syste
provide homogeneous tensors or 2D Datasets - and in order to serve the entire
data science lifecycle, the underlying data model are DataTensors, i.e.,
tensors (multi-dimensional arrays) whose first dimension may have a heterogeneous and nested schema.
+
+## Publishing Instructions
+
+### Building SystemDS jar (with dependency jars)
+
+The following steps have to be done for both the cases
+
+- Build SystemDS with maven first `mvn package -P distribution`, with the working
+ directory being `SYSTEMDS_ROOT` (Root directory of SystemDS)
+- `cd` to this folder (basically `SYSTEMDS_ROOT/src/main/python`)
+
+### Building python package
+
+- Run `create_python_dist.py`
+
+```bash
+python3 create_python_dist.py
+```
+
+- now in the `./dist` directory there will exist the source distribution `systemds-VERSION.tar.gz`
+ and the wheel distribution `systemds-VERSION-py3-none-any.whl`, with `VERSION` being the current version number
+
+### Publishing package
+
+If we want to build the package for uploading to the repository via `python3 -m twine upload dist/*`
+ (will be automated in the future)
+
+- Install twine with `pip install --upgrade twine`
+
+- Follow the instructions from the [Guide](https://packaging.python.org/tutorials/packaging-projects/)
+ 1. Create an API-Token in the account (leave the page open or copy the token, it will only be shown once)
+ 2. Execute the command `python3 -m twine upload dist/*`
+ - Optional: `pip install keyrings.alt`(use with caution!) if you get `UserWarning: No recommended backend was available.`
+ 3. Username is `__token__`
+ 4. Password is the created API-Token **with** `pypi-` prefix
diff --git a/src/main/python/create_python_dist.py b/src/main/python/create_python_dist.py
index 3048327..f02578f 100755
--- a/src/main/python/create_python_dist.py
+++ b/src/main/python/create_python_dist.py
@@ -21,8 +21,8 @@
#-------------------------------------------------------------
import subprocess
+
+f = open("generator.log","w")
+subprocess.run("python3 generator/generator.py",shell=True, check=True, stdout=f, stderr=f)
subprocess.run("python3 pre_setup.py",shell=True, check=True)
-subprocess.run("python3 generator/generator.py",shell=True, check=True)
subprocess.run("python3 setup.py sdist bdist_wheel",shell=True, check=True)
-# post_setup.py moves the files from dist to target which we probably don't want for uploading them to pypi
-#subprocess.run(["python3", "post_setup.py"]).check_returncode()
diff --git a/src/main/python/post_setup.py b/src/main/python/post_setup.py
deleted file mode 100755
index 7904482..0000000
--- a/src/main/python/post_setup.py
+++ /dev/null
@@ -1,46 +0,0 @@
-#!/usr/bin/env python3
-#-------------------------------------------------------------
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-#-------------------------------------------------------------
-
-from __future__ import print_function
-import os
-import sys
-import platform
-
-try:
- exec(open('systemds/project_info.py').read())
-except IOError:
- print("Could not read project_info.py.", file=sys.stderr)
- sys.exit()
-ARTIFACT_NAME = __project_artifact_id__
-ARTIFACT_VERSION = __project_version__
-ARTIFACT_VERSION_SHORT = ARTIFACT_VERSION.split("-")[0]
-
-root_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.getcwd())))
-src_path_prefix = os.path.join(root_dir, 'src', 'main', 'python', 'dist', ARTIFACT_NAME + '-' + ARTIFACT_VERSION_SHORT)
-src_path = src_path_prefix + '.zip' if platform.system() == "Windows" and os.path.exists(
- src_path_prefix + '.zip') else src_path_prefix + '.tar.gz'
-os.rename(
- src_path,
- os.path.join(root_dir, 'target', ARTIFACT_NAME + '-' + ARTIFACT_VERSION + '-python.tar.gz'))
-wheel_name = '-'.join([ARTIFACT_NAME, ARTIFACT_VERSION_SHORT, 'py3', 'none', 'any.whl'])
-wheel = os.path.join(root_dir, 'src', 'main', 'python', 'dist', wheel_name)
-os.rename(wheel, os.path.join(root_dir, 'target', wheel_name))
diff --git a/src/main/python/pre_setup.py b/src/main/python/pre_setup.py
index c8c0e2e..3368021 100755
--- a/src/main/python/pre_setup.py
+++ b/src/main/python/pre_setup.py
@@ -26,12 +26,10 @@ import fnmatch
from zipfile import ZipFile
this_path = os.path.dirname(os.path.realpath(__file__))
-python_dir = 'systemds'
-java_dir = 'systemds-java'
-java_dir_full_path = os.path.join(this_path, python_dir, java_dir)
-if os.path.exists(java_dir_full_path):
- shutil.rmtree(java_dir_full_path, True)
-root_dir = os.path.dirname(os.path.dirname(os.path.dirname(this_path)))
+PYTHON_DIR = 'systemds'
+
+# Go three directories out this is the root dir of systemds repository
+SYSTEMDS_ROOT = os.path.dirname(os.path.dirname(os.path.dirname(this_path)))
# temporary directory for unzipping of bin zip
TMP_DIR = os.path.join(this_path, 'tmp')
@@ -39,30 +37,51 @@ if os.path.exists(TMP_DIR):
shutil.rmtree(TMP_DIR, True)
os.mkdir(TMP_DIR)
+
+# Copy jar files from release artifact.
+LIB_DIR = os.path.join(this_path, PYTHON_DIR, 'lib')
+if os.path.exists(LIB_DIR):
+ shutil.rmtree(LIB_DIR, True)
SYSTEMDS_BIN = 'systemds-*-bin.zip'
-for file in os.listdir(os.path.join(root_dir, 'target')):
+for file in os.listdir(os.path.join(SYSTEMDS_ROOT, 'target')):
+ # Take jar files from bin release file
if fnmatch.fnmatch(file, SYSTEMDS_BIN):
- new_path = os.path.join(TMP_DIR, file)
- shutil.copyfile(os.path.join(root_dir, 'target', file), new_path)
+ systemds_bin_zip = os.path.join(SYSTEMDS_ROOT, 'target', file)
extract_dir = os.path.join(TMP_DIR)
- with ZipFile(new_path, 'r') as zip:
+
+ with ZipFile(systemds_bin_zip, 'r') as zip:
for f in zip.namelist():
split_path = os.path.split(os.path.dirname(f))
if split_path[1] == 'lib':
zip.extract(f, TMP_DIR)
unzipped_dir_name = file.rsplit('.', 1)[0]
- shutil.copytree(os.path.join(TMP_DIR, unzipped_dir_name), java_dir_full_path)
- if os.path.exists(TMP_DIR):
- shutil.rmtree(TMP_DIR, True)
+ shutil.copytree(os.path.join(TMP_DIR, unzipped_dir_name, 'lib'), LIB_DIR)
+
+# Take hadoop binaries.
+HADOOP_DIR_SRC = os.path.join(SYSTEMDS_ROOT, 'target', 'lib', 'hadoop')
+if os.path.exists(HADOOP_DIR_SRC):
+ shutil.copytree(HADOOP_DIR_SRC, os.path.join(LIB_DIR,"hadoop"))
-root_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.getcwd())))
-shutil.copyfile(os.path.join(root_dir, 'LICENSE'), 'LICENSE')
-shutil.copyfile(os.path.join(root_dir, 'NOTICE'), 'NOTICE')
+# Take conf files.
+CONF_DIR = os.path.join(this_path, PYTHON_DIR, 'conf')
+if not os.path.exists(CONF_DIR):
+ os.mkdir(CONF_DIR)
+shutil.copy(os.path.join(SYSTEMDS_ROOT,'conf', 'log4j.properties'), os.path.join(this_path, PYTHON_DIR, 'conf', 'log4j.properties'))
+shutil.copy(os.path.join(SYSTEMDS_ROOT,'conf', 'SystemDS-config-defaults.xml'), os.path.join(this_path, PYTHON_DIR, 'conf', 'SystemDS-config-defaults.xml'))
-# delete old build and dist path
+SYSTEMDS_ROOT = os.path.dirname(os.path.dirname(os.path.dirname(os.getcwd())))
+shutil.copyfile(os.path.join(SYSTEMDS_ROOT, 'LICENSE'), 'LICENSE')
+shutil.copyfile(os.path.join(SYSTEMDS_ROOT, 'NOTICE'), 'NOTICE')
+
+# Remove old build and dist path
+if os.path.exists(TMP_DIR):
+ shutil.rmtree(TMP_DIR, True)
build_path = os.path.join(this_path, 'build')
if os.path.exists(build_path):
shutil.rmtree(build_path, True)
dist_path = os.path.join(this_path, 'dist')
if os.path.exists(dist_path):
shutil.rmtree(dist_path, True)
+egg_path = os.path.join(this_path, 'systemds.egg-info')
+if os.path.exists(egg_path):
+ shutil.rmtree(egg_path, True)
diff --git a/src/main/python/setup.py b/src/main/python/setup.py
index 84412ad..a74de68 100755
--- a/src/main/python/setup.py
+++ b/src/main/python/setup.py
@@ -42,20 +42,33 @@ REQUIRED_PACKAGES = [
'pandas >= 1.2.2'
]
-python_dir = 'systemds'
-java_dir = 'systemds-java'
-java_dir_full_path = python_dir + '/' + java_dir
+LONG_DESCRIPTION= '''"""This package provides a Pythonic interface for working with SystemDS.
+
+SystemDS is a versatile system for the end-to-end data science lifecycle from data integration,
+cleaning, and feature engineering, over efficient, local and distributed ML model training,
+to deployment and serving.
+To facilitate this, bindings from different languages and different system abstractions provide help for:
+
+1. The different tasks of the data-science lifecycle, and
+2. users with different expertise.
+
+These high-level scripts are compiled into hybrid execution plans of local, in-memory CPU and GPU operations,
+as well as distributed operations on Apache Spark. In contrast to existing systems - that either
+provide homogeneous tensors or 2D Datasets - and in order to serve the entire
+data science lifecycle, the underlying data model are DataTensors, i.e.,
+tensors (multi-dimensional arrays) whose first dimension may have a heterogeneous and nested schema."""'''
setup(
name=ARTIFACT_NAME,
version=ARTIFACT_VERSION,
description='Apache SystemDS - An open source ML system for the end-to-end data science lifecycle',
- long_description=open('README.md', encoding='utf-8').read(),
+ long_description=LONG_DESCRIPTION,
long_description_content_type='text/markdown',
url='https://github.com/apache/systemds',
author='SystemDS',
author_email='[email protected]',
- packages=find_packages(),
+ # Only include the systemds resources not generator and tests.
+ packages=find_packages(include=["systemds"]),
install_requires=REQUIRED_PACKAGES,
include_package_data=True,
python_requires='>=3.6',
diff --git a/src/main/python/systemds/project_info.py b/src/main/python/systemds/project_info.py
index 3730af7..faf8b6e 100644
--- a/src/main/python/systemds/project_info.py
+++ b/src/main/python/systemds/project_info.py
@@ -23,4 +23,4 @@
# via string substitutions using the maven-resources-plugin
__project_group_id__ = 'org.apache.systemds'
__project_artifact_id__ = 'systemds'
-__project_version__ = '2.3.0-SNAPSHOT'
+__project_version__ = '2.3.0-dev'
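
One point worth checking about the `find_packages(include=["systemds"])` change in `setup.py` above: setuptools treats `include` entries as glob-style patterns matched against full dotted package names, so a pattern without a wildcard selects only the package with that exact name. The sketch below approximates that matching with `fnmatch.fnmatchcase` from the standard library. The helper `select_packages` and the package names in the example list are illustrative assumptions, not the setuptools implementation or the actual package layout.

```python
from fnmatch import fnmatchcase


def select_packages(names, include):
    """Keep the dotted package names that match at least one include pattern.

    Hypothetical helper that only approximates the include filter.
    """
    return [n for n in names if any(fnmatchcase(n, pat) for pat in include)]


# Illustrative names only; the real layout may differ.
candidates = ["systemds", "systemds.operator", "generator", "tests"]

print(select_packages(candidates, ["systemds"]))
# ['systemds']
print(select_packages(candidates, ["systemds", "systemds.*"]))
# ['systemds', 'systemds.operator']
```

Whether this matters for the built artifact depends on what `MANIFEST.in` and `include_package_data` pull in alongside the listed packages, so inspecting the contents of the generated wheel is the reliable check.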