[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16405

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16405#discussion_r94273331

    --- Diff: dev/lint-python ---
    @@ -19,10 +19,8 @@
     SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
     SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
    -PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsupport"
    -# TODO: fix pep8 errors with the rest of the Python scripts under dev
    -PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py ./dev/run-tests-jenkins.py"
    -PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py"
    +# Exclude auto-geneated configuration file.
    +PATHS_TO_CHECK="$( cd "$SPARK_ROOT_DIR" && find . -name "*.py" -not -path "*python/docs/conf.py" )"
    --- End diff --

To be sure, I tested this as below:

```bash
./lint-python
./dev/lint-python
./spark/dev/lint-python
```

So the paths are now relative, and they currently add up to about 11K, as below:

```
./dev/create-release/generate-contributors.py
./dev/create-release/releaseutils.py
./dev/create-release/translate-contributors.py
./dev/github_jira_sync.py
./dev/merge_spark_pr.py
./dev/pep8-1.7.0.py
./dev/pip-sanity-check.py
./dev/run-tests-jenkins.py
./dev/run-tests.py
./dev/sparktestsupport/__init__.py
./dev/sparktestsupport/modules.py
./dev/sparktestsupport/shellutils.py
./dev/sparktestsupport/toposort.py
./examples/src/main/python/als.py
./examples/src/main/python/avro_inputformat.py
./examples/src/main/python/kmeans.py
./examples/src/main/python/logistic_regression.py
./examples/src/main/python/ml/aft_survival_regression.py
./examples/src/main/python/ml/als_example.py
./examples/src/main/python/ml/binarizer_example.py
./examples/src/main/python/ml/bisecting_k_means_example.py
./examples/src/main/python/ml/bucketizer_example.py
./examples/src/main/python/ml/chisq_selector_example.py
./examples/src/main/python/ml/count_vectorizer_example.py
./examples/src/main/python/ml/cross_validator.py
./examples/src/main/python/ml/dataframe_example.py
./examples/src/main/python/ml/dct_example.py
./examples/src/main/python/ml/decision_tree_classification_example.py
./examples/src/main/python/ml/decision_tree_regression_example.py
./examples/src/main/python/ml/elementwise_product_example.py
./examples/src/main/python/ml/estimator_transformer_param_example.py
./examples/src/main/python/ml/gaussian_mixture_example.py
./examples/src/main/python/ml/generalized_linear_regression_example.py
./examples/src/main/python/ml/gradient_boosted_tree_classifier_example.py
./examples/src/main/python/ml/gradient_boosted_tree_regressor_example.py
./examples/src/main/python/ml/index_to_string_example.py
./examples/src/main/python/ml/isotonic_regression_example.py
./examples/src/main/python/ml/kmeans_example.py
./examples/src/main/python/ml/lda_example.py
./examples/src/main/python/ml/linear_regression_with_elastic_net.py
./examples/src/main/python/ml/logistic_regression_summary_example.py
./examples/src/main/python/ml/logistic_regression_with_elastic_net.py
./examples/src/main/python/ml/max_abs_scaler_example.py
./examples/src/main/python/ml/min_max_scaler_example.py
./examples/src/main/python/ml/multiclass_logistic_regression_with_elastic_net.py
./examples/src/main/python/ml/multilayer_perceptron_classification.py
./examples/src/main/python/ml/n_gram_example.py
./examples/src/main/python/ml/naive_bayes_example.py
./examples/src/main/python/ml/normalizer_example.py
./examples/src/main/python/ml/one_vs_rest_example.py
./examples/src/main/python/ml/onehot_encoder_example.py
./examples/src/main/python/ml/pca_example.py
./examples/src/main/python/ml/pipeline_example.py
./examples/src/main/python/ml/polynomial_expansion_example.py
./examples/src/main/python/ml/quantile_discretizer_example.py
./examples/src/main/python/ml/random_forest_classifier_example.py
./examples/src/main/python/ml/random_forest_regressor_example.py
./examples/src/main/python/ml/rformula_example.py
./examples/src/main/python/ml/sql_transformer.py
./examples/src/main/python/ml/standard_scaler_example.py
./examples/src/main/python/ml/stopwords_remover_example.py
./examples/src/main/python/ml/string_indexer_example.py
./examples/src/main/python/ml/tf_idf_example.py
./examples/src/main/python/ml/tokenizer_example.py
./examples/src/main/python/ml/train_validation_split.py
./examples/src/main/python/ml/vector_assembler_example.py
./examples/src/main/python/ml/vector_indexer_example.py
./examples/src/main/python/ml/vector_slicer_example.py
./examples/src/main/python/ml/word2vec_example.py
./examples/src/main/python/mllib/binary_classification_metrics_example.py
./examples/src/main/python/mllib/bisecting_k_means_example.py
./examples/src/main/python/mllib/correlations.py
./examples/src/main/python/mllib/correlations_example.py
```
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16405#discussion_r94273247

    --- Diff: dev/lint-python ---
    @@ -19,10 +19,8 @@
     SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
     SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
    -PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsupport"
    -# TODO: fix pep8 errors with the rest of the Python scripts under dev
    -PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py ./dev/run-tests-jenkins.py"
    -PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py"
    +# Exclude auto-geneated configuration file.
    +PATHS_TO_CHECK="$( find "$SPARK_ROOT_DIR" -name "*.py" -not -path "*python/docs/conf.py" )"
    --- End diff --

The limit seems to be 32K on Cygwin by default. The actual length without any prefix seems to be 11K for now. Let me turn these into relative paths as the safer choice, so that the result does not depend on how long the root directory path is.
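
For illustration only (this is not part of the pull request): a minimal Python sketch of the comparison made in this comment, the length of the joined path list with and without a root prefix. The function names and the `excluded_suffix` and `prefix` parameters are hypothetical, and the directory walk only approximates the `find ... -name "*.py" -not -path "*python/docs/conf.py"` call in the diff.

```python
import os


def collect_python_files(root, excluded_suffix="python/docs/conf.py"):
    """List *.py files under root as './'-prefixed paths relative to root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = "./" + rel.replace(os.sep, "/")
            # Approximates -not -path "*python/docs/conf.py" from the diff.
            if rel.endswith(excluded_suffix):
                continue
            found.append(rel)
    return sorted(found)


def joined_length(paths, prefix=""):
    """Characters needed for the paths joined by single spaces.

    prefix stands in for an absolute root directory (an assumption made
    for this sketch): the leading '.' of each path is replaced by it.
    """
    if prefix:
        paths = [prefix.rstrip("/") + p[1:] for p in paths]
    return len(" ".join(paths))
```

Every path carries the root prefix once, so the difference between `joined_length(paths, prefix=root)` and `joined_length(paths)` grows with the number of files, which is why relative paths leave more headroom below a fixed limit.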
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16405#discussion_r94264468

    --- Diff: examples/src/main/python/mllib/decision_tree_regression_example.py ---
    @@ -44,7 +44,7 @@
     # Evaluate model on test instances and compute test error
     predictions = model.predict(testData.map(lambda x: x.features))
     labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
    -testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() /\
    +testMSE = labelsAndPredictions.map(lambda lp: (lp[0] - lp[1]) * (lp[0] - lp[1])).sum() /\
    --- End diff --

Ah OK, makes sense. I was looking at the changes coming directly from pep8, but if we need the code to compile under Python 3 in order to test pep8 on py3, that makes sense (of course, a follow-up issue for proper py3 support is the best place to address the issues that don't block pep8 testing).
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16405#discussion_r94263914

    --- Diff: dev/lint-python ---
    @@ -19,10 +19,8 @@
     SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
     SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
    -PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsupport"
    -# TODO: fix pep8 errors with the rest of the Python scripts under dev
    -PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py ./dev/run-tests-jenkins.py"
    -PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py"
    +# Exclude auto-geneated configuration file.
    +PATHS_TO_CHECK="$( find "$SPARK_ROOT_DIR" -name "*.py" -not -path "*python/docs/conf.py" )"
    --- End diff --

Yeah, I think this is a valid point. Let me first check the actual length and the length limit to be sure.
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16405#discussion_r94263510

    --- Diff: examples/src/main/python/mllib/decision_tree_regression_example.py ---
    @@ -44,7 +44,7 @@
     # Evaluate model on test instances and compute test error
     predictions = model.predict(testData.map(lambda x: x.features))
     labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
    -testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() /\
    +testMSE = labelsAndPredictions.map(lambda lp: (lp[0] - lp[1]) * (lp[0] - lp[1])).sum() /\
    --- End diff --

Using a tuple in a lambda's parameter list to unpack its argument seems to cause errors in Python 3. http://www.python.org/dev/peps/pep-3113 seems to be the related change.
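
For illustration only: a small sketch, in plain Python without Spark, of the two usual replacements for a tuple parameter such as `lambda (v, p): ...`. The sample data and all names are made up for this example; only the indexing form corresponds to what the diff above does.

```python
# Hypothetical (label, prediction) pairs standing in for the zipped RDD.
pairs = [(3.0, 2.5), (1.0, 1.5), (4.0, 4.0)]

# Check whether the running interpreter still accepts tuple parameters.
# On Python 3 this compile() call raises SyntaxError (see PEP 3113).
try:
    compile("lambda (v, p): (v - p) * (v - p)", "<py2-style>", "eval")
    tuple_params_supported = True
except SyntaxError:
    tuple_params_supported = False

# Replacement 1: take one argument and index into it, as in the diff.
by_index = list(map(lambda lp: (lp[0] - lp[1]) * (lp[0] - lp[1]), pairs))


# Replacement 2: a named function that unpacks in its body.
def squared_error(lp):
    v, p = lp
    return (v - p) * (v - p)


unpacked = list(map(squared_error, pairs))

# Mean squared error over the sample pairs.
test_mse = sum(by_index) / float(len(pairs))
```

Both forms run on Python 2 and Python 3. The indexing form keeps the change to a single line, while the named function keeps the `v` and `p` names readable.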
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16405#discussion_r94259691

    --- Diff: examples/src/main/python/mllib/decision_tree_regression_example.py ---
    @@ -44,7 +44,7 @@
     # Evaluate model on test instances and compute test error
     predictions = model.predict(testData.map(lambda x: x.features))
     labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
    -testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() /\
    +testMSE = labelsAndPredictions.map(lambda lp: (lp[0] - lp[1]) * (lp[0] - lp[1])).sum() /\
    --- End diff --

Why did we get rid of the `lambda (v, p)` here, and similar ones elsewhere?
[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16405#discussion_r94259548

    --- Diff: dev/lint-python ---
    @@ -19,10 +19,8 @@
     SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
     SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
    -PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsupport"
    -# TODO: fix pep8 errors with the rest of the Python scripts under dev
    -PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py ./dev/run-tests-jenkins.py"
    -PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py"
    +# Exclude auto-geneated configuration file.
    +PATHS_TO_CHECK="$( find "$SPARK_ROOT_DIR" -name "*.py" -not -path "*python/docs/conf.py" )"
    --- End diff --

I'm slightly concerned that this list might eventually become too long to pass on the command line (on Linux in bash, ARG_MAX is pretty high, but that's not the case everywhere, although we would probably have to double the number of Python files before this became an issue in Cygwin).
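
For illustration only: a rough Python sketch of how the concern above could be checked on a given machine. `arg_max_bytes`, `fits_in_command_line` and the `reserve` margin are hypothetical names and values chosen for this example. The real limit also has to hold the environment and the rest of the command line, so treat the result as an estimate.

```python
import os


def arg_max_bytes():
    """Return ARG_MAX as reported by sysconf, or None if it is unavailable."""
    try:
        value = os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on some platforms, or the name is unknown.
        return None
    return value if value > 0 else None


def fits_in_command_line(paths, limit, reserve=2048):
    """Estimate whether the paths fit into limit bytes of arguments.

    Each path is counted with one extra byte for its separator, and
    reserve is an assumed safety margin for everything else.
    """
    needed = sum(len(p.encode("utf-8")) + 1 for p in paths)
    return needed + reserve <= limit
```

For example, `fits_in_command_line(paths, limit=32 * 1024)` checks a path list against the 32K figure mentioned for Cygwin in this thread, and `arg_max_bytes()` can supply the limit on systems that report one.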