[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...

2017-01-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16405


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...

2016-12-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16405#discussion_r94273331
  
--- Diff: dev/lint-python ---
@@ -19,10 +19,8 @@
 
 SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
 SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
-PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsupport"
-# TODO: fix pep8 errors with the rest of the Python scripts under dev
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py ./dev/run-tests-jenkins.py"
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py"
+# Exclude auto-geneated configuration file.
+PATHS_TO_CHECK="$( cd "$SPARK_ROOT_DIR" && find . -name "*.py" -not -path "*python/docs/conf.py" )"
--- End diff --

To be sure, I tested this by running the script as below:

```bash
./lint-python
./dev/lint-python
./spark/dev/lint-python
```

So the paths are now relative, and their total length is currently about 11K, as below:

```
./dev/create-release/generate-contributors.py 
./dev/create-release/releaseutils.py 
./dev/create-release/translate-contributors.py ./dev/github_jira_sync.py 
./dev/merge_spark_pr.py ./dev/pep8-1.7.0.py ./dev/pip-sanity-check.py 
./dev/run-tests-jenkins.py ./dev/run-tests.py 
./dev/sparktestsupport/__init__.py ./dev/sparktestsupport/modules.py 
./dev/sparktestsupport/shellutils.py ./dev/sparktestsupport/toposort.py 
./examples/src/main/python/als.py 
./examples/src/main/python/avro_inputformat.py 
./examples/src/main/python/kmeans.py 
./examples/src/main/python/logistic_regression.py 
./examples/src/main/python/ml/aft_survival_regression.py 
./examples/src/main/python/ml/als_example.py 
./examples/src/main/python/ml/binarizer_example.py 
./examples/src/main/python/ml/bisecting_k_means_example.py 
./examples/src/main/python/ml/bucketizer_example.py 
./examples/src/main/python/ml/chisq_selector_example.py 
./examples/src/main/python/ml/count_vectorizer_example.py 
./examples/src/main/python/ml/cross_validator.py 
./examples/src/main/python/ml/dataframe_example.py 
./examples/src/main/python/ml/dct_example.py 
./examples/src/main/python/ml/decision_tree_classification_example.py 
./examples/src/main/python/ml/decision_tree_regression_example.py 
./examples/src/main/python/ml/elementwise_product_example.py 
./examples/src/main/python/ml/estimator_transformer_param_example.py 
./examples/src/main/python/ml/gaussian_mixture_example.py 
./examples/src/main/python/ml/generalized_linear_regression_example.py 
./examples/src/main/python/ml/gradient_boosted_tree_classifier_example.py 
./examples/src/main/python/ml/gradient_boosted_tree_regressor_example.py 
./examples/src/main/python/ml/index_to_string_example.py 
./examples/src/main/python/ml/isotonic_regression_example.py 
./examples/src/main/python/ml/kmeans_example.py 
./examples/src/main/python/ml/lda_example.py 
./examples/src/main/python/ml/linear_regression_with_elastic_net.py 
./examples/src/main/python/ml/logistic_regression_summary_example.py 
./examples/src/main/python/ml/logistic_regression_with_elastic_net.py 
./examples/src/main/python/ml/max_abs_scaler_example.py 
./examples/src/main/python/ml/min_max_scaler_example.py 
./examples/src/main/python/ml/multiclass_logistic_regression_with_elastic_net.py 
./examples/src/main/python/ml/multilayer_perceptron_classification.py 
./examples/src/main/python/ml/n_gram_example.py 
./examples/src/main/python/ml/naive_bayes_example.py 
./examples/src/main/python/ml/normalizer_example.py 
./examples/src/main/python/ml/one_vs_rest_example.py 
./examples/src/main/python/ml/onehot_encoder_example.py 
./examples/src/main/python/ml/pca_example.py 
./examples/src/main/python/ml/pipeline_example.py 
./examples/src/main/python/ml/polynomial_expansion_example.py 
./examples/src/main/python/ml/quantile_discretizer_example.py 
./examples/src/main/python/ml/random_forest_classifier_example.py 
./examples/src/main/python/ml/random_forest_regressor_example.py 
./examples/src/main/python/ml/rformula_example.py 
./examples/src/main/python/ml/sql_transformer.py 
./examples/src/main/python/ml/standard_scaler_example.py 
./examples/src/main/python/ml/stopwords_remover_example.py 
./examples/src/main/python/ml/string_indexer_example.py 
./examples/src/main/python/ml/tf_idf_example.py 
./examples/src/main/python/ml/tokenizer_example.py 
./examples/src/main/python/ml/train_validation_split.py 
./examples/src/main/python/ml/vector_assembler_example.py 
./examples/src/main/python/ml/vector_indexer_example.py 
./examples/src/main/python/ml/vector_slicer_example.py 
./examples/src/main/python/ml/word2vec_example.py 
./examples/src/main/python/mllib/binary_classification_metrics_example.py 
./examples/src/main/python/mllib/bisecting_k_means_example.py 
./examples/src/main/python/mllib/correlations.py 
./examples/src/main/python/mllib/correlations_example.py 
```

[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...

2016-12-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16405#discussion_r94273247
  
--- Diff: dev/lint-python ---
@@ -19,10 +19,8 @@
 
 SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
 SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
-PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsupport"
-# TODO: fix pep8 errors with the rest of the Python scripts under dev
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py ./dev/run-tests-jenkins.py"
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py"
+# Exclude auto-geneated configuration file.
+PATHS_TO_CHECK="$( find "$SPARK_ROOT_DIR" -name "*.py" -not -path "*python/docs/conf.py" )"
--- End diff --

The limit seems to be 32K on Cygwin by default. The actual length without 
any path prefix is about 11K for now. Let me turn these into relative paths 
as the safe choice, so the total length does not depend on where the 
repository is checked out.
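
A minimal sketch of how such a comparison could be reproduced, in Python rather than in the shell script itself. The helper names (`python_files`, `joined_length`, `arg_max`) are made up for illustration and are not part of `dev/lint-python`; the sketch also assumes it is run from the Spark root directory and does not apply the `python/docs/conf.py` exclusion.

```python
import os


def python_files(root):
    """Yield paths of *.py files under root, relative to root (roughly `find . -name "*.py"`)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                full = os.path.join(dirpath, name)
                yield os.path.join(".", os.path.relpath(full, root))


def joined_length(paths):
    """Length of the paths joined by single spaces, as they would sit in one shell variable."""
    paths = list(paths)
    return sum(len(p) for p in paths) + max(len(paths) - 1, 0)


def arg_max():
    """Return the system's ARG_MAX, or None on platforms where it cannot be queried."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        return None


if __name__ == "__main__":
    root = "."  # assumption: the current directory is the Spark root
    relative = list(python_files(root))
    prefix = os.path.abspath(root)
    # Rebuild the absolute form by swapping the leading "./" for the full prefix.
    absolute = [os.path.join(prefix, p[2:]) for p in relative]
    print("relative:", joined_length(relative))
    print("absolute:", joined_length(absolute))
    print("ARG_MAX:", arg_max())
```

The absolute total grows with the length of the checkout directory multiplied by the number of files, while the relative total does not.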







[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...

2016-12-30 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16405#discussion_r94264468
  
--- Diff: examples/src/main/python/mllib/decision_tree_regression_example.py ---
@@ -44,7 +44,7 @@
 # Evaluate model on test instances and compute test error
 predictions = model.predict(testData.map(lambda x: x.features))
 labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
-testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() /\
+testMSE = labelsAndPredictions.map(lambda lp: (lp[0] - lp[1]) * (lp[0] - lp[1])).sum() /\
--- End diff --

Ah OK, that makes sense. I was only looking for changes required by pep8 
itself, but if the file needs to compile under Python 3 in order to run pep8 
with Python 3, the change is justified. (Of course, a follow-up issue for 
proper Python 3 support is the best place to address the issues that do not 
block pep8 testing.)





[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...

2016-12-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16405#discussion_r94263914
  
--- Diff: dev/lint-python ---
@@ -19,10 +19,8 @@
 
 SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
 SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
-PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsupport"
-# TODO: fix pep8 errors with the rest of the Python scripts under dev
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py ./dev/run-tests-jenkins.py"
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py"
+# Exclude auto-geneated configuration file.
+PATHS_TO_CHECK="$( find "$SPARK_ROOT_DIR" -name "*.py" -not -path "*python/docs/conf.py" )"
--- End diff --

Yes, I think this is a valid point. Let me first check the current length 
and the length limit to be sure.





[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...

2016-12-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16405#discussion_r94263510
  
--- Diff: examples/src/main/python/mllib/decision_tree_regression_example.py ---
@@ -44,7 +44,7 @@
 # Evaluate model on test instances and compute test error
 predictions = model.predict(testData.map(lambda x: x.features))
 labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
-testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() /\
+testMSE = labelsAndPredictions.map(lambda lp: (lp[0] - lp[1]) * (lp[0] - lp[1])).sum() /\
--- End diff --

That seems to cause errors in Python 3, where a lambda can no longer unpack 
a tuple in its parameter list. http://www.python.org/dev/peps/pep-3113 seems 
to be the related change.
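
A minimal sketch of the difference, outside Spark. The name `squared_error`, the list `labels_and_predictions`, and the sample values are made up for illustration; a plain list stands in for the RDD of (label, prediction) pairs.

```python
# The Python 2 form is only compiled from a string here, so that this file
# itself stays valid under Python 3.
try:
    compile("lambda (v, p): (v - p) * (v - p)", "<example>", "eval")
    tuple_params_supported = True
except SyntaxError:
    # Python 3 removed tuple parameter unpacking (PEP 3113).
    tuple_params_supported = False


# Portable form: take the pair as a single argument and index into it.
def squared_error(lp):
    return (lp[0] - lp[1]) * (lp[0] - lp[1])


# Hypothetical (label, prediction) pairs standing in for the RDD contents.
labels_and_predictions = [(1.0, 0.5), (2.0, 2.0), (3.0, 4.0)]

test_mse = sum(squared_error(lp) for lp in labels_and_predictions) / float(
    len(labels_and_predictions)
)
print(tuple_params_supported, test_mse)
```

The indexed form is what the diff above switches to, and it behaves the same under Python 2 and Python 3.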





[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...

2016-12-30 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16405#discussion_r94259691
  
--- Diff: examples/src/main/python/mllib/decision_tree_regression_example.py ---
@@ -44,7 +44,7 @@
 # Evaluate model on test instances and compute test error
 predictions = model.predict(testData.map(lambda x: x.features))
 labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
-testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() /\
+testMSE = labelsAndPredictions.map(lambda lp: (lp[0] - lp[1]) * (lp[0] - lp[1])).sum() /\
--- End diff --

Why did we get rid of `lambda (v, p)` here, and similar lambdas elsewhere?





[GitHub] spark pull request #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against a...

2016-12-30 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16405#discussion_r94259548
  
--- Diff: dev/lint-python ---
@@ -19,10 +19,8 @@
 
 SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
 SPARK_ROOT_DIR="$(dirname "$SCRIPT_DIR")"
-PATHS_TO_CHECK="./python/pyspark/ ./examples/src/main/python/ ./dev/sparktestsupport"
-# TODO: fix pep8 errors with the rest of the Python scripts under dev
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/run-tests.py ./python/*.py ./dev/run-tests-jenkins.py"
-PATHS_TO_CHECK="$PATHS_TO_CHECK ./dev/pip-sanity-check.py"
+# Exclude auto-geneated configuration file.
+PATHS_TO_CHECK="$( find "$SPARK_ROOT_DIR" -name "*.py" -not -path "*python/docs/conf.py" )"
--- End diff --

I'm slightly concerned that this list might eventually become too long to 
pass on the command line. On Linux ARG_MAX is pretty high, but that's not the 
case everywhere, although we would probably have to double the number of 
Python files before this became an issue on Cygwin.
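
One way to sidestep the limit entirely, sketched here only as an illustration: split the file list into batches that each fit within a chosen budget and lint each batch separately, which is roughly what `xargs` does at the shell level. The helper `batch_paths` and the budget value are hypothetical and are not part of this PR, which passes all paths in one invocation.

```python
def batch_paths(paths, max_chars):
    """Group paths into batches whose space-joined length stays within max_chars.

    A single path longer than max_chars still gets a batch of its own.
    """
    batches, current, current_len = [], [], 0
    for path in paths:
        # One extra character for the separating space inside a non-empty batch.
        extra = len(path) + (1 if current else 0)
        if current and current_len + extra > max_chars:
            batches.append(current)
            current, current_len = [], 0
            extra = len(path)
        current.append(path)
        current_len += extra
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical paths and budget, chosen small so the split is visible.
    example = ["./dev/run-tests.py", "./dev/merge_spark_pr.py", "./dev/github_jira_sync.py"]
    for batch in batch_paths(example, 45):
        print(" ".join(batch))
```

Each batch could then be handed to a separate pep8 invocation, at the cost of aggregating the exit codes afterwards.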

