[GitHub] spark pull request: spark-submit will accept multiple properties-f...

2014-12-09 Thread lvsoft
Github user lvsoft commented on the pull request:

https://github.com/apache/spark/pull/3490#issuecomment-66414770
  
Well, I don't understand what the complexity of this PR is. I've reviewed 
SPARK-3779, which was marked as related, and didn't find anything related to 
this patch.
Also, this patch is backward compatible with the current `spark-submit` 
behavior.

From my point of view, let's go through it point by point:
1. Necessity: I've given two reasons, one based on the benchmark use case and 
one based on the common intuition in most systems.
2. Complexity: this patch maintains backward compatibility; I've described its 
details at the beginning, and I don't see its relationship with SPARK-3779.
3. Elegance: I don't claim this is the most elegant solution. However, given 
the goals of maintaining compatibility and minimizing impact on the current 
system, it is a relatively elegant one.





[GitHub] spark pull request: spark-submit will accept multiple properties-f...

2014-12-09 Thread lvsoft
Github user lvsoft commented on the pull request:

https://github.com/apache/spark/pull/3490#issuecomment-66405387
  
Well, those are separate property files, not *common* properties. 
It would be hard to adjust the common properties that way, and easy to make mistakes.

Deleting tmp files is a common requirement in system design. Of course you 
can choose to ignore the tmp files, but as I said, I think multiple properties 
files are the more elegant approach.





[GitHub] spark pull request: spark-submit will accept multiple properties-f...

2014-12-09 Thread lvsoft
Github user lvsoft commented on the pull request:

https://github.com/apache/spark/pull/3490#issuecomment-66404194
  
Sorry for the late reply. Let me explain the use cases for multiple properties 
files. 

Currently I'm working on a benchmark utility for Spark, where it is natural to 
adjust properties for different workloads.
I'd like to split the configuration into two parts: a global conf for common 
properties, and a private conf for each workload. Without support for 
multiple properties files, I have to merge the properties into a tmp conf file 
and remove it after spark-submit finishes, as in the sketch below. What's more, 
when multiple workloads are submitted concurrently multiple times, the tmp conf 
file names need to be mutually distinct. And if the benchmark process is 
interrupted, the leftover tmp conf files are hard to clean up.
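
A minimal sketch of that workaround, just to show the bookkeeping it forces; 
the helper name and file handling here are hypothetical, not taken from my 
benchmark code:

```scala
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.Properties

// Merge global and per-workload properties into a uniquely named tmp conf.
// The caller must delete the file after spark-submit finishes; exactly the
// cleanup burden described above.
def mergeToTmpConf(globalConf: File, workloadConf: File): File = {
  val merged = new Properties()
  for (f <- Seq(globalConf, workloadConf)) {
    val in = new FileInputStream(f)
    try merged.load(in) finally in.close()  // later files override earlier keys
  }
  val tmp = File.createTempFile("spark-workload-", ".conf")  // unique per run
  val out = new FileOutputStream(tmp)
  try merged.store(out, "merged properties") finally out.close()
  tmp
}
```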

So I think a more elegant approach is to add support for multiple properties 
files to Spark.

Another reason for this PR: currently Spark uses `spark-defaults.conf` 
if no properties-file is specified, but uses the specified properties-file and 
*discards* `spark-defaults.conf` otherwise. This behavior is counter-intuitive 
for beginners. In most systems, the natural assumption is that a value in 
`xxx-defaults.conf` takes effect whenever the property is not overridden in the 
user's config.





[GitHub] spark pull request: spark-submit will accept multiple properties-f...

2014-11-26 Thread lvsoft
GitHub user lvsoft opened a pull request:

https://github.com/apache/spark/pull/3490

spark-submit will accept multiple properties-files and merge the values

Currently ```spark-submit``` accepts only one properties-file, and uses 
```spark-defaults.conf``` if none is specified.
A more natural approach is to apply the properties-files sequentially 
on top of ```spark-defaults.conf```.

This PR affects:
1. The spark-submit script: join multiple ```--properties-file``` arguments 
with commas and store the result in the ```SPARK_SUBMIT_PROPERTIES_FILES``` 
environment variable. Peek into each properties-file to set the 
```SPARK_SUBMIT_BOOTSTRAP_DRIVER``` flag.
2. SparkSubmitArguments.scala: similar to 1.
3. SparkSubmitDriverBootstrapper.scala: accept 
```SPARK_SUBMIT_PROPERTIES_FILES``` and call ```getPropertiesFromFiles``` to 
parse it.
4. Utils.scala: add ```getPropertiesFromFiles``` for parsing multiple 
properties-files; a sketch follows below.
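
For point 4, a minimal sketch of what ```getPropertiesFromFiles``` could look 
like, assuming a comma-joined path list and later-file-wins merge order as 
described above (the actual patch may differ):

```scala
import java.io.FileInputStream
import java.util.Properties
import scala.collection.JavaConverters._

// Parse a comma-joined list of properties files and merge them in order,
// so keys in later files override keys in earlier ones.
def getPropertiesFromFiles(paths: String): Map[String, String] = {
  paths.split(",").map(_.trim).filter(_.nonEmpty)
    .foldLeft(Map.empty[String, String]) { (acc, path) =>
      val props = new Properties()
      val in = new FileInputStream(path)
      try props.load(in) finally in.close()
      acc ++ props.stringPropertyNames.asScala.map(k => k -> props.getProperty(k))
    }
}
```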

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvsoft/spark 
spark_submit_with_multi_properties

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3490.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3490


commit c18a266a1fa0c20331faed1193c168c1021edcf1
Author: Lv, Qi 
Date:   2014-11-25T08:48:03Z

Spark submit accept multiple properties files

commit 752a0581fde0692ee05213b51d0fc0368d8fd205
Author: Lv, Qi 
Date:   2014-11-26T08:56:29Z

test pass







[GitHub] spark pull request: [SPARK-4475] change "localhost" to "127.0.0.1"...

2014-11-24 Thread lvsoft
Github user lvsoft commented on the pull request:

https://github.com/apache/spark/pull/3425#issuecomment-64308603
  
I ran a doctest in aggregation.py to confirm this fix is OK if 
```localhost``` cannot be resolved. However, I'm not fully confident that 
Spark will work entirely well in such a situation, and I don't think a node is 
properly configured if ```localhost``` cannot be resolved, either.

That said, I think ```127.0.0.1``` should always be used for local 
communication rather than ```localhost```: it is more robust and 
introduces no drawbacks. After all, making things work in a correctly 
configured environment is trivial, while making things work in a tolerably 
misconfigured one is harder and more meaningful, and that is what we are 
working hard for.

If you agree, I can do a further pass to eliminate all related uses of 
```localhost``` in Spark.





[GitHub] spark pull request: [SPARK-2313] PySpark pass port rather than std...

2014-11-24 Thread lvsoft
Github user lvsoft commented on the pull request:

https://github.com/apache/spark/pull/3424#issuecomment-64302794
  
I think this is a better solution. 
However, passing the port back via a socket would affect py4j too.
Currently, stdin/stdout is the only method py4j supports for passing the port 
number back.
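
For reference, a minimal sketch of that handshake using the public py4j API; 
the object name and wiring are hypothetical, not taken from the patch:

```scala
import py4j.GatewayServer

object GatewaySketch {
  def main(args: Array[String]): Unit = {
    val server = new GatewayServer(null, 0)  // port 0: let the OS pick a free port
    server.start()
    // The launching (Python) process reads this single line to learn the port.
    println(server.getListeningPort)
  }
}
```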






[GitHub] spark pull request: [SPARK-4475] change "localhost" to "127.0.0.1"...

2014-11-24 Thread lvsoft
GitHub user lvsoft opened a pull request:

https://github.com/apache/spark/pull/3425

[SPARK-4475] change "localhost" to "127.0.0.1" if "localhost" can't be 
resolved

This fixes [SPARK-4475].

Simply changing "localhost" to the equivalent "127.0.0.1" solves the issue; a 
minimal sketch follows below.
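
A sketch of the fallback behavior described above, assuming a small helper 
(the name ```loopbackHost``` is hypothetical; see the patch for the actual 
change):

```scala
import java.net.InetAddress
import scala.util.Try

// Prefer the name "localhost", but fall back to the loopback literal when the
// name cannot be resolved (e.g. a missing /etc/hosts entry).
def loopbackHost: String =
  if (Try(InetAddress.getByName("localhost")).isSuccess) "localhost"
  else "127.0.0.1"
```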

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvsoft/spark 
feature/FixPySpark_failed_to_initialize_if_localhost_can_not_be_resolved

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3425.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3425


commit 25efc78dc766f63888bdae0fdb8dfabb457145ae
Author: Lv, Qi 
Date:   2014-11-24T08:24:56Z

change "localhost" to "127.0.0.1" if "localhost" can't be resolved







[GitHub] spark pull request: [SPARK-2313] PySpark pass port rather than std...

2014-11-23 Thread lvsoft
GitHub user lvsoft opened a pull request:

https://github.com/apache/spark/pull/3424

[SPARK-2313] PySpark pass port rather than stdin

This patch fixes [SPARK-2313]. 

It picks an available free port number and passes it to Py4j.Gateway for 
binding via a command-line argument.
The scan for a free port starts at an offset derived from the PID, which 
avoids potential concurrency issues and leaves room for supporting multiple 
PySpark instances in the future; see the sketch below. The port number that 
Py4j prints back is also parsed as a double check.
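
A minimal sketch of the port-picking strategy described above; the port range 
and the PID lookup are illustrative assumptions (pre-Java-9 JVMs only expose 
the PID through the runtime MX bean name):

```scala
import java.lang.management.ManagementFactory
import java.net.ServerSocket
import scala.util.Try

// Scan the ephemeral port range for a free port, starting at an offset derived
// from the PID so that concurrent instances tend to probe different ports.
def pickFreePort(): Int = {
  val pid = ManagementFactory.getRuntimeMXBean.getName.split("@")(0).toInt
  val size = 16384  // ports 49152..65535
  val start = pid % size
  (0 until size).iterator
    .map(i => 49152 + (start + i) % size)
    .find(p => Try(new ServerSocket(p).close()).isSuccess)
    .getOrElse(sys.error("no free port available"))
}
```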

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvsoft/spark 
feature/PySparkPassPortRatherThanSTDIN

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3424.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3424


commit ac603586647c7db7064464ec4bc96d045f664202
Author: Lv, Qi 
Date:   2014-11-24T07:38:52Z

make pyspark accept port via command line argument, and STDIN for double 
check

commit 3f843674ee1c3a5e364acdee3954806f6a6e05d8
Author: Lv, Qi 
Date:   2014-11-24T07:42:36Z

remove useless import



