[GitHub] spark pull request: spark-submit with accept multiple properties-f...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3490 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: spark-submit with accept multiple properties-f...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3490#issuecomment-69116580 I would recommend that we close this PR for now until we file a corresponding JIRA that describes the issue.
[GitHub] spark pull request: spark-submit with accept multiple properties-f...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3490#issuecomment-66516901 I see. I think the complexity in merging multiple properties files arises when the same config is declared in two properties files. What are the expected semantics there: do we merge the values somehow? If not, which one is overridden? Or do we throw an exception? The worst that could happen is that the user thinks a particular value is used, but it is actually being overridden silently because another properties file also defines it. That said, I think I understand your use case a little more. Could you file a JIRA and add it to the title? Maybe we can have more people look at this and decide whether we actually want to support this.
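To make the options in the comment above concrete, here is a minimal Python sketch of two of the possible semantics: "last file wins, but report what was overridden" and "throw on conflict". It is only an illustration of the question, not code from the PR or from Spark; the function name `merge_with_policy`, the `on_conflict` parameter, and the file names are hypothetical.

```python
class ConflictError(ValueError):
    """Raised when two sources set the same key to different values."""


def merge_with_policy(sources, on_conflict="override"):
    """Merge (name, properties-dict) pairs in order.

    on_conflict="override": later sources win; every override is recorded.
    on_conflict="error":    raise ConflictError on the first differing value.
    Returns (merged, overridden), where overridden is a list of
    (key, earlier_source, later_source) tuples.
    """
    merged = {}      # effective key -> value
    origin = {}      # key -> name of the source that currently defines it
    overridden = []  # audit trail, so no override is silent
    for name, props in sources:
        for key, value in props.items():
            if key in merged and merged[key] != value:
                if on_conflict == "error":
                    raise ConflictError(
                        f"{key} is set in both {origin[key]} and {name}"
                    )
                overridden.append((key, origin[key], name))
            merged[key] = value
            origin[key] = name
    return merged, overridden


# Hypothetical usage: a shared file plus a per-workload file.
sources = [
    ("global.conf", {"spark.master": "local", "spark.executor.memory": "1g"}),
    ("workload.conf", {"spark.executor.memory": "4g"}),
]
merged, overridden = merge_with_policy(sources)
for key, earlier, later in overridden:
    print(f"{key}: value from {earlier} overridden by {later}")
```

Whichever policy is chosen, recording where each effective value came from addresses the "silently overridden" concern, because the override can then be logged or rejected.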
[GitHub] spark pull request: spark-submit with accept multiple properties-f...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3490#issuecomment-66398025 I agree. Would you mind closing this?
[GitHub] spark pull request: spark-submit with accept multiple properties-f...
Github user lvsoft commented on the pull request: https://github.com/apache/spark/pull/3490#issuecomment-66404194 Sorry for the late reply. I'll explain the use cases for multiple properties files. Currently I'm working on a benchmark utility for Spark. It is natural to adjust properties for different workloads, so I'd like to set up the configuration in two parts: global confs for common properties, and private confs for each workload. Without support for multiple properties files, I have to merge the properties into a tmp conf file and remove it after spark-submit finishes. What's more, when multiple workloads are submitted concurrently, possibly several times each, the tmp conf file names need to be unique. And if the benchmark run is interrupted, the tmp conf files will be hard to clean up. So I think a more elegant approach is to add support for multiple properties files to Spark. Another reason for this PR: currently Spark uses `spark-defaults.conf` if no properties-file is specified, or uses the specified properties-file and *discards* `spark-defaults.conf`. This behavior is also counter-intuitive for beginners. In most systems, it is a natural assumption that the values in `xxx-defaults.conf` take effect for any property that is not overridden in the user's config.
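For reference, the tmp-file workaround described above could look roughly like the following Python sketch. It is an assumption about how such a benchmark wrapper might be written, not code from the PR: the helper name `merged_properties_file` and the file names are hypothetical, and the concatenation relies on the loader keeping the last occurrence of a repeated key, as `java.util.Properties` does.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def merged_properties_file(paths):
    """Concatenate properties files into a uniquely named temp file.

    Later files are written last, so a loader that keeps the last
    occurrence of a key will let them override earlier files.
    The temp file is removed when the `with` block exits.
    """
    # mkstemp picks a unique name, so concurrent submissions do not collide.
    fd, tmp_path = tempfile.mkstemp(prefix="spark-merged-", suffix=".conf")
    try:
        with os.fdopen(fd, "w") as out:
            for path in paths:
                with open(path) as src:
                    out.write(src.read())
                out.write("\n")  # guard against a missing trailing newline
        yield tmp_path
    finally:
        os.remove(tmp_path)


# Hypothetical usage (file names and arguments are placeholders):
#
#   import subprocess
#   with merged_properties_file(["global.conf", "workload.conf"]) as conf:
#       subprocess.run(["spark-submit", "--properties-file", conf, ...])
```

The `finally` clause covers normal exit and Python exceptions, but not a hard kill of the wrapper process, which is the leftover-file problem raised in the comment.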
[GitHub] spark pull request: spark-submit with accept multiple properties-f...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3490#issuecomment-66404521 In your case, why not just add the common properties to each private config and set a separate properties file for each workload? Why would the tmp conf file be deleted after the job finishes? I don't think it is reasonable to make this change.
[GitHub] spark pull request: spark-submit with accept multiple properties-f...
Github user lvsoft commented on the pull request: https://github.com/apache/spark/pull/3490#issuecomment-66405387 Well, that's called separate property files, not *common* properties. It would be hard to adjust the common properties and easy to make mistakes. Deleting tmp files is a common requirement in system design. Of course you can ignore tmp files. As I said, I think this is a more elegant approach.
[GitHub] spark pull request: spark-submit with accept multiple properties-f...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3490#issuecomment-66408350 As Patrick said, this will make configuration more complex rather than more elegant.
[GitHub] spark pull request: spark-submit with accept multiple properties-f...
Github user lvsoft commented on the pull request: https://github.com/apache/spark/pull/3490#issuecomment-66414770 Well, I can't understand what the complexity of this PR is. I've reviewed SPARK-3779, which is marked as related, and didn't find anything related to this patch. Also, this patch is backward compatible with the current `spark-submit` behavior. From my point of view, let's discuss it point by point:

1. Necessity: I've given two reasons, one for the benchmark case and one for the common intuition in most systems.
2. Complexity: this patch maintains backward compatibility. I described its details at the beginning and don't see the relationship with SPARK-3779.
3. Elegance: I don't think this is the most elegant solution. However, in order to maintain compatibility and minimize the impact on the current system, it is a relatively elegant one.
[GitHub] spark pull request: spark-submit with accept multiple properties-f...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3490#issuecomment-64916925 Hi @lvsoft - it was not in the design of this component to support multiple files, and I'd prefer not to do it. It makes it very hard to reason about the effective configuration.
[GitHub] spark pull request: spark-submit with accept multiple properties-f...
GitHub user lvsoft opened a pull request: https://github.com/apache/spark/pull/3490

spark-submit with accept multiple properties-files and merge the values

Currently `spark-submit` accepts only one properties-file, and uses `spark-defaults.conf` if none is specified. A more natural approach is to apply the properties-files sequentially on top of `spark-defaults.conf`.

This PR changes:

1. spark-submit script: join multiple `--properties-file` values with commas and store them in the `SPARK_SUBMIT_PROPERTIES_FILES` environment variable. Peek into each properties-file to set the `SPARK_SUBMIT_BOOTSTRAP_DRIVER` flag.
2. SparkSubmitArguments.scala: similar to 1.
3. SparkSubmitDriverBootstrapper.scala: accept `SPARK_SUBMIT_PROPERTIES_FILES` and call `getPropertiesFromFiles` for parsing.
4. Utils.scala: add `getPropertiesFromFiles` for parsing multiple properties-files.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lvsoft/spark spark_submit_with_multi_properties

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3490.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3490

commit c18a266a1fa0c20331faed1193c168c1021edcf1
Author: Lv, Qi qi...@intel.com
Date: 2014-11-25T08:48:03Z

    Spark submit accept multiple properties files

commit 752a0581fde0692ee05213b51d0fc0368d8fd205
Author: Lv, Qi qi...@intel.com
Date: 2014-11-26T08:56:29Z

    test pass
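The sequential patching described in the PR can be sketched in Python as follows. This is a language-neutral illustration under stated assumptions, not the Scala `getPropertiesFromFiles` from the patch: `parse_properties` and `merge_properties` are hypothetical names, the parser is deliberately simplified (one `key value` or `key=value` pair per line, `#` comment lines, no escapes or line continuations), and the inputs are file contents rather than paths.

```python
import re


def parse_properties(text):
    """Simplified properties parser (assumption: no escapes or continuations)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Split on the first '=' (with optional spaces) or run of whitespace.
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        props[parts[0]] = parts[1] if len(parts) > 1 else ""
    return props


def merge_properties(texts):
    """Apply each file's properties in order; later files override earlier ones."""
    merged = {}
    for text in texts:
        merged.update(parse_properties(text))
    return merged


# Hypothetical contents: defaults first, then a workload-specific file.
defaults = "# spark-defaults.conf\nspark.master local[2]\nspark.executor.memory 1g\n"
workload = "spark.executor.memory=4g\nspark.eventLog.enabled = true\n"
effective = merge_properties([defaults, workload])
```

In this sketch `spark.master` survives from the defaults while `spark.executor.memory` takes the workload's value, which is the "defaults apply unless overridden" behavior the PR argues for.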
[GitHub] spark pull request: spark-submit with accept multiple properties-f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3490#issuecomment-64739092 Can one of the admins verify this patch?