[jira] [Commented] (SPARK-11085) Add support for HTTP proxy

2016-08-25 Thread SURESH CHAGANTI (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437727#comment-15437727
 ] 

SURESH CHAGANTI commented on SPARK-11085:
-

Hi All,
I have made the code changes to accept the HTTP proxy as a run-time argument 
and use it for outbound calls.

Below is the pull request:

https://github.com/SureshChaganti/spark-ec2/commit/cfd4bf727bdf46b9456f8f4d89221d1377d9c221
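
A minimal sketch of the idea, not the code from the commit above: parse a
"--proxy_host_port" option (the option name used by the script quoted in this
thread) and route urllib traffic through it. The "host:port" value format, the
function names parse_proxy and make_opener, and the Python 3-only imports are
assumptions made for illustration.

```python
from optparse import OptionParser
from urllib.request import ProxyHandler, build_opener


def parse_proxy(argv):
    """Return the value of --proxy_host_port, or None if it was not given."""
    parser = OptionParser(prog="spark-ec2")
    # Assumed value format: "host:port", e.g. "proxy.host:8080".
    parser.add_option("--proxy_host_port", default=None,
                      help="HTTP proxy to use for outbound calls, as host:port")
    opts, _args = parser.parse_args(argv)
    return opts.proxy_host_port


def make_opener(proxy_host_port):
    """Build a urllib opener that sends http and https requests via the proxy."""
    if proxy_host_port is None:
        # No explicit proxy: urllib falls back to environment/system settings.
        return build_opener()
    proxy_url = "http://" + proxy_host_port
    return build_opener(ProxyHandler({"http": proxy_url, "https": proxy_url}))


opener = make_opener(parse_proxy(["--proxy_host_port", "proxy.host:8080"]))
# opener.open(url) would now be sent through proxy.host:8080.
```

Building a dedicated opener keeps the proxy choice local to the script rather
than changing process-wide state.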


> Add support for HTTP proxy 
> ---
>
> Key: SPARK-11085
> URL: https://issues.apache.org/jira/browse/SPARK-11085
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell, Spark Submit
>Reporter: Dustin Cote
>Priority: Minor
>
> Add a way to update ivysettings.xml for the spark-shell and spark-submit to 
> support proxy settings for clusters that need to access a remote repository 
> through an http proxy.  Typically this would be done like:
> JAVA_OPTS="$JAVA_OPTS -Dhttp.proxyHost=proxy.host -Dhttp.proxyPort=8080 
> -Dhttps.proxyHost=proxy.host.secure -Dhttps.proxyPort=8080"
> Directly in the ivysettings.xml would look like:
>  
>  proxyport="8080" 
> nonproxyhosts="nonproxy.host"/> 
>  
> Even better would be a way to customize the ivysettings.xml with command 
> options.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11085) Add support for HTTP proxy

2016-08-25 Thread SURESH CHAGANTI (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437505#comment-15437505
 ] 

SURESH CHAGANTI commented on SPARK-11085:
-

The following script accepts the "--proxy_host_port" argument:

from __future__ import division, print_function, with_statement

import codecs
import hashlib
import itertools
import logging
import os
import os.path
import pipes
import random
import shutil
import string
from stat import S_IRUSR
import subprocess
import sys
import tarfile
import tempfile
import textwrap
import time
import warnings
from datetime import datetime
from optparse import OptionParser
from sys import stderr

if sys.version < "3":
    from urllib2 import urlopen, Request, HTTPError
else:
    from urllib.request import urlopen, Request
    from urllib.error import HTTPError
    raw_input = input
    xrange = range

SPARK_EC2_VERSION = "1.6.2"
SPARK_EC2_DIR = os.path.dirname(os.path.realpath(__file__))

VALID_SPARK_VERSIONS = set([
"0.7.3",
"0.8.0",
"0.8.1",
"0.9.0",
"0.9.1",
"0.9.2",
"1.0.0",
"1.0.1",
"1.0.2",
"1.1.0",
"1.1.1",
"1.2.0",
"1.2.1",
"1.3.0",
"1.3.1",
"1.4.0",
"1.4.1",
"1.5.0",
"1.5.1",
"1.5.2",
"1.6.0",
"1.6.1",
"1.6.2",
])

SPARK_TACHYON_MAP = {
"1.0.0": "0.4.1",
"1.0.1": "0.4.1",
"1.0.2": "0.4.1",
"1.1.0": "0.5.0",
"1.1.1": "0.5.0",
"1.2.0": "0.5.0",
"1.2.1": "0.5.0",
"1.3.0": "0.5.0",
"1.3.1": "0.5.0",
"1.4.0": "0.6.4",
"1.4.1": "0.6.4",
"1.5.0": "0.7.1",
"1.5.1": "0.7.1",
"1.5.2": "0.7.1",
"1.6.0": "0.8.2",
"1.6.1": "0.8.2",
"1.6.2": "0.8.2",
}

DEFAULT_SPARK_VERSION = SPARK_EC2_VERSION
DEFAULT_SPARK_GITHUB_REPO = "https://github.com/apache/spark"

# Default location to get the spark-ec2 scripts (and ami-list) from
DEFAULT_SPARK_EC2_GITHUB_REPO = "https://github.com/amplab/spark-ec2"
DEFAULT_SPARK_EC2_BRANCH = "branch-1.6"


def setup_external_libs(libs):
    """
    Download external libraries from PyPI to SPARK_EC2_DIR/lib/ and prepend
    them to our PATH.
    """
    PYPI_URL_PREFIX = "https://pypi.python.org/packages/source"
    SPARK_EC2_LIB_DIR = os.path.join(SPARK_EC2_DIR, "lib")

    if not os.path.exists(SPARK_EC2_LIB_DIR):
        print("Downloading external libraries that spark-ec2 needs from PyPI to {path}...".format(
            path=SPARK_EC2_LIB_DIR
        ))
        print("This should be a one-time operation.")
        os.mkdir(SPARK_EC2_LIB_DIR)

    for lib in libs:
        versioned_lib_name = "{n}-{v}".format(n=lib["name"], v=lib["version"])
        lib_dir = os.path.join(SPARK_EC2_LIB_DIR, versioned_lib_name)

        if not os.path.isdir(lib_dir):
            tgz_file_path = os.path.join(SPARK_EC2_LIB_DIR, versioned_lib_name + ".tar.gz")
            print(" - Downloading {lib}...".format(lib=lib["name"]))
            download_stream = urlopen(
                "{prefix}/{first_letter}/{lib_name}/{lib_name}-{lib_version}.tar.gz".format(
                    prefix=PYPI_URL_PREFIX,
                    first_letter=lib["name"][:1],
                    lib_name=lib["name"],
                    lib_version=lib["version"]
                )
            )
            with open(tgz_file_path, "wb") as tgz_file:
                tgz_file.write(download_stream.read())
            with open(tgz_file_path, "rb") as tar:
                if hashlib.md5(tar.read()).hexdigest() != lib["md5"]:
                    print("ERROR: Got wrong md5sum for {lib}.".format(lib=lib["name"]), file=stderr)
                    sys.exit(1)
            tar = tarfile.open(tgz_file_path)
            tar.extractall(path=SPARK_EC2_LIB_DIR)
            tar.close()
            os.remove(tgz_file_path)
            print(" - Finished downloading {lib}.".format(lib=lib["name"]))
        sys.path.insert(1, lib_dir)


# Only PyPI libraries are supported.
external_libs = [
{
"name": "boto",
"version": "2.34.0",
"md5": "5556223d2d0cc4d06dd4829e671dcecd"
}
]

setup_external_libs(external_libs)


import boto
from boto.ec2.blockdevicemapping import BlockDeviceMapping, BlockDeviceType, EBSBlockDeviceType
from boto import ec2


class UsageError(Exception):
    pass


# Configure and parse our command-line arguments
def parse_args():
    parser = OptionParser(
        prog="spark-ec2",
        version="%prog {v}".format(v=SPARK_EC2_VERSION),
        usage="%prog [options]  \n\n"
        + " can be: launch, destroy, login, stop, start, get-master, reboot-slaves")

    parser.add_option(
        "-s", "--slaves", type="int", default=1,
        help="Number of slaves to launch (default: %default)")
    parser.add_option(
        "-w", "--wait", type="int",
        help="DEPRECATED (no longer necessary) - Seconds to wait for nodes to start")
    parser.add_option(
        "-k", "--key-pair",
        help="Key pair to use 

[jira] [Commented] (SPARK-11085) Add support for HTTP proxy

2016-06-02 Thread Ion Alberdi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312367#comment-15312367
 ] 

Ion Alberdi commented on SPARK-11085:
-

To reproduce on a network that needs an HTTP proxy to reach 
http://dl.bintray.com and https://repo1.maven.org, run:

% spark-shell --packages 
org.apache.spark:spark-streaming-kafka_2.11:1.6.1,com.datastax.spark:spark-cassandra-connector_2.11:1.6.1-M2
 --driver-java-options "-Dhttp.proxyHost= 
-Dhttp.proxyPort="
...

 spark-packages: tried

  
http://dl.bintray.com/spark-packages/maven/org/apache/spark/spark-streaming-kafka_2.11/1.6.1/spark-streaming-kafka_2.11-1.6.1.pom

  -- artifact 
org.apache.spark#spark-streaming-kafka_2.11;1.6.1!spark-streaming-kafka_2.11.jar:

  
http://dl.bintray.com/spark-packages/maven/org/apache/spark/spark-streaming-kafka_2.11/1.6.1/spark-streaming-kafka_2.11-1.6.1.jar

module not found: 
com.datastax.spark#spark-cassandra-connector_2.11;1.6.1-M2
Indeed,
http://dl.bintray.com/spark-packages/maven/org/apache/spark/spark-streaming-kafka_2.11/1.6.1/spark-streaming-kafka_2.11-1.6.1.pom
does not exist.

However, the following error

ERRORS
Server access error at url 
https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka_2.11/1.6.1/spark-streaming-kafka_2.11-1.6.1.pom
 (java.net.ConnectException: Connection timed out)

is due to the proxy configuration not being taken into account, since
https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka_2.11/1.6.1/spark-streaming-kafka_2.11-1.6.1.pom
does exist.
The difference between the two resolvers can be seen at
https://github.com/apache/spark/blob/0a3026990bd0cbad53f0001da793349201104958/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L904
One has its root set and the other does not; the latter apparently gets its 
URL from
https://github.com/apache/ant-ivy/blob/master/src/java/org/apache/ivy/plugins/resolver/IBiblioResolver.java#L71

I'm currently trying to figure out why the proxy is not used by an 
IBiblioResolver that does not have its root set.
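
The URLs in the log above follow the standard Maven repository layout, so they
can be rebuilt from a package coordinate and checked by hand through the proxy.
A small sketch of that mapping, where the helper name maven_artifact_url is
hypothetical and is not part of Spark or Ivy:

```python
def maven_artifact_url(repo_root, coordinate, extension="pom"):
    """Map 'group:artifact:version' to its URL in a Maven-layout repository."""
    group, artifact, version = coordinate.split(":")
    # Dots in the group id become directory separators.
    path = "/".join([group.replace(".", "/"), artifact, version])
    return "{root}/{path}/{artifact}-{version}.{ext}".format(
        root=repo_root.rstrip("/"), path=path,
        artifact=artifact, version=version, ext=extension)


coordinate = "org.apache.spark:spark-streaming-kafka_2.11:1.6.1"
for root in ("http://dl.bintray.com/spark-packages/maven",
             "https://repo1.maven.org/maven2"):
    print(maven_artifact_url(root, coordinate))
```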









[jira] [Commented] (SPARK-11085) Add support for HTTP proxy

2016-06-02 Thread Ion Alberdi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312244#comment-15312244
 ] 

Ion Alberdi commented on SPARK-11085:
-

More precisely, it seems that when a URL like
http://dl.bintray.com/spark-packages/maven/com/datastax/spark/spark-cassandra-connector_2.11/1.6.0-M2/spark-cassandra-connector_2.11-1.6.0-M2.pom

is tried, the request goes through the proxy. 

However, when going to a Maven-compatible link like

https://repo1.maven.org/maven2/com/datastax/spark/spark-cassandra-connector_2.11/1.6.0-M2/spark-cassandra-connector_2.11-1.6.0-M2.jar

the proxy is not taken into account.






[jira] [Commented] (SPARK-11085) Add support for HTTP proxy

2016-06-02 Thread Ion Alberdi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311906#comment-15311906
 ] 

Ion Alberdi commented on SPARK-11085:
-

Hello all, 
I reproduced the error using the Docker setup in 
https://github.com/Yannael/kafka-sparkstreaming-cassandra. 
Here is what I tried in order to pass the -Dhttp.proxyHost and 
-Dhttp.proxyPort parameters: 

- The javaopts workaround mentioned earlier in this issue.

- Setting spark.executor.extraJavaOptions. The launched command becomes
% java org.apache.spark.deploy.SparkSubmit ... --conf 
spark.driver.extraJavaOptions=-Dhttp.proxyHost= 
-Dhttp.proxyPort=
I wonder whether the shell is able to parse that line and thus pass both 
parameters (-Dhttp.proxyHost= and -Dhttp.proxyPort=) to 
org.apache.spark.deploy.SparkSubmit.

- Setting the "--driver-java-options" parameter, which ends up as
% java org.apache.spark.deploy.SparkSubmit ... -Dhttp.proxyHost= 
-Dhttp.proxyPort=
Even though the shell seems more likely to parse both parameters here, the 
packages are not downloaded because the HTTP requests do not go through the 
proxy.
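
The parsing question in the second item can be checked without a cluster.
Python's shlex module splits a string using POSIX shell quoting rules, so this
sketch shows how a command line would be tokenized with and without quotes. The
proxy.host and 8080 values are placeholders borrowed from the issue description,
since the commands above omit the real values.

```python
import shlex

unquoted = ("spark-submit --conf "
            "spark.driver.extraJavaOptions=-Dhttp.proxyHost=proxy.host "
            "-Dhttp.proxyPort=8080")
quoted = ("spark-submit --conf "
          "'spark.driver.extraJavaOptions=-Dhttp.proxyHost=proxy.host "
          "-Dhttp.proxyPort=8080'")

# Unquoted: the space ends the --conf value, so the second flag
# becomes a separate argument.
print(shlex.split(unquoted))
# Quoted: both -D flags stay inside the single --conf value.
print(shlex.split(quoted))
```

This only illustrates tokenization; it says nothing about whether the resulting
JVM options are honored when packages are resolved.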








[jira] [Commented] (SPARK-11085) Add support for HTTP proxy

2016-02-26 Thread Anbu Cheeralan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169346#comment-15169346
 ] 

Anbu Cheeralan commented on SPARK-11085:


I use Hortonworks. I was able to resolve this by doing the following:
1. Create a javaopts file in the SPARK_HOME/conf folder.
2. Add all the Java options to it, like below:
-Dhttp.proxyHost=proxy.host 
-Dhttp.proxyPort=8080
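
The two steps above can be scripted. This is only a sketch: the file name
"javaopts" is taken verbatim from this comment, the helper name write_javaopts
is hypothetical, and whether a given Spark distribution reads the file is as
reported by the commenter, not verified here.

```python
import os


def write_javaopts(conf_dir, proxy_host, proxy_port):
    """Write the two proxy flags, one per line, to <conf_dir>/javaopts."""
    path = os.path.join(conf_dir, "javaopts")  # file name as given in the comment
    lines = ["-Dhttp.proxyHost={0}".format(proxy_host),
             "-Dhttp.proxyPort={0}".format(proxy_port)]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return path


# Example, assuming SPARK_HOME is set in the environment:
# write_javaopts(os.path.join(os.environ["SPARK_HOME"], "conf"), "proxy.host", 8080)
```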




[jira] [Commented] (SPARK-11085) Add support for HTTP proxy

2016-02-10 Thread Prosper Burq (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140906#comment-15140906
 ] 

Prosper Burq commented on SPARK-11085:
--

Hi,

Is this problem still unresolved? I tried several options but could not find 
out how to allow spark-submit to connect through the proxy. I tried to pass 
environment variables in different ways, but none of them worked. 


For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11085) Add support for HTTP proxy

2015-10-13 Thread Don Drake (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954989#comment-14954989
 ] 

Don Drake commented on SPARK-11085:
---

Neither of the options works.




[jira] [Commented] (SPARK-11085) Add support for HTTP proxy

2015-10-13 Thread Dustin Cote (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955001#comment-14955001
 ] 

Dustin Cote commented on SPARK-11085:
-

[~sowen] The problem here is that the dependencies to be downloaded with 
--packages can't be reached because those settings do not get forwarded into 
the respective Spark client.  I'll note this was being tried with Spark on 
YARN, and JAVA_OPTS was being set through spark.driver.extraJavaOptions.  The 
ivysettings change was being done through ~/.m2/ivysettings.xml.  The issue is 
really one of forwarding the settings to the Spark client.

At least on CDH, the relevant ivysettings.xml is bundled in the assembly jar 
and apparently not modified by either of the two methods:
:: loading settings :: url = 
jar:file:/opt/cloudera/parcels/CDH-5.4.2-1.cdh5.4.2.p819.487/jars/spark-assembly-1.3.0-cdh5.4.2-hadoop2.6.0-cdh5.4.2.jar!/org/apache/ivy/core/settings/ivysettings.xml
 

This JIRA would be to come up with a way to modify or override this 
ivysettings.xml so that it can be used with proxy settings.
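
To see which settings are actually in effect, the bundled file can be read
straight out of the assembly jar, since a jar is a zip archive. A minimal
sketch: the entry path comes from the log line above, while the function name
read_bundled_ivysettings and the example jar path are placeholders.

```python
import zipfile

# Entry path as printed in the ":: loading settings ::" line above.
IVY_SETTINGS_ENTRY = "org/apache/ivy/core/settings/ivysettings.xml"


def read_bundled_ivysettings(jar_path, entry=IVY_SETTINGS_ENTRY):
    """Return the text of the ivysettings.xml bundled inside a jar."""
    with zipfile.ZipFile(jar_path) as jar:  # a jar is a zip archive
        return jar.read(entry).decode("utf-8")


# Example (the path is a placeholder):
# print(read_bundled_ivysettings("/path/to/spark-assembly.jar"))
```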




[jira] [Commented] (SPARK-11085) Add support for HTTP proxy

2015-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954980#comment-14954980
 ] 

Sean Owen commented on SPARK-11085:
---

Dustin, so I'm clear: do those alternatives work?
