tr...@gmail.com> wrote:
>
> Hi JG
> Out of curiosity, what's your use case? Are you writing to S3? You could
> use Spark to do that, e.g., using the Hadoop package
> org.apache.hadoop:hadoop-aws:2.7.1. That will download the AWS client
> that is in line with Hadoop 2.7.1.
>
>
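That suggestion can be sketched as a spark-shell invocation (the bucket name and output path below are placeholders, and the s3a filesystem is assumed):

```shell
# Pull in hadoop-aws (and, transitively, a matching AWS SDK) at launch time.
spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.1

# Then, inside the shell, something along the lines of:
#   df.write.parquet("s3a://my-bucket/output/")
# where "my-bucket" stands in for your own bucket.
```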
Note: EMR builds Hadoop, Spark, et al. from source against specific
versions of certain packages like the AWS Java SDK, httpclient/core,
Jackson, etc., sometimes requiring patches to these applications in
order to work with versions of these dependencies that differ from what the
applications expect.
Prithish,
It would be helpful for you to share the spark-submit command you are
running.
~ Jonathan
On Sun, Feb 26, 2017 at 8:29 AM Prithish wrote:
> Thanks for the responses, I am running this on Amazon EMR which runs the
> Yarn cluster manager.
>
> On Sat, Feb 25, 2017
Prithish,
I saw you posted this on SO, so I responded there just now. See
http://stackoverflow.com/questions/42452622/custom-log4j-properties-on-aws-emr/42516161#42516161
In short, an hdfs:// path can't be used to configure log4j because log4j
knows nothing about hdfs. Instead, since you are
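The linked SO answer has the full details; one common pattern (sketched here, not quoted from the post) is to ship a local log4j.properties with the job instead of pointing at HDFS:

```shell
# Distribute a local log4j.properties to the driver and executors, then
# tell log4j to load it from each container's working directory.
spark-submit \
  --files ./log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  my-app.jar   # placeholder application jar
```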
>>>
>>> channel 4: open failed: connect failed: Connection refused
>>>
>>> channel 5: open failed: connect failed: Connection refused
>>>
>>> channel 22: open failed: connect failed: Connection refused
>>
I would not recommend opening port 50070 on your cluster, as that would
give the entire world access to your data on HDFS. Instead, you should
follow the instructions found here to create a secure tunnel to the
cluster, through which you can proxy requests to the UIs using a browser
plugin like FoxyProxy.
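The usual shape of that tunnel (a sketch; the key file and master hostname are placeholders) is SSH dynamic port forwarding:

```shell
# Open a SOCKS proxy on local port 8157 through the EMR master node;
# the browser proxy plugin is then pointed at localhost:8157.
ssh -i ~/my-key.pem -N -D 8157 hadoop@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
```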
If at first you don't succeed, try, try again. But please don't. :)
See the "unsubscribe" link here: http://spark.apache.org/community.html
I'm not sure I've ever come across an email list that allows you to
unsubscribe by responding to the list with "unsubscribe". At least, all of
the Apache lists I know of require emailing a separate unsubscribe address.
Ted, how is that thread related to Paolo's question?
On Fri, Jun 24, 2016 at 1:50 PM Ted Yu wrote:
> See this related thread:
>
>
> http://search-hadoop.com/m/q3RTtEor1vYWbsW=RE+Configuring+Log4J+Spark+1+5+on+EMR+4+1+
>
> On Fri, Jun 24, 2016 at 6:07 AM, Paolo Patierno
ave a bug tracking it, in case anyone else has
> time to look at it before I do.
>
> On Mon, Jun 20, 2016 at 1:20 PM, Jonathan Kelly <jonathaka...@gmail.com>
> wrote:
> > Thanks for the confirmation! Shall I cut a JIRA issue?
> >
> > On Mon, Jun 20, 2016 at 10:42 AM
Mon, Jun 20, 2016 at 7:04 AM, Jonathan Kelly <jonathaka...@gmail.com>
> wrote:
> > Does anybody have any thoughts on this?
> >
> > On Fri, Jun 17, 2016 at 6:36 PM Jonathan Kelly <jonathaka...@gmail.com>
> > wrote:
> >>
Does anybody have any thoughts on this?
On Fri, Jun 17, 2016 at 6:36 PM Jonathan Kelly <jonathaka...@gmail.com>
wrote:
> I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT
> (commit bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's
> log4j.properties is not getting picked up in the executor classpath.
Mich, what Jacek is saying is not that you implied that YARN relies on two
masters. He's just clarifying that yarn-client and yarn-cluster modes are
really both using the same (type of) master (simply "yarn"). In fact, if
you specify "--master yarn-client" or "--master yarn-cluster", spark-submit
simply translates it into the "yarn" master with the corresponding deploy mode.
I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT
(commit bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's
log4j.properties is not getting picked up in the executor classpath (and
driver classpath for yarn-cluster mode), so Hadoop's log4j.properties file
is taking precedence in the YARN
Weiwei,
Please see this documentation for configuring Spark and other apps on EMR
4.x:
http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-configure-apps.html
This documentation about what has changed between 3.x and 4.x should also
be helpful:
Hi, Myles,
We do not install scikit-learn or spark-sklearn on EMR clusters by default,
but you may install them yourself by just doing "sudo pip install
scikit-learn spark-sklearn" (either by ssh'ing to the master instance and
running this manually, or by running it as an EMR Step).
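Concretely, that amounts to the following (master hostname and key file are placeholders; this assumes pip is already on the instance):

```shell
# After SSHing to the master instance, e.g.:
#   ssh -i ~/my-key.pem hadoop@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
# run:
sudo pip install scikit-learn spark-sklearn
# The same command can be wrapped in an EMR Step to run it at cluster launch.
```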
~ Jonathan
This error is likely due to EMR including some Hadoop lib dirs in
spark.{driver,executor}.extraClassPath. (Hadoop bundles an older version of
Avro than what Spark uses, so you are probably getting bitten by this Avro
mismatch.)
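One hedged workaround for this kind of dependency clash (not necessarily what was done here) is to put the application's own jars ahead of the cluster's on the classpath, via Spark's experimental userClassPathFirst properties:

```shell
# Prefer the jars bundled with the application over Spark/Hadoop's copies.
spark-submit \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  my-app.jar   # placeholder application jar
```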
We determined that these Hadoop dirs are not actually necessary to
On the line preceding the one that the compiler is complaining about (which
doesn't actually have a problem in itself), you declare df as
"df"+fileName, making it a string. Then you try to assign a DataFrame to
df, but it's already a string. I don't quite understand your intent with
that previous line.
(I'm not 100% sure, but...) I think the SPARK_EXECUTOR_* environment
variables are intended to be used with Spark Standalone. Even if not, I'd
recommend setting the corresponding properties in spark-defaults.conf
rather than in spark-env.sh.
For example, you may use the following Configuration
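The message is cut off here, but the Configuration objects EMR accepts look roughly like this (the classification is real; the specific properties below are illustrative, not from the original message):

```json
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.executor.memory": "4g",
      "spark.executor.cores": "2"
    }
  }
]
```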
In yarn-client mode, the driver is separate from the AM. The AM is created
in YARN, and YARN controls where it goes (though you can somewhat control
it using YARN node labels--I just learned earlier today in a different
thread on this list that this can be controlled by
Just FYI, Spark 1.6 was released on emr-4.3.0 a couple days ago:
https://aws.amazon.com/blogs/aws/emr-4-3-0-new-updated-applications-command-line-export/
On Thu, Jan 28, 2016 at 7:30 PM Andrew Zurn wrote:
> Hey Daniel,
>
> Thanks for the response.
>
> After playing around for a
Daniel,
The "hadoop job -list" command is a deprecated form of "mapred job -list",
which is only for Hadoop MapReduce jobs. For Spark jobs, which run on YARN,
you instead want "yarn application -list".
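Side by side (these run on any node with the Hadoop/YARN clients configured):

```shell
# Deprecated alias of `mapred job -list`; shows MapReduce jobs only.
hadoop job -list

# Lists YARN applications, which includes Spark jobs running on YARN.
yarn application -list
```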
Hope this helps,
Jonathan (from the EMR team)
On Tue, Jan 26, 2016 at 10:05 AM Daniel
Yes, IAM roles are actually required now for EMR. If you use Spark on EMR
(vs. just EC2), you get S3 configuration for free (it goes by the name
EMRFS), and it will use your IAM role for communicating with S3. Here is
the corresponding documentation:
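For illustration, launching a cluster with the default EMR roles attached might look like this (the name, release label, and instance sizes are placeholders):

```shell
# --use-default-roles attaches the default EMR service role and EC2
# instance profile, which EMRFS then uses for S3 access.
aws emr create-cluster \
  --name "spark-cluster" \
  --release-label emr-4.3.0 \
  --applications Name=Spark \
  --use-default-roles \
  --instance-type m3.xlarge \
  --instance-count 3
```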
>
> On Mon, Dec 14, 2015 at 2:33 PM, Jonathan Kelly <jonathaka...@gmail.com>
> wrote:
>
>> Are you running Spark on YARN? If so, you can get to the Spark UI via the
>> YARN ResourceManager. Each running Spark application will have a link on
>> the YARN Resou
Are you running Spark on YARN? If so, you can get to the Spark UI via the
YARN ResourceManager. Each running Spark application will have a link on
the YARN ResourceManager labeled "ApplicationMaster". If you click that, it
will take you to the Spark UI, even if it is running on a slave node in the
, so hopefully this works...
On Wednesday, December 2, 2015, Jonathan Kelly <jonathaka...@gmail.com>
wrote:
> EMR is currently running a private preview of an upcoming feature allowing
> EMR clusters to be launched in VPC private subnets. This will allow you to
> launch a cluster in a
I don't know if this actually has anything to do with why your job is
hanging, but since you are using EMR you should probably not set those
fs.s3 properties but rather let it use EMRFS, EMR's optimized Hadoop
FileSystem implementation for interacting with S3. One benefit is that it
will
He means for you to use jstack to obtain a stacktrace of all of the
threads. Or are you saying that the Java process never even starts?
On Mon, Nov 16, 2015 at 7:48 AM, Kayode Odeyemi wrote:
> Spark 1.5.1
>
> The fact is that there's no stack trace. No output from that
Christian,
Is there anything preventing you from using EMR, which will manage your
cluster for you? Creating large clusters would take mins on EMR instead of
hours. Also, EMR supports growing your cluster easily and recently added
support for shrinking your cluster gracefully (even while jobs are
to a private rather than
> public IP; replacing IPs brings me to the same Spark GUI.
>
> Joshua
> [image: Inline image 3]
>
>
>
>
> On Tue, Oct 13, 2015 at 6:23 PM, Jonathan Kelly <jonathaka...@gmail.com>
> wrote:
>
>> Joshua,
>>
Joshua,
Since Spark is configured to run on YARN in EMR, instead of viewing the
Spark application UI at port 4040, you should instead start from the YARN
ResourceManager (on port 8088), then click on the ApplicationMaster link
for the Spark application you are interested in. This will take you to
I cut https://issues.apache.org/jira/browse/SPARK-10790 for this issue.
On Wed, Sep 23, 2015 at 8:38 PM, Jonathan Kelly <jonathaka...@gmail.com>
wrote:
> AHA! I figured it out, but it required some tedious remote debugging of
> the Spark ApplicationMaster. (But now I understa
I can't seem to find a JIRA for this, so shall I file one, or has anybody
else seen anything like this?
~ Jonathan
On Wed, Sep 23, 2015 at 7:08 PM, Jonathan Kelly <jonathaka...@gmail.com>
wrote:
> Another update that doesn't make much sense:
>
> The SparkPi example doe
ing dynamic allocation.
>
>
> On Wed, Sep 23, 2015 at 18:04 Jonathan Kelly <jonathaka...@gmail.com>
> wrote:
>
>> I'm running into a problem with YARN dynamicAllocation on Spark 1.5.0
>> after using it successfully on an identically configured cluster with Spark
>>
work.
~ Jonathan
On Wed, Sep 23, 2015 at 6:22 PM, Jonathan Kelly <jonathaka...@gmail.com>
wrote:
> Thanks for the quick response!
>
> spark-shell is indeed using yarn-client. I forgot to mention that I also
> have "spark.master yarn-client" in my spark-defaults.conf.
I'm running into a problem with YARN dynamicAllocation on Spark 1.5.0 after
using it successfully on an identically configured cluster with Spark 1.4.1.
I'm getting the dreaded warning "YarnClusterScheduler: Initial job has not
accepted any resources; check your cluster UI to ensure that workers
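For context, dynamic allocation on YARN also requires the external shuffle service to be enabled; the relevant spark-defaults entries look like this (a minimal sketch):

```
spark.dynamicAllocation.enabled   true
spark.shuffle.service.enabled     true
```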