Hi All,
I'm getting the following error when I execute start-master.sh which also
invokes spark-class at the end.
Failed to find Spark assembly in /root/spark/assembly/target/scala-2.10/
You need to build Spark with 'sbt/sbt assembly' before running this program.
After digging into the
to the jar itself, so no need for random classpaths.
On Tue, Mar 25, 2014 at 1:47 PM, Andrew Lee alee...@hotmail.com wrote:
Hi All,
I'm getting the following error when I execute start-master.sh which also
invokes spark-class at the end.
Failed to find Spark assembly in /root/spark/assembly
Hi All,
I encountered this problem when the firewall is enabled between the spark-shell
and the Workers.
When I launch spark-shell in yarn-client mode, I notice that Workers in the
YARN containers are trying to talk to the driver (spark-shell); however, the
firewall is not open and caused
Subject: Re: spark-shell driver interacting with Workers in YARN mode -
firewall blocking communication
From: yana.kadiy...@gmail.com
To: user@spark.apache.org
I think what you want to do is set spark.driver.port to a fixed port.
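For example (a sketch; 51000 is an arbitrary choice, pick any port your firewall rules allow), in conf/spark-defaults.conf:
spark.driver.port 51000
Then open that port inbound from the YARN NodeManagers to the host running spark-shell.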
On Fri, May 2, 2014 at 1:52 PM, Andrew Lee alee...@hotmail.com
.nabble.com/Securing-Spark-s-Network-tp4832p4984.html
[2] http://en.wikipedia.org/wiki/Ephemeral_port
[3] http://www.cyberciti.biz/tips/linux-increase-outgoing-network-sockets-range.html
Jacob D. Eisinger
IBM Emerging Technologies
jeis...@us.ibm.com - (512) 286-6075
Andrew Lee ---05/04/2014 09:57:08 PM---Hi Jacob, Taking both concerns into
account, I'm actually thinking about using a separate subnet to
From: Andrew Lee alee...@hotmail.com
To: user@spark.apache.org user@spark.apache.org
Date: 05/04/2014 09:57 PM
Subject
Please check JAVA_HOME. Usually it should point to /usr/java/default on
CentOS/Linux.
FYI: http://stackoverflow.com/questions/1117398/java-home-directory
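A quick way to verify (paths vary by distribution; /usr/java/default is the common CentOS symlink):
echo $JAVA_HOME
readlink -f "$(which java)"
# if wrong or unset, set it, e.g. in conf/spark-env.sh:
export JAVA_HOME=/usr/java/default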
Date: Tue, 6 May 2014 00:23:02 -0700
From: sln-1...@163.com
To: u...@spark.incubator.apache.org
Subject: run spark0.9.1 on yarn with
Does anyone know if:
./bin/spark-shell --master yarn
is running yarn-cluster or yarn-client by default?
Based on the source code:
./core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
if (args.deployMode == "cluster" && args.master.startsWith("yarn")) {
  args.master = "yarn-cluster"
}
:
if (args.deployMode != "cluster" && args.master.startsWith("yarn")) {
  args.master = "yarn-client"
}
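Reading the two branches: spark-shell never sets deployMode to cluster, so --master yarn falls into the second case and defaults to yarn-client. Roughly:
./bin/spark-shell --master yarn                  # resolves to yarn-client
./bin/spark-submit --master yarn-cluster ...     # explicit cluster mode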
2014-05-21 10:57 GMT-07:00 Andrew Lee alee...@hotmail.com:
Does anyone know if:
./bin/spark-shell --master yarn
is running yarn-cluster or yarn-client by default?
Based on the source code:
./core/src
Hi All,
Has anyone run into the same problem? By looking at the source code in the
official release (rc11), this property setting is set to false by default;
however, I'm seeing the .sparkStaging folder remain on HDFS, causing it
to fill up the disk pretty fast since SparkContext deploys
Forgot to mention that I am using spark-submit to submit jobs, and a verbose
mode printout looks like this with the SparkPi example. The .sparkStaging
won't be deleted. My thought is that this should be part of the staging and
should be cleaned up as well when sc gets terminated.
I checked the source code; it looks like it was re-added based on JIRA
SPARK-1588, but I don't know if there's any test case associated with this?
SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN.
Sandy Ryza sa...@cloudera.com
2014-04-29 12:54:02 -0700
Hi Christophe,
Make sure you have 3 slashes in the hdfs scheme.
e.g.
hdfs:///server_name:9000/user/user_name/spark-events
and in spark-defaults.conf as well:
spark.eventLog.dir=hdfs:///server_name:9000/user/user_name/spark-events
Date: Thu, 19 Jun 2014 11:18:51 +0200
From:
Hi Kudryavtsev,
Here's what I am doing as a common practice and reference. I don't want to say
it is best practice, since it requires a lot of customer experience and
feedback, but from a development and operating standpoint, it will be great to
separate the YARN container logs from the Spark
Build: Spark 1.0.0 rc11 (git commit tag:
2f1dc868e5714882cf40d2633fb66772baf34789)
Hi All,
When I enabled the eventLog settings in spark-defaults.conf, spark-shell broke
while spark-submit worked.
I'm trying to create a separate directory per user to keep track of their own
Spark job event
As mentioned, deprecated in Spark 1.0+.
Try to use the --driver-class-path:
./bin/spark-shell --driver-class-path yourlib.jar:abc.jar:xyz.jar
Don't use a glob (*); specify the JARs one by one, separated by colons.
Date: Wed, 9 Jul 2014 13:45:07 -0700
From: kat...@cs.pitt.edu
Subject: SPARK_CLASSPATH Warning
Ok, I found it on JIRA SPARK-2390:
https://issues.apache.org/jira/browse/SPARK-2390
So it looks like this is a known issue.
From: alee...@hotmail.com
To: user@spark.apache.org
Subject: spark-1.0.0-rc11 2f1dc868 spark-shell not honoring --properties-file
option?
Date: Tue, 8 Jul 2014 15:17:00
Hi All,
Currently, if you are running the Spark HiveContext API with Hive 0.12, it won't
work due to the following 2 libraries, which are not consistent with Hive 0.12
and Hadoop. (Hive libs align with Hadoop libs, and as a common
practice, they should be consistent to inter-operate.)
for Hive-on-Spark now.
On Mon, Jul 21, 2014 at 6:27 PM, Andrew Lee alee...@hotmail.com wrote:
Hive and Hadoop are using an older version of the Guava libraries (11.0.1), while
Spark's Hive support is using Guava 14.0.1+.
The community isn't willing to downgrade to 11.0.1, which is the current
version
Hi Jianshi,
Could you provide which HBase version you're using?
By the way, a quick sanity check: can the Workers access HBase?
Were you able to manually write one record to HBase with the serialize
function? Hardcode and test it?
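Something along these lines, hardcoded (a sketch against the old HTable client API; the table, column family, and values here are made up):
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

val conf = HBaseConfiguration.create()        // picks up hbase-site.xml from the classpath
val table = new HTable(conf, "sanity_check")  // hypothetical table name
val put = new Put(Bytes.toBytes("row1"))
put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"))
table.put(put)
table.close()
If this hangs or times out from a Worker node, it points at connectivity rather than the serialize function.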
From: jianshi.hu...@gmail.com
Date: Fri, 25 Jul 2014
Hi All,
Not sure if anyone has run into this problem, but this exists in Spark 1.0.0
when you specify the location in conf/spark-defaults.conf for
spark.eventLog.dir hdfs:///user/$USER/spark/logs
to use the $USER env variable.
For example, I'm running the command with user 'test'.
In
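One workaround (a sketch; Spark reads the properties file literally and does not expand environment variables, so resolve $USER yourself before submitting):
sed "s|\$USER|$(whoami)|g" conf/spark-defaults.conf > /tmp/spark-defaults.conf
./bin/spark-submit --properties-file /tmp/spark-defaults.conf ...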
2014-07-28 12:40 GMT-07:00 Andrew Lee alee...@hotmail.com:
Hi All,
Not sure if anyone has run into this problem, but this exists in Spark 1.0.0
when you specify the location in conf/spark-defaults.conf for
spark.eventLog.dir hdfs:///user/$USER/spark/logs
to use the $USER env variable
files explicitly to the --jars option and it worked fine.
The Caused by... messages were found in the YARN logs actually. I think it might
be useful if I could see them from the console which runs spark-submit. Would
that be possible?
Jianshi
On Sat, Jul 26, 2014 at 7:08 AM, Andrew Lee alee
/user/hive/warehouse)
On Thu, Jul 31, 2014 at 8:05 AM, Andrew Lee <alee526@...> wrote:
Hi All,
It has been awhile, but what I did to make it work was to make sure of the
following:
1. Hive is working when you run Hive CLI and JDBC via HiveServer2
2. Make sure you have
(spark.driver.port)
}
}
From: Andrew Lee alee...@hotmail.com
Reply-To: user@spark.apache.org user@spark.apache.org
Date: Monday, July 21, 2014 at 10:27 AM
To: user@spark.apache.org user@spark.apache.org,
u...@spark.incubator.apache.org u...@spark.incubator.apache.org
Subject: RE
Hi All,
I have tried to pass the properties via the SparkContext.setLocalProperty and
HiveContext.setConf; both failed. Based on the results (I haven't had a chance to
look into the code yet), HiveContext will try to initiate the JDBC connection
right away, so I couldn't set other properties
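To illustrate the shape of what I tried (the property name here is just an example, not the exact one):
import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sc)
// too late: the metastore JDBC connection appears to be initiated during construction
hiveContext.setConf("javax.jdo.option.ConnectionURL", "jdbc:mysql://metastore_host/hive")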
A follow up on the hive-site.xml: if you
1. Specify it in spark/conf, then you can NOT also apply it via the
--driver-class-path option; otherwise, you will get the following exceptions
when initializing SparkContext.
org.apache.spark.SparkException: Found both spark.driver.extraClassPath
Sorry folks, it is executing Spark jobs instead of Hive jobs. I misread the
logs since there were other activities going on on the cluster.
From: alee...@hotmail.com
To: ar...@sigmoidanalytics.com; tsind...@gmail.com
CC: user@spark.apache.org
Subject: RE: SparkSQL + Tableau Connector
Date: Wed,
I have ThriftServer2 up and running; however, I notice that it relays the query
to HiveServer2 when I pass hive-site.xml to it.
I'm not sure if this is the expected behavior, but based on what I have up and
running, the ThriftServer2 invokes HiveServer2, which results in MapReduce or Tez
It looks like this is related to the underlying Hadoop configuration.
Try deploying the Hadoop configuration with your job via --files and
--driver-class-path, or to the default /etc/hadoop/conf core-site.xml.
If that is not an option (depending on how your Hadoop cluster is set up), then
hard
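Before resorting to hardcoding, the deploy-with-the-job route looks roughly like this (the class and JAR names are placeholders; client configs assumed under /etc/hadoop/conf):
./bin/spark-submit --master yarn \
  --files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml \
  --driver-class-path /etc/hadoop/conf \
  --class com.example.MyJob my-job.jar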
or insights on what I'm missing here.
Thanks for the assistance.
-Todd
On Wed, Feb 11, 2015 at 3:20 PM, Andrew Lee alee...@hotmail.com wrote:
Sorry folks, it is executing Spark jobs instead of Hive jobs. I misread the
logs since there were other activities going on on the cluster.
From: alee
Hi All,
Just want to give everyone an update on what worked for me. Thanks for Cheng's
comment and other people's help.
So what I misunderstood was --driver-class-path and how it relates to
--files. I put /etc/hive/hive-site.xml in both --files and
--driver-class-path when I
@spark.apache.org
I think you want to take a look at:
https://issues.apache.org/jira/browse/SPARK-6207
On Mon, Apr 20, 2015 at 1:58 PM, Andrew Lee alee...@hotmail.com wrote:
Hi All,
Affected version: spark 1.2.1 / 1.2.2 / 1.3-rc1
Posting this problem to user group first to see if someone
Hi Roberto,
I'm not an EMR person, but it looks like option -h is deploying the necessary
DataNucleus JARs for you. The requirements for HiveContext are hive-site.xml and
the DataNucleus JARs. As long as these 2 are there, and Spark is compiled with
-Phive, it should work.
spark-shell runs in
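For reference, a Hive-enabled build is along these lines (profile names per the Spark 1.x build docs; adjust to your cluster):
mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package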
To: alee...@hotmail.com
CC: zjf...@gmail.com; rp...@njit.edu; user@spark.apache.org
Hi all,
Did you forget to restart the node managers after editing yarn-site.xml by any
chance?
-Andrew
2015-07-17 8:32 GMT-07:00 Andrew Lee alee...@hotmail.com:
I have encountered the same problem after following
Hi Andrew,
Thanks for the advice. I didn't see the log in the NodeManager, so apparently
something was wrong with the yarn-site.xml configuration.
After digging in more, I realized it was a user error. I'm sharing this with
other people so others may know what mistake I made.
When I review
I have encountered the same problem after following the document.
Here's my spark-defaults.conf:
spark.shuffle.service.enabled true
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.executorIdleTimeout 60
spark.dynamicAllocation.cachedExecutorIdleTimeout 120
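For the record, the yarn-site.xml side of this (per the external shuffle service docs) is what requires the NodeManager restart, and spark-<version>-yarn-shuffle.jar must be on the NodeManager classpath:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>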
From branch-2.0, Spark 2.0.0 preview,
I found it interesting that no matter what you do when configuring
spark.sql.warehouse.dir,
it will always pull up the default path, which is /user/hive/warehouse.
In the code, I notice that at LOC45
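For context, the setting being attempted looks like this in the 2.0 preview API (the warehouse path is an example):
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
  .config("spark.sql.warehouse.dir", "hdfs:///user/myuser/spark-warehouse")
  .enableHiveSupport()
  .getOrCreate()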
In fact, it does require the ojdbc driver from Oracle, which also requires a
username and password. This was added as part of the testing scope for Oracle's
Docker-based tests.
I noticed this PR and commit in branch-2.0 according to
https://issues.apache.org/jira/browse/SPARK-12941.
In the comment, I'm not sure what