A little more progress...
I added a few environment variables; now I get the following error message:
InvocationTargetException: Can't get Master Kerberos principal for use as
renewer -> [Help 1]
--
Rest of the stacktrace:
[WARNING]
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at
I am trying to set up my IDE for a Scala Spark application. I want to access
HDFS files on a remote Hadoop server that has Kerberos enabled. My
understanding is that I should be able to do that from Spark. Here is my code so
far:
val sparkConf = new SparkConf().setAppName(appName).setMaster(master);
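Beyond that line, here is a minimal sketch of what I am attempting (the principal, keytab path, and HDFS URL are placeholders, and the UserGroupInformation login is my assumption of what is needed, based on the Hadoop security API):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.SparkContext

val sc = new SparkContext(sparkConf)

// Assumption: log in from a keytab before touching HDFS.
val hadoopConf = new Configuration()
hadoopConf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(hadoopConf)
UserGroupInformation.loginUserFromKeytab("me@EXAMPLE.COM", "/path/to/me.keytab")

// Placeholder namenode host/port and path.
val lines = sc.textFile("hdfs://remote-namenode:8020/some/path")
println(lines.count())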
I seem to see this "This post has NOT been accepted by the mailing list yet" status for many of my posts... does anyone have a solution?
--
I couldn't get this working...
I have JAVA_HOME set.
I have defined SPARK_HOME:
Sys.setenv(SPARK_HOME="c:\\DevTools\\spark-1.5.1")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local")  # this is the call that fails for me
It seems to be failing at
path <- tempfile(pattern = "backend_port")
and I do not see the backend_port file being created...
--
I am not sure what the LOADING state means, followed by RUNNING. In the
application UI, I see:
Executor Summary
ExecutorID  Worker                                                         Cores  Memory  State    Logs
1           worker-20150202144112-hadoop-w-1.c.fi-mdd-poc.internal-38874   16     83971   LOADING  stdout stderr
0           ...
I am using Spark 1.2, and I see a lot of messages like:
ExternalSorter: Thread 66 spilling in-memory map of 5.0 MB to disk (13160 times so far)
I seem to have a lot of memory:
URL: spark://hadoop-m:7077
Workers: 4
Cores: 64 Total, 64 Used
Memory: 328.0 GB Total, 327.0 GB Used
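My reading of the Spark 1.2 docs is that spilling is bounded by each executor's shuffle memory fraction, not by total cluster memory, so this is what I am planning to try next (the values are guesses on my part):

val sparkConf = new SparkConf()
  .set("spark.shuffle.memoryFraction", "0.4") // default is 0.2 in Spark 1.2
  .set("spark.storage.memoryFraction", "0.5") // default 0.6; trades cache space for shuffle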
This did not work for me; that is, rdd.coalesce(200, forceShuffle). Does
anyone have ideas on how to distribute your data evenly and co-locate
partitions of interest?
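In case someone wants to reproduce what I am seeing, this is the snippet I use to check whether the partitions come out evenly sized (rdd stands for whichever RDD was just coalesced):

// Count records per partition; very uneven counts mean the
// coalesce did not balance the data.
val sizes = rdd.mapPartitionsWithIndex((i, it) => Iterator((i, it.size))).collect()
sizes.sortBy(_._2).foreach { case (i, n) => println(s"partition $i: $n records") }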
--
hmm..
33.6 GB is the sum of the memory used by the two RDDs that are cached. You're
right: when I put serialized RDDs in the cache, the memory footprint for
these RDDs becomes a lot smaller.
Serialized memory footprint shown below:
RDD Name    Storage Level    Cached Partitions    Fraction Cached
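For anyone following the thread, this is roughly how the serialized caching is set up on my side (switching to Kryo is an extra step I am experimenting with, not something I have confirmed helps here):

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel

val sparkConf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
// build the SparkContext from this conf as usual, then:
rdd.persist(StorageLevel.MEMORY_ONLY_SER)
rdd.count() // the first action materializes the serialized cache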
I think the memory calculation is correct; what I didn't account for is the
memory actually used. I am still puzzled as to how I can successfully process the RDD
in Spark.
--
I am running in local mode, on a Google n1-highmem-16 (16 vCPUs, 104 GB
memory) machine.
I have allocated SPARK_DRIVER_MEMORY=95g.
I see "Memory: 33.6 GB Used (73.7 GB Total)" for the executor.
In the log output below, I see that 33.6 GB of blocks are used by the 2 RDDs that I
have cached.
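A quick sanity check I use to confirm the 95g actually reached the JVM (in local mode the executor shares the driver JVM, so this is the heap that matters):

// Prints the max heap the driver JVM was started with.
println(s"max heap = ${Runtime.getRuntime.maxMemory / (1024L * 1024 * 1024)} GB")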
scala> textFile.count()
java.lang.VerifyError: class
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$CompleteRequestProto
overrides final method
getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
I tried ./make-distribution.sh -Dhadoop.version=2.5.0 and
"only option is to split your problem further by increasing parallelism". My
understanding is that this means increasing the number of partitions, is that right?
That didn't seem to help, because it seems the partitions are not uniformly
sized. My observation is that when I increase the number of partitions, it
/usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java org.apache.spark.deploy.SparkSubmitDriverBootstrapper
When I execute /usr/local/spark-1.1.0/bin/spark-submit local[32] for my
app, I see two processes get spun off. One is the
org.apache.spark.deploy.SparkSubmitDriverBootstrapper and
No luck :(! Still observing the same behavior!
--
I am running in local (client) mode. My VM has 16 CPUs and 108 GB RAM. My configuration is
as follows:
spark.executor.extraJavaOptions -XX:+PrintGCDetails -XX:+UseCompressedOops
-XX:+UseParallelGC -XX:+UseParallelOldGC -XX:+DisableExplicitGC
-XX:MaxPermSize=1024m
spark.daemon.memory=20g
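One caveat I am unsure about (my assumption, not confirmed): in local mode there is no separate executor JVM, so the spark.executor.extraJavaOptions above may never take effect, and the driver-side equivalent would be needed instead:

spark.driver.extraJavaOptions -XX:+PrintGCDetails -XX:+UseCompressedOops -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:+DisableExplicitGC -XX:MaxPermSize=1024m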
Thanks for the pointers! I did try them, but they didn't seem to help...
In my latest try, I am doing spark-submit local,
but I see the same message in the Spark app UI (port 4040):
localhost CANNOT FIND ADDRESS
In the logs, I see a lot of spilling of the in-memory map to disk. I don't understand why
that is the case.
The Spark application UI shows that one of the executors CANNOT FIND ADDRESS:
Aggregated Metrics by Executor
Executor ID   Address   Task Time   Total Tasks   Failed Tasks   Succeeded Tasks   Input   Shuffle Read   Shuffle Write
I am relatively new to Spark. I am using the Spark Java API to process
data. I am having trouble processing a data set that I don't think is
significantly large: it is a join of datasets that are around 3-4 GB each
(around 12 GB of data in total).
The workflow is:
x=RDD1.KeyBy(x).partitionBy(new
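Written out in Scala for brevity, the shape of what I am attempting is roughly this (keyOf, rdd1, rdd2, and the partition count of 200 are placeholders):

import org.apache.spark.SparkContext._ // pair-RDD implicits (needed before Spark 1.3)
import org.apache.spark.HashPartitioner

// One shared partitioner so both sides hash matching keys to the
// same partition; the subsequent join then avoids a full shuffle.
val partitioner = new HashPartitioner(200)
val left  = rdd1.keyBy(r => keyOf(r)).partitionBy(partitioner)
val right = rdd2.keyBy(r => keyOf(r)).partitionBy(partitioner)
val joined = left.join(right)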
Thanks... hmm. It seems to be a timeout issue, perhaps? Not sure what
is causing it, or how to debug it.
I see the following error message:
14/10/29 13:26:04 ERROR ContextCleaner: Error cleaning broadcast 9
akka.pattern.AskTimeoutException: Timed out
    at
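What I am planning to try next, based on my reading of the configuration docs for this Spark version (the timeout values are guesses):

val sparkConf = new SparkConf()
  .set("spark.akka.askTimeout", "120") // seconds; I believe the default is 30
  .set("spark.akka.timeout", "300")    // seconds; I believe the default is 100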
I recently started seeing this new problem where spark-submit is terminated
with a "Killed" message but no error message indicating what happened. I have
enabled logging in the Spark configuration. Has anyone seen this or know how
to troubleshoot it?
--
I did have it as rdd.saveAsTextFile(path);
and now I have it as:
Log.info("RDD Counts: " + rdd.persist(StorageLevel.MEMORY_AND_DISK_SER()).count());