Problem with GraphX and number of partitions

2016-08-31 Thread alvarobrandon
Hello everyone:

I have a problem when setting the number of partitions inside GraphX with the
ConnectedComponents function. When I launch the application with the default
number of partitions everything runs smoothly. However, when I increase the
number of partitions to 150, for example (it happens with bigger values as
well), it gets stuck on the last task of stage 5, with the following error:


[Stage 5:=>    (190 + 10) / 200]241.445: [GC [PSYoungGen: 118560K->800K(233472K)] 418401K->301406K(932864K), 0.0029430 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
[Stage 5:=>    (199 + 1) / 200]16/08/31 11:09:23 ERROR spark.ContextCleaner: Error cleaning broadcast 4
org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
    at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
    at scala.util.Try$.apply(Try.scala:161)
    at scala.util.Failure.recover(Try.scala:185)
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at org.spark-project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
    at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
    at scala.concurrent.Promise$class.complete(Promise.scala:55)
    at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153)
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.processBatch$1(Future.scala:643)
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply$mcV$sp(Future.scala:658)
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635)
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
    at scala.concurrent.Future$InternalCallbackExecutor$Batch.run(Future.scala:634)
    at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:685)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
    at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
    at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
    at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:241)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds
    at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:242)
    ... 7 more

I set the number of partitions when reading the graph:

val graph = GraphLoader.edgeListFile(sc, input, true, minEdge,
StorageLevel.MEMORY_AND_DISK, StorageLevel.MEMORY_AND_DISK)
val res = graph.connectedComponents().vertices
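
Since the error above says the timeout is controlled by spark.rpc.askTimeout, one
workaround I'm considering while I debug the partitioning is simply raising that
timeout. A minimal sketch of what I mean (the input path, app name and the 600s
value are placeholders of mine; I haven't verified that this fixes the hang):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.storage.StorageLevel

// Placeholders: an example input path and the partition count that triggers the hang.
val input = "hdfs:///data/edge_list.txt"
val minEdge = 150

// Raise the RPC ask timeout named in the error; 600s is only an illustrative value.
val conf = new SparkConf()
  .setAppName("ConnectedComponentsTest")
  .set("spark.rpc.askTimeout", "600s")
val sc = new SparkContext(conf)

val graph = GraphLoader.edgeListFile(sc, input,
  canonicalOrientation = true,
  numEdgePartitions = minEdge,
  edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
  vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)
val res = graph.connectedComponents().vertices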

The version of Spark I'm using is 

Scheduler Delay Time

2016-06-03 Thread alvarobrandon
Hello:

I'm doing some instrumentation in Spark and I've realised that some of my tasks
take a really long time to complete because of the Scheduler Delay Time. I submit
the apps through spark-submit on a YARN cluster. I was wondering whether this
delay also takes into account the period between the app being accepted by the
YARN client and the first job of the app being submitted; in other words, the
period between the moment the app is accepted and the moment it is actually
running.
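
For what it's worth, this is roughly what I can measure from the driver side with
a listener: the gap between the application start event and the submission of the
first job. It is only a sketch (the listener class name is mine), and it does not
capture the YARN accepted-to-running interval, which is exactly the part I'm
unsure about:

import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationStart, SparkListenerJobStart}

// Sketch: measure the gap between application start and the first job submission.
class FirstJobDelayListener extends SparkListener {
  @volatile private var appStartTime: Long = -1L
  @volatile private var firstJobSeen = false

  override def onApplicationStart(appStart: SparkListenerApplicationStart): Unit = {
    appStartTime = appStart.time
  }

  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    if (!firstJobSeen && appStartTime > 0) {
      firstJobSeen = true
      println(s"First job submitted ${jobStart.time - appStartTime} ms after application start")
    }
  }
}

// Registered with: sc.addSparkListener(new FirstJobDelayListener)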


 

 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Scheduler-Delay-Time-tp27085.html



DAG of Spark Sort application spanning two jobs

2016-05-30 Thread alvarobrandon
I've written a very simple Sort Scala program with Spark.

import org.apache.spark.{SparkConf, SparkContext}

object Sort {

  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage: Sort <input_file> <output_file> [<num_partitions>]")
      System.exit(1)
    }

    val conf = new SparkConf().setAppName("BigDataBench Sort")
    val spark = new SparkContext(conf)
    // JobPropertiesLogger is a custom helper class of mine (not part of Spark).
    val logger = new JobPropertiesLogger(spark, "/home/abrandon/log.csv")
    val filename = args(0)
    val save_file = args(1)
    var splits = spark.defaultMinPartitions
    if (args.length > 2) {
      splits = args(2).toInt
    }
    val lines = spark.textFile(filename, splits)
    logger.start_timer()
    val data_map = lines.map(line => (line, 1))

    val result = data_map.sortByKey().map(line => line._1)
    logger.stop_timer()
    logger.write_log("Sort By Key: Sort App")
    result.saveAsTextFile(save_file)

    println("Result has been saved to: " + save_file)
  }

}


Now, I was thinking that since there is only one wide transformation
(sortByKey), two stages would be spawned. However, I see two jobs, with one
stage in Job 0 and two stages in Job 1. Am I missing something? What I don't
get is the first stage of the second job: it seems to do the same work as the
stage of Job 0.
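
For reference, this is how I've been inspecting the lineage while trying to
understand the extra stage (just a debugging sketch using the RDDs from the
program above):

// Sketch: print the lineage and partitioning of the RDDs to see where
// shuffle (stage) boundaries fall.
println(data_map.toDebugString)
println(result.toDebugString)
println(s"result has ${result.partitions.length} partitions, partitioner = ${result.partitioner}")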

 
 
 





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/DAG-of-Spark-Sort-application-spanning-two-jobs-tp27047.html



Reading Shuffle Data from highly loaded nodes

2016-05-09 Thread alvarobrandon
Hello everyone:

I'm running an experiment in a Spark cluster where some of the machines are
highly loaded with CPU-, memory- and network-consuming processes (let's call
them straggler machines).

Obviously, the tasks on these machines take longer to execute than on other
nodes of the cluster. However, I've noticed that tasks which fetch shuffle data
from these "straggler machines" are also delayed, with long shuffle read phases.

Is there any way of knowing from which machines a task is reading its shuffle
data? Something like: node1 is reading its shuffle data from [node2, node3 and
node4].
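
The closest I've got so far is per-task shuffle read metrics from a listener,
which separate remote from local bytes but do not name the hosts the blocks come
from. A sketch against the 1.x listener API (where these metrics are Options);
the class name is mine:

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Sketch: log remote vs. local shuffle bytes read per task. This does NOT
// say which hosts the remote blocks were fetched from.
class ShuffleReadListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val info = taskEnd.taskInfo
    for (metrics <- Option(taskEnd.taskMetrics); sr <- metrics.shuffleReadMetrics) {
      println(s"task ${info.taskId} on ${info.host}: " +
        s"remote=${sr.remoteBytesRead} B, local=${sr.localBytesRead} B, " +
        s"remoteBlocks=${sr.remoteBlocksFetched}")
    }
  }
}

// Registered with: sc.addSparkListener(new ShuffleReadListener)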

Thanks in advance



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Reading-Shuffle-Data-from-highly-loaded-nodes-tp26901.html



Problem with History Server

2016-04-13 Thread alvarobrandon
Hello:

I'm using the history server to keep track of the applications I run in my
cluster. I'm using Spark with YARN. When I run an application it finishes
correctly, and even YARN says that it finished. This is the result of the YARN
Resource Manager API:

{u'app': [{u'runningContainers': -1, u'allocatedVCores': -1, u'clusterId':
1460540049690, u'amContainerLogs':
u'http://parapide-2.rennes.grid5000.fr:8042/node/containerlogs/container_1460540049690_0001_01_01/abrandon',
u'id': u'*application_1460540049690_0001*', u'preemptedResourceMB': 0,
u'finishedTime': 1460550170085, u'numAMContainerPreempted': 0, u'user':
u'abrandon', u'preemptedResourceVCores': 0, u'startedTime': 1460548211207,
u'elapsedTime': 1958878, u'state': u'FINISHED',
u'numNonAMContainerPreempted': 0, u'progress': 100.0, u'trackingUI':
u'History', u'trackingUrl':
u'http://paranoia-1.rennes.grid5000.fr:8088/proxy/application_1460540049690_0001/A',
u'allocatedMB': -1, u'amHostHttpAddress':
u'parapide-2.rennes.grid5000.fr:8042', u'memorySeconds': 37936274,
u'applicationTags': u'', u'name': u'KMeans', u'queue': u'default',
u'vcoreSeconds': 13651, u'applicationType': u'SPARK', u'diagnostics': u'',
u'finalStatus': u'*SUCCEEDED*'}
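
For reference, that JSON comes from a call along these lines against the RM's
ws/v1 REST endpoint (just a sketch of how I query it; the host and application
id are taken from the output above):

import scala.io.Source

// Sketch: query the YARN ResourceManager REST API for one application.
val rmUrl = "http://paranoia-1.rennes.grid5000.fr:8088/ws/v1/cluster/apps/application_1460540049690_0001"
val src = Source.fromURL(rmUrl)
try println(src.mkString) finally src.close()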

However, when I query the Spark UI, I see that for Job ID 2 no tasks have run,
and I can't get any information about them. Is this some kind of bug?

Thanks for your help as always



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Problem-with-History-Server-tp26777.html



Dynamic allocation Spark

2016-02-26 Thread alvarobrandon
Hello everyone:

I'm trying out dynamic allocation in Spark with YARN. I have followed these
configuration steps:
1. Copied spark-*-yarn-shuffle.jar to the NodeManager classpath: "cp
/opt/spark/lib/spark-*-yarn-shuffle.jar /opt/hadoop/share/hadoop/yarn"
2. Added Spark's shuffle service in yarn-site.xml:
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
    <description>shuffle implementation</description>
  </property>
3. Enabled the class for the shuffle service in yarn-site.xml:
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    <description>enable the class for dynamic allocation</description>
  </property>
4. Activated dynamic allocation in spark-defaults.conf:
  spark.dynamicAllocation.enabled true
  spark.shuffle.service.enabled   true
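
For reference, the same switches can also be set from the application side; a
minimal sketch (the executor bounds are illustrative values of mine, not part of
the steps above):

import org.apache.spark.SparkConf

// Sketch: dynamic allocation settings passed from the application instead of
// spark-defaults.conf. The min/max executor bounds are illustrative only.
val conf = new SparkConf()
  .setAppName("DynamicAllocationTest")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "10")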

When I launch my application it just stays in the ACCEPTED state in the queue
and never actually runs:
16/02/26 11:11:46 INFO yarn.Client: Application report for
application_1456482268159_0001 (state: ACCEPTED)

Am I missing something?

Thanks in advance as always 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Dynamic-allocation-Spark-tp26344.html



Re: No event log in /tmp/spark-events

2016-02-26 Thread alvarobrandon
Just write /tmp/sparkserverlog without the file part. 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/No-event-log-in-tmp-spark-events-tp26318p26343.html



SPARK REST API on YARN

2016-02-18 Thread alvarobrandon
Hello:

I wanted to access the REST API
(http://spark.apache.org/docs/latest/monitoring.html#rest-api) of Spark to
monitor my jobs. However, I'm running my Spark apps on YARN. When I try to make
a request to http://localhost:4040/api/v1 as the documentation says, I don't get
any response. My question is: is it possible to access this REST API when you
are not running Spark in standalone mode?
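
For context, this is the kind of call I'm attempting (a sketch; the application
id is a placeholder, and I'm not sure whether on YARN the request should instead
go through the ResourceManager proxy or the history server):

import scala.io.Source

// Sketch of the request against the driver UI port. The application id is a
// placeholder; on YARN the UI may only be reachable through the RM proxy.
val appId = "application_1455721308385_0005"
val url = s"http://localhost:4040/api/v1/applications/$appId/jobs"
val src = Source.fromURL(url)
try println(src.mkString) finally src.close()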

Thanks in advance



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SPARK-REST-API-on-YARN-tp26267.html



Re: Error when executing Spark application on YARN

2016-02-18 Thread alvarobrandon
Found the solution. I was pointing to the wrong Hadoop conf directory. I feel
so stupid :P



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Error-when-executing-Spark-application-on-YARN-tp26248p26266.html



Re: Error when executing Spark application on YARN

2016-02-18 Thread alvarobrandon
1. It happens to all the classes inside the jar package.
2. I didn't make any changes:
   - I have three nodes: one master and two slaves in the conf/slaves file
   - In spark-env.sh I just set the HADOOP_CONF_DIR parameter
   - In spark-defaults.conf I didn't change anything
3. The container doesn't even start.

It seems like there is some problem when sending the jar files. I have just
realised that I get the following message:
Diagnostics: java.io.IOException: Resource
file:/opt/spark/BenchMark-1.0-SNAPSHOT.jar changed on src filesystem
(expected 1455792343000, was 145579310





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Error-when-executing-Spark-application-on-YARN-tp26248p26264.html



Error when executing Spark application on YARN

2016-02-17 Thread alvarobrandon
Hello:

I'm trying to launch an application on a YARN cluster with the following
command:

/opt/spark/bin/spark-submit --class com.abrandon.upm.GenerateKMeansData --master yarn --deploy-mode client /opt/spark/BenchMark-1.0-SNAPSHOT.jar kMeans 5 4 5 0.9 8

The last bit after the jar file is just the parameters of the GenerateKMeansData
application. I get the following error:

16/02/17 15:31:01 INFO Client: Application report for application_1455721308385_0005 (state: ACCEPTED)
16/02/17 15:31:02 INFO Client: Application report for application_1455721308385_0005 (state: FAILED)
16/02/17 15:31:02 INFO Client:
     client token: N/A
     diagnostics: Application application_1455721308385_0005 failed 2 times due to AM Container for appattempt_1455721308385_0005_02 exited with exitCode: -1000
For more detailed output, check application tracking page:http://stremi-17.reims.grid5000.fr:8088/proxy/application_1455721308385_0005/Then, click on links to logs of each attempt.
Diagnostics: File file:/tmp/spark-5a98e9d4-6f90-446d-9bec-f0d30bffae32/__spark_conf__2242504518276040137.zip does not exist
java.io.FileNotFoundException: File file:/tmp/spark-5a98e9d4-6f90-446d-9bec-f0d30bffae32/__spark_conf__2242504518276040137.zip does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1455723059732
     final status: FAILED
     tracking URL: http://stremi-17.reims.grid5000.fr:8088/cluster/app/application_1455721308385_0005
     user: abrandon
16/02/17 15:31:02 ERROR SparkContext: Error initializing SparkContext.

I think the important part is "Diagnostics: File
file:/tmp/spark-5a98e9d4-6f90-446d-9bec-f0d30bffae32/__spark_conf__2242504518276040137.zip
does not exist". Does anybody know what that means?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Error-when-executing-Spark-application-on-YARN-tp26248.html



Monitoring Spark HDFS Reads and Writes

2015-12-30 Thread alvarobrandon
Hello:

Is there any way of monitoring the number of bytes or blocks read and written
by a Spark application? I'm running Spark with YARN and I want to measure how
I/O-intensive a set of applications is. The closest thing I have seen is the
HDFS DataNode logs in YARN, but they don't seem to break reads and writes down
per Spark application.

2015-12-21 18:29:15,347 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/127.0.0.1:53805, dest: /127.0.0.1:50010, bytes: 72159, op: HDFS_WRITE,
cliID: DFSClient_NONMAPREDUCE_-1850086307_1, offset: 0, srvID:
a9edc8ad-fb09-4621-b469-76de587560c0, blockid:
BP-189543387-138.100.13.81-1450715936956:blk_1073741837_1013, duration:
2619119
hadoop-alvarobrandon-datanode-usuariop81.fi.upm.es.log:2015-12-21
18:29:15,429 INFO org.apache.hadoop.hdfs.server.d

Is there any trace of these operations to be found in any log?
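
The closest programmatic thing I've found so far is the per-task input/output
metrics from a listener, although they only cover what Spark itself reads and
writes, not all HDFS block traffic. A sketch against the 1.x listener API (where
these metrics are Options); the class name is mine:

import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Sketch: accumulate bytes read/written by the tasks of this application.
class IoBytesListener extends SparkListener {
  val bytesRead = new AtomicLong(0L)
  val bytesWritten = new AtomicLong(0L)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    Option(taskEnd.taskMetrics).foreach { m =>
      m.inputMetrics.foreach(im => bytesRead.addAndGet(im.bytesRead))
      m.outputMetrics.foreach(om => bytesWritten.addAndGet(om.bytesWritten))
    }
  }
}

// Registered with: sc.addSparkListener(new IoBytesListener)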

Thanks in advance
Best Regards.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Monitoring-Spark-HDFS-Reads-and-Writes-tp25838.html



Is there any way to log properties from a Spark application

2015-12-28 Thread alvarobrandon
Hello:

I was wondering if it's possible to log properties of Spark applications, such
as spark.yarn.am.memory, spark.driver.cores or spark.reducer.maxSizeInFlight,
without having to access the SparkConf object programmatically. I'm trying to
find some kind of log file that has traces of the execution of Spark apps and
their parameters.
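
The closest I've got so far is the event log: with spark.eventLog.enabled=true
each application writes a JSON event log (by default under /tmp/spark-events),
and its SparkListenerEnvironmentUpdate event carries the resolved Spark
properties. A rough sketch of pulling that line out (the file name is a
placeholder):

import scala.io.Source

// Sketch: grep the application's event log for the environment-update event,
// which contains the "Spark Properties" section.
val eventLog = "/tmp/spark-events/application_1455721308385_0005"  // placeholder
val src = Source.fromFile(eventLog)
try {
  src.getLines().filter(_.contains("SparkListenerEnvironmentUpdate")).foreach(println)
} finally src.close()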



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Is-there-anyway-to-log-properties-from-a-Spark-application-tp25820.html