I have some suggestions you may try:
1) Persist the input RDD with the persist method; this may save a lot of running time.
2) From the UI you can see the cluster spends much time in the shuffle stage; this can be
tuned through some conf parameters, such as
"spark.shuffle.memoryFraction" and "spark.memory.fraction".
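A minimal sketch of the two suggestions above (the app name, input path, and parameter values are illustrative, not tuned recommendations):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Suggestion 2: tune the shuffle memory settings before creating the context.
// spark.shuffle.memoryFraction is the legacy (pre-1.6) knob;
// spark.memory.fraction applies to the unified memory manager (Spark 1.6+).
val conf = new SparkConf()
  .setAppName("tuning-sketch")
  .set("spark.shuffle.memoryFraction", "0.4")
  .set("spark.memory.fraction", "0.6")
val sc = new SparkContext(conf)

// Suggestion 1: persist the input RDD so it is not recomputed on each action.
val input = sc.textFile("hdfs:///path/to/input").persist(StorageLevel.MEMORY_AND_DISK)
```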
good luck
Hi:
    There is a small error in the source code of LDA.scala at line 180, as
follows:
def setBeta(beta: Double): this.type = setBeta(beta)
    The setter calls itself, which causes a java.lang.StackOverflowError. It's
easy to see the error.
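A minimal pure-Scala illustration of the bug (LdaLike is a hypothetical stand-in for the real class): the buggy setter's body calls the setter itself instead of storing the value, so every call recurses until the stack overflows. The fix is to assign to the underlying beta parameter rather than recurse.

```scala
class LdaLike {
  private var beta: Double = 1.0

  // Buggy version, as reported at line 180: the body calls the setter itself,
  // so the method recurses forever and throws java.lang.StackOverflowError.
  def setBetaBuggy(b: Double): this.type = setBetaBuggy(b)

  // Fixed version: store the value and return this for chaining.
  def setBeta(b: Double): this.type = { beta = b; this }

  def getBeta: Double = beta
}
```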
--
View this message in context:
I am not sure this can help you. I have 57 million ratings, about 4 million users
and 4k items. I used 7-14 total executor cores and 13g executor memory; the cluster
has 4 nodes, each with 4 cores and 16g max memory.
I found that setting the following may help avoid this problem:
Hi
    Recently I have had some problems with RDD behavior, concerning the
RDD.first and RDD.toArray methods when the RDD has only one element.
    I get different results from different methods on a one-element RDD,
where I should get the same result. I will give more detail after the code.
My
I got the key point. The problem is in sc.sequenceFile: from the API description,
the RDD will create many references to the same object, so I revised the code from
sessions.getBytes to sessions.getBytes.clone,
and it seems to work.
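The underlying cause can be shown without Spark: Hadoop's record readers reuse one Writable buffer for every record, so keeping the array returned by getBytes keeps a reference to the shared buffer. ReusableRecord below is a hypothetical stand-in for that behavior:

```scala
// Hypothetical stand-in for a Hadoop record reader's reused Writable buffer.
class ReusableRecord {
  private val buf = new Array[Byte](1)
  def load(b: Byte): ReusableRecord = { buf(0) = b; this }
  def getBytes: Array[Byte] = buf // returns the shared backing array
}

val rec = new ReusableRecord
// Without clone: every element aliases the same buffer, so after the last
// record is loaded, all elements appear equal to the last value.
val aliased = Seq[Byte](1, 2, 3).map(b => rec.load(b).getBytes)
// With clone (the sessions.getBytes.clone fix): each element is an independent copy.
val copied = Seq[Byte](1, 2, 3).map(b => rec.load(b).getBytes.clone)
```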
Thanks.
I think you can try to set a lower spark.storage.memoryFraction, for example 0.4:
conf.set("spark.storage.memoryFraction", "0.4") // default 0.6
Hi
    Recently I wanted to save a big RDD[(k,v)] in the form of index and data, so I
decided to use a Hadoop MapFile. I tried some examples like this:
https://gist.github.com/airawat/6538748
    The code runs well and generates an index and a data file, which I can read
with the command
hadoop fs -text
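A sketch of how the saving itself might look from Spark, assuming Int keys and String values (saveAsMapFile is a hypothetical helper, not a Spark API; untested against a cluster):

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapred.MapFileOutputFormat
import org.apache.spark.rdd.RDD

// Hypothetical helper: write an RDD[(Int, String)] as a Hadoop MapFile
// (an index file plus a data file per partition).
def saveAsMapFile(rdd: RDD[(Int, String)], path: String): Unit = {
  rdd.sortByKey() // MapFile requires keys in sorted order, so sort before writing
     .map { case (k, v) => (new IntWritable(k), new Text(v)) }
     .saveAsHadoopFile(path, classOf[IntWritable], classOf[Text], classOf[MapFileOutputFormat])
}
```

The resulting index and data files can then be inspected with hadoop fs -text as above.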
You should supply more information about your input data.
For example, I generate an IndexedRowMatrix from ALS-format input data;
my code looks like this:
val inputData = sc.textFile(fname).map { line =>
  val parts = line.trim.split(' ')
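A hedged completion of the fragment above, assuming each line is "rowIndex colIndex value" separated by spaces: build a CoordinateMatrix from the entries and convert it.

```scala
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// Assumes each line looks like "rowIndex colIndex value".
val entries = sc.textFile(fname).map { line =>
  val parts = line.trim.split(' ')
  MatrixEntry(parts(0).toLong, parts(1).toLong, parts(2).toDouble)
}
val mat = new CoordinateMatrix(entries).toIndexedRowMatrix()
```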
Hi:
    After updating Spark to version 1.1.0, I experienced a snappy error
which was
posted here:
http://apache-spark-user-list.1001560.n3.nabble.com/Update-gcc-version-Still-snappy-error-tt15137.html
. I avoided this problem with
Here is the error log, which I abstract as follows:
INFO [binaryTest---main]: before first
WARN [org.apache.spark.scheduler.TaskSetManager---Result resolver
thread-0]: Lost task 0.0 in stage 0.0 (TID 0, spark-dev136):
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
Hi:
    I want to use SVD in my work. I tried some examples and have some
confusion. The input is the 4*3 matrix as follows:
2 0 0
0 3 2
0 3 1
2 0 3
My input text file is as follows, corresponding to the matrix:
0 0 2
1 1 3
1 2
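With that coordinate format, one way to run the SVD in MLlib is to read the entries into a CoordinateMatrix, convert it to a RowMatrix, and call computeSVD (a sketch, not a tested program; the file name is a placeholder):

```scala
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// Each input line is "row col value"; rows and cols are 0-based, zeros omitted.
val entries = sc.textFile("svd_input.txt").map { line =>
  val p = line.trim.split(' ')
  MatrixEntry(p(0).toLong, p(1).toLong, p(2).toDouble)
}
// Declare the full 4x3 size so trailing empty rows/columns are kept.
val rowMat = new CoordinateMatrix(entries, 4, 3).toRowMatrix()
val svd = rowMat.computeSVD(3, computeU = true)
// svd.s holds the singular values; svd.U and svd.V the singular vectors.
```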
I updated the Spark version from 1.0.2 to 1.1.0 and experienced a snappy version
issue with the new Spark-1.1.0. After updating the glibc version,
another issue occurred. I abstract the log as follows:
14/09/25 11:29:18 WARN [org.apache.hadoop.util.NativeCodeLoader---main]:
Unable to load