Re: Execution error during ALS execution in Spark

2016-03-31 Thread buring
I have some suggestions you may try: 1) persist the input RDD; this can save a lot of running time. 2) From the UI you can see the cluster spends much of its time in the shuffle stage; this can be adjusted through conf parameters such as "spark.shuffle.memoryFraction" and "spark.memory.fraction". Good luck.
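
A minimal Scala sketch of both suggestions, assuming rawRatings is the ALS input RDD (the variable name, storage level, and the 0.4 value are illustrative, not from the original message):

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel

    // Shuffle memory fractions are set on the conf before the context starts;
    // 0.4 is illustrative, not a tuned recommendation.
    val conf = new SparkConf().set("spark.shuffle.memoryFraction", "0.4")

    // Persist the input so each ALS iteration does not recompute it.
    val ratings = rawRatings.persist(StorageLevel.MEMORY_AND_DISK)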

Small error in LDA code @Xiangrui Meng

2015-04-22 Thread buring
Hi: There is a small error in the source code, in LDA.scala at line 180, as follows: def setBeta(beta: Double): this.type = setBeta(beta), which causes a java.lang.StackOverflowError. It's easy to see the error: the method calls itself.
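
For reference, a sketch of the presumably intended line. setBeta is documented as an alias for the topic concentration parameter, so the delegation below is an assumption about the intended fix, not a quote of the committed patch:

    // Line 180 as reported (the method recurses into itself until the stack overflows):
    //   def setBeta(beta: Double): this.type = setBeta(beta)
    // Presumed intent: delegate to the parameter that setBeta aliases.
    def setBeta(beta: Double): this.type = setTopicConcentration(beta)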

Re: MLlib/ALS: java.lang.OutOfMemoryError: Java heap space

2014-12-17 Thread buring
I am not sure this can help you. I have 57 million ratings, about 4 million users, and 4k items. I used 7-14 total executor cores and 13g of executor memory; the cluster has 4 nodes, each with 4 cores and 16g of max memory. I found that the following settings may help avoid this problem:
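
The concrete settings are cut off above. As a sketch of one adjustment known to help ALS memory pressure, increasing the block count spreads the factor matrices across more, smaller tasks (rank, iterations, lambda, and the block count below are illustrative, not the poster's values):

    import org.apache.spark.mllib.recommendation.{ALS, Rating}
    import org.apache.spark.rdd.RDD

    // ratings: RDD[Rating] with ~57 million entries, as in the report above.
    def trainWithMoreBlocks(ratings: RDD[Rating]) =
      ALS.train(ratings, rank = 10, iterations = 10, lambda = 0.01,
        blocks = 64) // more blocks => smaller per-task memory footprint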

toArray and first give different results from a one-element RDD

2014-12-16 Thread buring
Hi, Recently I have had some problems with RDD behavior, concerning the RDD.first and RDD.toArray methods when the RDD has only one element. I get different results from the two methods on a one-element RDD, where I should get the same result. I will give more detail after the code. My

Re: toArray and first give different results from a one-element RDD

2014-12-16 Thread buring
I got the key point. The problem is in sc.sequenceFile: per the API description, the RDD will create many references to the same object. So I revised the code from sessions.getBytes to sessions.getBytes.clone, and it seems to work. Thanks.
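
A minimal sketch of the fix as described, assuming Text keys and BytesWritable values (the path variable and the exact types are assumptions; the full original code is not shown):

    import org.apache.hadoop.io.{BytesWritable, Text}

    // sequenceFile reuses one Writable instance per record, so references
    // collected by toArray/collect all end up pointing at the last record
    // unless the bytes are copied out first.
    val sessions = sc.sequenceFile(path, classOf[Text], classOf[BytesWritable])
    val copied = sessions.map { case (k, v) => (k.toString, v.getBytes.clone()) }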

Re: Help with processing multiple RDDs

2014-11-11 Thread buring
I think you can try setting a lower spark.storage.memoryFraction, for example 0.4: conf.set("spark.storage.memoryFraction", "0.4") // default 0.6
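
As a complete sketch (the app name is hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("multi-rdd-job") // hypothetical name
      .set("spark.storage.memoryFraction", "0.4") // default is 0.6
    val sc = new SparkContext(conf)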

index File create by mapFile can't read

2014-11-10 Thread buring
Hi, Recently I wanted to save a big RDD[(k,v)] in the form of an index and data, so I decided to use a Hadoop MapFile. I tried some examples like this one: https://gist.github.com/airawat/6538748 . The code runs well and generates an index file and a data file. I can use the command hadoop fs -text
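
The original code is not quoted; a sketch of one way to write a MapFile from Spark, assuming String keys and byte-array values (the helper name is mine, and MapFile requires keys in sorted order):

    import org.apache.hadoop.io.{BytesWritable, Text}
    import org.apache.hadoop.mapred.MapFileOutputFormat
    import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.x
    import org.apache.spark.rdd.RDD

    def saveAsMapFile(rdd: RDD[(String, Array[Byte])], path: String): Unit =
      rdd.sortByKey() // MapFile demands sorted keys within each output file
        .map { case (k, v) => (new Text(k), new BytesWritable(v)) }
        .saveAsHadoopFile(path, classOf[Text], classOf[BytesWritable],
          classOf[MapFileOutputFormat])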

Re: To generate an IndexedRowMatrix from a RowMatrix

2014-11-10 Thread buring
You should supply more information about your input data. For example, I generate an IndexedRowMatrix from the ALS algorithm's input data format; my code looks like this: val inputData = sc.textFile(fname).map { line => val parts = line.trim.split(' ')
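
The snippet above is cut off; a sketch of how it might continue, assuming each line holds a row index followed by the row's values (the format is an assumption):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

    val inputData = sc.textFile(fname).map { line =>
      val parts = line.trim.split(' ')
      IndexedRow(parts.head.toLong, Vectors.dense(parts.tail.map(_.toDouble)))
    }
    val matrix = new IndexedRowMatrix(inputData)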

How to avoid using snappy compression with saveAsSequenceFile?

2014-10-27 Thread buring
Hi: After updating Spark to version 1.1.0, I experienced a snappy error, which was posted here: http://apache-spark-user-list.1001560.n3.nabble.com/Update-gcc-version-Still-snappy-error-tt15137.html . I avoided this problem with
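
The workaround is cut off above. One way to sidestep snappy, sketched here as an assumption about what was done, is to switch Spark's internal codec and pass an explicit codec to saveAsSequenceFile (the output path is hypothetical, and rdd stands in for whatever is being saved):

    import org.apache.hadoop.io.compress.DefaultCodec
    import org.apache.spark.SparkConf

    // Use lz4 instead of snappy for Spark's internal compression.
    val conf = new SparkConf().set("spark.io.compression.codec", "lz4")

    // Write the sequence file with an explicit non-snappy codec.
    rdd.saveAsSequenceFile("/output/path", Some(classOf[DefaultCodec]))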

Re: How to avoid using snappy compression with saveAsSequenceFile?

2014-10-27 Thread buring
Here is the error log, abridged as follows:

INFO [binaryTest---main]: before first
WARN [org.apache.spark.scheduler.TaskSetManager---Result resolver thread-0]: Lost task 0.0 in stage 0.0 (TID 0, spark-dev136): org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null

Confusing order of rows in SVD matrix?

2014-09-29 Thread buring
Hi: I want to use SVD in my work. I tried some examples and have some confusion. The input is the following 4*3 matrix:

2 0 0
0 3 2
0 3 1
2 0 3

My input file text, which corresponds to the matrix, is as follows:

0 0 2
1 1 3
1 2
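
A sketch of the experiment as I read it, with the matrix rows parallelized directly (an assumption; the original loads a coordinate file). Note that RowMatrix carries no row indices, so the rows of U come back in partition order rather than input order, which may be the source of the confusion:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    val rows = sc.parallelize(Seq(
      Vectors.dense(2.0, 0.0, 0.0),
      Vectors.dense(0.0, 3.0, 2.0),
      Vectors.dense(0.0, 3.0, 1.0),
      Vectors.dense(2.0, 0.0, 3.0)))
    val svd = new RowMatrix(rows).computeSVD(3, computeU = true)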

Update gcc version, still snappy error.

2014-09-25 Thread buring
I updated the Spark version from 1.0.2 to 1.1.0 and experienced a snappy version issue with the new Spark-1.1.0. After updating the glibc version, another issue occurred. I abridge the log as follows:

14/09/25 11:29:18 WARN [org.apache.hadoop.util.NativeCodeLoader---main]: Unable to load