Re: Spark MLlib - java.lang.OutOfMemoryError: Java heap space

2017-04-24 Thread Selvam Raman
This is where the job goes out of memory:

17/04/24 10:09:22 INFO TaskSetManager: Finished task 122.0 in stage 1.0
(TID 356) in 4260 ms on ip-...-45.dev (124/234)
17/04/24 10:09:26 INFO BlockManagerInfo: Removed taskresult_361 on
ip-10...-185.dev:36974 in memory (size: 5.2 MB, free: 8.5 GB)
17/04/24 10:09:26 INFO BlockManagerInfo: Removed taskresult_362 on
ip-...-45.dev:40963 in memory (size: 5.2 MB, free: 8.9 GB)
17/04/24 10:09:26 INFO TaskSetManager: Finished task 125.0 in stage 1.0
(TID 359) in 4383 ms on ip-...-45.dev (125/234)
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 15090"...
Killed

Node ...-45.dev reports 8.9 GB free at the moment it throws out of memory. Can
anyone please help me understand the issue?
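
For reference, the "free" value in those BlockManagerInfo lines is the block
manager's storage pool, not the whole JVM heap, so the heap can still overflow
while storage shows gigabytes free. A back-of-the-envelope sketch of Spark 2.x
unified-memory sizing (assuming the defaults spark.memory.fraction = 0.6 and
300 MB of reserved heap; the numbers are estimates, not measurements):

# Rough sketch of Spark 2.x unified-memory sizing. Assumes the defaults
# spark.memory.fraction = 0.6 and 300 MB of reserved heap; estimates only.
RESERVED_MB = 300
MEMORY_FRACTION = 0.6

def unified_pool_gb(executor_memory_gb):
    # Storage and execution share this pool. BlockManagerInfo's "free"
    # reports free *storage* memory; the OOM happens on the overall heap.
    heap_mb = executor_memory_gb * 1024
    return (heap_mb - RESERVED_MB) * MEMORY_FRACTION / 1024.0

print("%.1f GB" % unified_pool_gb(20))  # ~11.8 -- matches "free: 11.8 GB"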

On Mon, Apr 24, 2017 at 11:22 AM, Selvam Raman wrote:

> [quoted message trimmed; the full original post appears below]



-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


Spark MLlib - java.lang.OutOfMemoryError: Java heap space

2017-04-24 Thread Selvam Raman
Hi,

I have 1 master and 4 slave nodes. The input data size is 14 GB.
Slave node config: 32 GB RAM, 16 cores


I am trying to train a word-embedding model using Spark, and it keeps going
out of memory. How much memory do I need to train on 14 GB of data?
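
One thing worth noting: MLlib's Word2Vec builds its model on the driver, so
its footprint scales with vocabulary size times vector size rather than with
the raw corpus size. A rough estimate (the vocabulary size below is a made-up
placeholder, not a figure measured from this corpus):

# Back-of-the-envelope Word2Vec memory estimate. The vocabulary size is a
# hypothetical placeholder; only vector_size = 100 is the MLlib default.
vocab_size = 10 * 1000 * 1000  # assumed distinct words kept after minCount
vector_size = 100              # MLlib default
bytes_per_float = 4

# Training keeps two float matrices of shape (vocab_size, vector_size) on
# the driver, so memory scales with the vocabulary, not the 14 GB corpus.
model_bytes = 2 * vocab_size * vector_size * bytes_per_float
print("%.1f GB" % (model_bytes / 1024.0 ** 3))  # ~7.5 GB for these numbers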


I have given 20 GB per executor, but the log below shows only 11.8 GB of the
20 GB as available:
BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-.-.-.dev:35035
(size: 4.6 KB, free: 11.8 GB)


This is the code:
from pyspark import SparkContext
from pyspark.mllib.feature import Word2Vec

if __name__ == "__main__":
    sc = SparkContext(appName="Word2VecExample")

    # Split each line of the corpus into a list of tokens.
    inp = sc.textFile("s3://word2vec/data/word2vec_word_data.txt/") \
        .map(lambda row: row.split(" "))

    word2vec = Word2Vec()
    model = word2vec.fit(inp)

    model.save(sc, "s3://pysparkml/word2vecresult2/")
    sc.stop()
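
If the vocabulary is what overflows the heap, shrinking the model is often
cheaper than adding memory. A variant of the snippet above using Word2Vec's
setters (the specific values are guesses, not tuned for this corpus):

# Hypothetical tuning -- these values are guesses, not tuned settings.
word2vec = (Word2Vec()
            .setVectorSize(50)     # smaller vectors shrink both weight matrices
            .setMinCount(25)       # drop rare words to cut the vocabulary
            .setNumPartitions(8))  # spread training over more partitions
model = word2vec.fit(inp)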


spark-submit command:
spark-submit --master yarn --conf
'spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/mnt/tmp -XX:+UseG1GC -XX:+PrintFlagsFinal
-XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy
-XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark' --num-executors 4
--executor-cores 2 --executor-memory 20g Word2VecExample.py
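
Since MLlib's Word2Vec assembles the trained model on the driver, driver
memory is a separate knob from --executor-memory, and spark-submit defaults
the driver to 1g while the executors here got 20g. A variant of the command
worth trying (GC options omitted for brevity; the sizes are guesses, not
tuned values):

spark-submit --master yarn \
  --driver-memory 16g \
  --conf spark.driver.maxResultSize=4g \
  --num-executors 4 --executor-cores 2 --executor-memory 20g \
  Word2VecExample.py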


-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"