Re: SparkML RandomForest java.lang.StackOverflowError

2016-04-01 Thread Joseph Bradley
computing time. Code I use to train model: int MAX_BINS = 16; int NUM_CLASSES = 0; double MIN_INFO_GAIN = 0.0; int MAX_MEMORY_IN_MB = 256;

Re: SparkML RandomForest java.lang.StackOverflowError

2016-04-01 Thread Joseph Bradley
int MAX_MEMORY_IN_MB = 256; double SUBSAMPLING_RATE = 1.0; boolean USE_NODEID_CACHE = true; int CHECKPOINT_INTERVAL = 10; int RANDOM_SEED = 12345;

Re: SparkML RandomForest java.lang.StackOverflowError

2016-03-31 Thread Eugene Morozov
int NODE_SIZE = 5; int maxDepth = 30; int numTrees = 50; Strategy strategy = new Strategy(Algo.Regression(), Variance.instance(), maxDepth, NUM_CLASSES, MAX_BINS, QuantileStrategy.Sort(), new

Re: SparkML RandomForest java.lang.StackOverflowError

2016-03-30 Thread Eugene Morozov
double MIN_INFO_GAIN = 0.0; int MAX_MEMORY_IN_MB = 256; double SUBSAMPLING_RATE = 1.0; boolean USE_NODEID_CACHE = true; int CHECKPOINT_INTERVAL = 10; int RANDOM_SEED = 12345;

Re: SparkML RandomForest java.lang.StackOverflowError

2016-03-29 Thread Eugene Morozov
Strategy strategy = new Strategy(Algo.Regression(), Variance.instance(), maxDepth, NUM_CLASSES, MAX_BINS, QuantileStrategy.Sort(), new scala.collection.immutable.HashMap<>(), nodeSize, MIN_INFO_GAIN, MAX_MEMORY_IN_MB,

Re: SparkML RandomForest java.lang.StackOverflowError

2016-03-29 Thread Joseph Bradley
NUM_CLASSES, MAX_BINS, QuantileStrategy.Sort(), new scala.collection.immutable.HashMap<>(), nodeSize, MIN_INFO_GAIN, MAX_MEMORY_IN_MB, SUBSAMPLING_RATE, USE_NODEID_CACHE, CHECKPOINT_INTERVAL); RandomForestModel model = RandomForest.trainRegressor(labeledPoints.rdd(), strategy, numTrees, "auto", RANDOM_SEED);

Re: SparkML RandomForest java.lang.StackOverflowError

2016-03-29 Thread Eugene Morozov
RandomForestModel model = RandomForest.trainRegressor(labeledPoints.rdd(), strategy, numTrees, "auto", RANDOM_SEED); Any advice would be highly appreciated. The exception (~3000 lines long): java.lang.StackOverflowError

SparkML RandomForest java.lang.StackOverflowError

2016-03-29 Thread Eugene Morozov
nodeSize, MIN_INFO_GAIN, MAX_MEMORY_IN_MB, SUBSAMPLING_RATE, USE_NODEID_CACHE, CHECKPOINT_INTERVAL); RandomForestModel model = RandomForest.trainRegressor(labeledPoints.rdd(), strategy, numTrees, "auto", RANDOM_SEED); Any advice would be highly appreciated. The exception (~3000 lines long): java.lang.StackOverflowError
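
Assembled from the fragments quoted across this thread, the training code appears to be roughly the following (a reconstruction, not the full original post; labeledPoints and the surrounding class are not shown in the archive, and nodeSize is presumably the NODE_SIZE constant):

    // assumed imports (Spark 1.x MLlib):
    // import org.apache.spark.mllib.tree.RandomForest;
    // import org.apache.spark.mllib.tree.configuration.Algo;
    // import org.apache.spark.mllib.tree.configuration.QuantileStrategy;
    // import org.apache.spark.mllib.tree.configuration.Strategy;
    // import org.apache.spark.mllib.tree.impurity.Variance;
    // import org.apache.spark.mllib.tree.model.RandomForestModel;

    int MAX_BINS = 16;
    int NUM_CLASSES = 0;
    double MIN_INFO_GAIN = 0.0;
    int MAX_MEMORY_IN_MB = 256;
    double SUBSAMPLING_RATE = 1.0;
    boolean USE_NODEID_CACHE = true;
    int CHECKPOINT_INTERVAL = 10;
    int RANDOM_SEED = 12345;
    int NODE_SIZE = 5;
    int maxDepth = 30;
    int numTrees = 50;

    Strategy strategy = new Strategy(Algo.Regression(), Variance.instance(),
        maxDepth, NUM_CLASSES, MAX_BINS, QuantileStrategy.Sort(),
        new scala.collection.immutable.HashMap<>(), NODE_SIZE, MIN_INFO_GAIN,
        MAX_MEMORY_IN_MB, SUBSAMPLING_RATE, USE_NODEID_CACHE, CHECKPOINT_INTERVAL);

    RandomForestModel model = RandomForest.trainRegressor(
        labeledPoints.rdd(), strategy, numTrees, "auto", RANDOM_SEED);

Worth noting: maxDepth = 30 produces very deep trees, and tree nodes form a recursive object graph that is serialized along with the model, a plausible source of the deep recursion behind the reported StackOverflowError.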

Re: Spark SQL - java.lang.StackOverflowError after caching table

2016-03-24 Thread Mohamed Nadjib MAMI
execution reason) but it finishes and returns results. However, the weird thing is that after I run the same query again, I get the error: "java.lang.StackOverflowError". I Googled it but didn't find the error appearing with table caching and querying. Any hint is appreciated.

Re: Spark SQL - java.lang.StackOverflowError after caching table

2016-03-24 Thread Ted Yu
I'm running SQL queries (sqlContext.sql()) on Parquet tables and facing a problem with table caching (sqlContext.cacheTable()), using the spark-shell of Spark 1.5.1. After I run sqlContext.cacheTable(table), the sqlContext.sql(query) takes longer the first time (well, for

Re: Spark SQL - java.lang.StackOverflowError after caching table

2016-03-24 Thread Mohamed Nadjib MAMI
(sqlContext.cacheTable()), using the spark-shell of Spark 1.5.1. After I run sqlContext.cacheTable(table), the sqlContext.sql(query) takes longer the first time (well, for the lazy execution reason) but it finishes and returns results. However, the weird thing is that after I run the same query again, I get the error

Re: Spark SQL - java.lang.StackOverflowError after caching table

2016-03-24 Thread Ted Yu
returns results. However, the weird thing is that after I run the same query again, I get the error: "java.lang.StackOverflowError". I Googled it but didn't find the error appearing with table caching and querying. Any hint is appreciated.

Spark SQL - java.lang.StackOverflowError after caching table

2016-03-24 Thread Mohamed Nadjib MAMI
execution reason) but it finishes and returns results. However, the weird thing is that after I run the same query again, I get the error: "java.lang.StackOverflowError". I Googled it but didn't find the error appearing with table caching and querying. Any hint is appreciated.
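
For reference, a minimal Java rendering of the sequence reported in this thread (the user was in spark-shell; the path, table name, and query here are placeholders, since the archive does not show the actual ones):

    // Spark 1.5.x; spark-shell exposes an SQLContext as sqlContext
    DataFrame df = sqlContext.read().parquet("/path/to/parquet");  // hypothetical path
    df.registerTempTable("t");                                     // hypothetical table name
    sqlContext.cacheTable("t");
    sqlContext.sql("SELECT * FROM t").collect();  // first run: slow, since caching is lazy
    sqlContext.sql("SELECT * FROM t").collect();  // second run: where the StackOverflowError was reported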

Re: Spark Streaming: java.lang.StackOverflowError

2016-03-01 Thread Cody Koeninger
What code is triggering the stack overflow? On Mon, Feb 29, 2016 at 11:13 PM, Vinti Maheshwari wrote: Hi All, I am getting the below error in a spark-streaming application; I am using Kafka for the input stream. When I was doing it with a socket, it was working fine, but when I

Spark Accumulator Issue - java.io.IOException: java.lang.StackOverflowError

2015-07-24 Thread Jadhav Shweta
= { uA ++= u }) var uRDD = sparkContext.parallelize(uA.value) It's failing on a large dataset with the following error: java.io.IOException: java.lang.StackOverflowError at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1140

Spark Accumulator Issue - java.io.IOException: java.lang.StackOverflowError

2015-07-15 Thread Jadhav Shweta
= { uA ++= u }) var uRDD = sparkContext.parallelize(uA.value) It's failing on a large dataset with the following error: java.io.IOException: java.lang.StackOverflowError at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1140

java.lang.StackOverflowError when doing spark sql

2015-02-19 Thread bit1...@163.com
java.lang.StackOverflowError at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254) at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222

java.lang.stackoverflowerror when running Spark shell

2014-09-23 Thread mrshen
I tested the examples according to the docs in the Spark SQL programming guide, but a java.lang.StackOverflowError occurred every time I called sqlContext.sql(...). Meanwhile, it worked fine in a HiveContext. The Hadoop version is 2.2.0, the Spark version is 1.1.0, built with YARN and Hive. I would

Re: java.lang.StackOverflowError when calling count()

2014-08-12 Thread Tathagata Das
The long lineage causes a long/deep Java object tree (a DAG of RDD objects), which needs to be serialized as part of task creation. When serializing, the whole object DAG needs to be traversed, leading to the stack overflow error. TD On Mon, Aug 11, 2014 at 7:14 PM, randylu randyl...@gmail.com
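
TD's diagnosis points at the standard workaround: periodically checkpoint the RDD so the lineage, and hence the object graph that task serialization must traverse, stays shallow. A minimal sketch under that assumption (the checkpoint directory, interval, starting RDD, and transformation are placeholders):

    // import org.apache.spark.api.java.JavaRDD;
    // import org.apache.spark.api.java.JavaSparkContext;

    sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints");  // hypothetical directory
    JavaRDD<Double> rdd = initial;                         // placeholder starting RDD
    for (int i = 1; i <= 1000; i++) {
        rdd = rdd.map(x -> x + 1.0);  // stand-in for the real per-iteration transformation
        if (i % 10 == 0) {
            rdd.checkpoint();  // marks the RDD for saving; its lineage is cut once materialized
            rdd.count();       // forces evaluation so the checkpoint actually happens
        }
    }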

Re: java.lang.StackOverflowError when calling count()

2014-08-12 Thread randylu
hi, TD. Thanks very much! I got it.

Re: java.lang.StackOverflowError when calling count()

2014-08-11 Thread randylu
hi, TD. I also fell into the trap of long lineage, and your suggestions do work well. But I don't understand why the long lineage can cause a stack overflow, and where it takes effect?

java.lang.StackOverflowError

2014-08-05 Thread Chengi Liu
py4j.protocol.Py4JJavaError: An error occurred while calling o9564.saveAsTextFile. : org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError java.io.Bits.putInt(Bits.java:93) java.io.ObjectOutputStream$BlockDataOutputStream.writeInt(ObjectOutputStream.java

Re: java.lang.StackOverflowError

2014-08-05 Thread Chengi Liu
/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py, line 300, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o9564.saveAsTextFile. : org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError

Re: java.lang.StackOverflowError

2014-08-05 Thread Davies Liu
py4j.protocol.Py4JJavaError: An error occurred while calling o9564.saveAsTextFile. : org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError java.io.Bits.putInt(Bits.java:93) java.io.ObjectOutputStream$BlockDataOutputStream.writeInt
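
The previews do not show how this thread was resolved. Besides truncating the lineage (see TD's explanation further down this page), a commonly suggested mitigation when serialization itself blows the stack is enlarging the JVM thread stack size. A sketch with illustrative values, via spark-defaults.conf so the driver JVM picks the option up at launch:

    # illustrative values; task serialization runs on the driver,
    # deserialization on the executors
    spark.driver.extraJavaOptions   -Xss4m
    spark.executor.extraJavaOptions -Xss4m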

Re: java.lang.StackOverflowError when calling count()

2014-07-26 Thread Tathagata Das
Responses inline. On Wed, Jul 23, 2014 at 4:13 AM, lalit1303 la...@sigmoidanalytics.com wrote: Hi, Thanks TD for your reply. I am still not able to resolve the problem for my use case. I have, let's say, 1000 different RDDs, and I am applying a transformation function on each RDD and I want

Re: java.lang.StackOverflowError when calling count()

2014-07-23 Thread lalit1303
Hi, Thanks TD for your reply. I am still not able to resolve the problem for my use case. I have, let's say, 1000 different RDDs, and I am applying a transformation function on each RDD, and I want the output of all RDDs combined into a single output RDD. For this, I am doing the following: *Loop
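
Unioning ~1000 RDDs pairwise in a loop grows the lineage by one level per iteration, which is exactly the failure mode TD describes elsewhere on this page. A minimal sketch of the workaround, with periodic checkpointing to keep the chain short (names and the interval are placeholders; requires sc.setCheckpointDir, as in the earlier sketch):

    // rdds: java.util.List<JavaRDD<String>> holding the ~1000 transformed RDDs (placeholder)
    JavaRDD<String> combined = rdds.get(0);
    for (int i = 1; i < rdds.size(); i++) {
        combined = combined.union(rdds.get(i));
        if (i % 100 == 0) {
            combined.checkpoint();  // cut the accumulated union chain
            combined.count();       // materialize so the checkpoint takes effect
        }
    }

Depending on the Spark version, a single flat union over the whole list (rather than a chain of pairwise unions) also keeps the DAG shallow.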

Re: java.lang.StackOverflowError when calling count()

2014-05-14 Thread Nicholas Chammas
CODE:print round, round, rdd__new.count() File /home1/ghyan/Software/spark-0.9.0-incubating-bin-hadoop2/python/pyspark/rdd.py, line 542, in count 14/05/12 16:20:28 INFO TaskSetManager: Loss was due to java.lang.StackOverflowError

Re: java.lang.StackOverflowError when calling count()

2014-05-14 Thread lalit1303
:28 INFO TaskSetManager: Loss was due to java.lang.StackOverflowError [duplicate 1] return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() 14/05/12 16:20:28 ERROR TaskSetManager: Task 8419.0:0 failed 1 times; aborting job File /home1/ghyan/Software/spark-0.9.0

Re: java.lang.StackOverflowError when calling count()

2014-05-13 Thread Mayur Rustagi
/pyspark/rdd.py, line 542, in count 14/05/12 16:20:28 INFO TaskSetManager: Loss was due to java.lang.StackOverflowError [duplicate 1] return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() 14/05/12 16:20:28 ERROR TaskSetManager: Task 8419.0:0 failed 1 times; aborting job

Re: java.lang.StackOverflowError when calling count()

2014-05-13 Thread Guanhua Yan
CODE:print round, round, rdd__new.count() File /home1/ghyan/Software/spark-0.9.0-incubating-bin-hadoop2/python/pyspark/rdd.py, line 542, in count 14/05/12 16:20:28 INFO TaskSetManager: Loss was due to java.lang.StackOverflowError [duplicate 1]

Re: java.lang.StackOverflowError when calling count()

2014-05-13 Thread Mayur Rustagi
: Loss was due to java.lang.StackOverflowError [duplicate 1] return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() 14/05/12 16:20:28 ERROR TaskSetManager: Task 8419.0:0 failed 1 times; aborting job File /home1/ghyan/Software/spark-0.9.0-incubating-bin-hadoop2/python

java.lang.StackOverflowError when calling count()

2014-05-12 Thread Guanhua Yan
CODE:print round, round, rdd__new.count() File /home1/ghyan/Software/spark-0.9.0-incubating-bin-hadoop2/python/pyspark/rdd.py, line 542, in count 14/05/12 16:20:28 INFO TaskSetManager: Loss was due to java.lang.StackOverflowError [duplicate 1]