Re: using LogisticRegressionWithSGD.train in Python crashes with Broken pipe
Hi, I'm using Spark 1.1.0. There is no error on the executors. It appears as if the job never gets properly dispatched; the only message is the Broken pipe error in the driver.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/using-LogisticRegressionWithSGD-train-in-Python-crashes-with-Broken-pipe-tp18182p18846.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
using LogisticRegressionWithSGD.train in Python crashes with Broken pipe
I have a dataset of ~200k labeled points whose features are SparseVectors with ~20M features. I take 5% of the data for a training set.

    model = LogisticRegressionWithSGD.train(training_set)

fails with

    ERROR:py4j.java_gateway:Error while sending or receiving.
    Traceback (most recent call last):
      File "/cluster/home/roskarr/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 472, in send_command
        self.socket.sendall(command.encode('utf-8'))
      File "/cluster/home/roskarr/miniconda/lib/python2.7/socket.py", line 224, in meth
        return getattr(self._sock,name)(*args)
    error: [Errno 32] Broken pipe

I'm at a loss as to where to begin debugging this. Any suggestions?

Thanks,
Rok
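For scale, here is a rough sketch of the sizes implied by the numbers in the message above. It assumes the model weights are held as a single dense array of 8-byte floats, one per feature; the thread does not confirm this, nor whether such an array ever crosses the Py4J socket, so treat it only as a quantity worth ruling out. The function name and variables are hypothetical.

```python
# Back-of-the-envelope sizes for the numbers quoted in the message.
# Assumption (not confirmed in the thread): weights are held as one dense
# array of 8-byte floats, one entry per feature.

def dense_vector_megabytes(num_features, bytes_per_value=8):
    """Size in MB (10**6 bytes) of a dense vector of num_features values."""
    return num_features * bytes_per_value / 1e6

total_points = 200 * 1000                    # ~200k labeled points
num_features = 20 * 1000 * 1000              # ~20M features
training_points = total_points * 5 // 100    # the 5% training split

weights_mb = dense_vector_megabytes(num_features)
print("training points: %d" % training_points)
print("one dense weight vector: %.0f MB" % weights_mb)
```

If a single dense vector of that size is involved anywhere between the Python driver and the JVM, driver-side memory and message-size limits would be a reasonable place to look first.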
Re: using LogisticRegressionWithSGD.train in Python crashes with Broken pipe
Hi Rok,

You could start by collecting your training_set and checking that you get something sensible back before passing it to the train method. Then step through the train method line by line, including the serializer, to see exactly where it fails.

Thanks,
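A minimal sketch of the "check the training set first" step suggested above. The helper check_point is hypothetical, not part of Spark; it only reads the attributes label, features.size, features.indices and features.values, which are assumed here to match PySpark's LabeledPoint and SparseVector. The namedtuple stand-ins let it run without a cluster; on a real RDD one would feed it a small sample such as training_set.take(5).

```python
from collections import namedtuple

# Stand-ins that mimic the attributes assumed on LabeledPoint / SparseVector.
FakeVector = namedtuple("FakeVector", ["size", "indices", "values"])
FakePoint = namedtuple("FakePoint", ["label", "features"])

def check_point(point, expected_size):
    """Return a list of human-readable problems found in one labeled point."""
    problems = []
    vec = point.features
    if vec.size != expected_size:
        problems.append("size %d != expected %d" % (vec.size, expected_size))
    if len(vec.indices) != len(vec.values):
        problems.append("indices/values length mismatch")
    previous = -1
    for i in vec.indices:
        if i < 0 or i >= vec.size:
            problems.append("index %d out of range" % i)
        if i <= previous:
            problems.append("indices not strictly increasing at %d" % i)
        previous = i
    # Binary logistic regression expects labels 0 or 1.
    if point.label not in (0.0, 1.0):
        problems.append("unexpected label %r" % (point.label,))
    return problems

good = FakePoint(1.0, FakeVector(10, [0, 3, 7], [1.0, 2.0, 0.5]))
bad = FakePoint(2.0, FakeVector(10, [3, 3, 12], [1.0, 2.0]))

for p in (good, bad):
    print(check_point(p, expected_size=10))
```

A sample that passes these checks does not prove the full dataset is clean, but it quickly separates malformed input from problems in the training call itself.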
Re: using LogisticRegressionWithSGD.train in Python crashes with Broken pipe
Yes, the training set is fine; I've verified it.
Re: using LogisticRegressionWithSGD.train in Python crashes with Broken pipe
Which Spark version did you use? Could you check the WebUI and attach the error message on executors?

-Xiangrui

On Wed, Nov 5, 2014 at 8:23 AM, rok rokros...@gmail.com wrote:
> yes, the training set is fine, I've verified it.