Re: LBFGS optimizer performance

2015-03-06 Thread Gustavo Enrique Salazar Torres
... with the ability to cache more data.
Sincerely, DB Tsai
Blog: https://www.dbtsai.com
On Tue, Mar 3, 2015 at 2:27 PM, Gustavo Enrique Salazar Torres gsala...@ime.usp.br wrote:
> Yeah, I can call count before that and it works. Also I was over ...

Re: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: pyspark on yarn

2015-03-03 Thread Gustavo Enrique Salazar Torres
Hi Sam:
Shouldn't you define the table schema? I had the same problem in Scala and solved it by defining the schema. I did this:
    sqlContext.applySchema(dataRDD, tableSchema).registerTempTable(tableName)
Hope it helps.
On Mon, Jan 5, 2015 at 7:01 PM, Sam Flint sam.fl...@magnetic.com wrote:

Re: LBFGS optimizer performance

2015-03-03 Thread Gustavo Enrique Salazar Torres
... even gets to LBFGS. (Perhaps the outer join you're trying to do is making the dataset size explode a bit.) Are you able to call count() (or any RDD action) on the data before you pass it to LBFGS?
On Tue, Mar 3, 2015 at 8:55 AM, Gustavo Enrique Salazar Torres gsala...@ime.usp.br wrote

LBFGS optimizer performance

2015-03-02 Thread Gustavo Enrique Salazar Torres
Hi there: I'm using the LBFGS optimizer to train a logistic regression model. The code I implemented follows the pattern shown in https://spark.apache.org/docs/1.2.0/mllib-linear-methods.html but the training data is obtained from a Spark SQL RDD. The problem I'm having is that LBFGS tries to count the ...