[GitHub] spark pull request #21086: [SPARK-24002] [SQL] Task not serializable caused ...

2018-05-16 Thread ghoto
Github user ghoto commented on a diff in the pull request: https://github.com/apache/spark/pull/21086#discussion_r188701408 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -351,12 +338,26 @@ class

[GitHub] spark pull request #21086: [SPARK-24002] [SQL] Task not serializable caused ...

2018-05-15 Thread ghoto
Github user ghoto commented on a diff in the pull request: https://github.com/apache/spark/pull/21086#discussion_r188473831 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -351,12 +338,26 @@ class

[GitHub] spark issue #21086: [SPARK-24002] [SQL] Task not serializable caused by org....

2018-05-15 Thread ghoto
Github user ghoto commented on the issue: https://github.com/apache/spark/pull/21086 I'm hitting this issue after upgrading from 2.0.2 to 2.3.0. Please backport this PR to Spark 2.3.0

[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

2017-05-20 Thread ghoto
Github user ghoto commented on a diff in the pull request: https://github.com/apache/spark/pull/17940#discussion_r117619161 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -992,7 +992,16 @@ object Matrices { new DenseMatrix(dm.rows

[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

2017-05-19 Thread ghoto
Github user ghoto commented on a diff in the pull request: https://github.com/apache/spark/pull/17940#discussion_r117535893 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -992,7 +992,24 @@ object Matrices { new DenseMatrix(dm.rows

[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

2017-05-11 Thread ghoto
Github user ghoto commented on a diff in the pull request: https://github.com/apache/spark/pull/17940#discussion_r116155652 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -992,7 +992,20 @@ object Matrices { new DenseMatrix(dm.rows

[GitHub] spark issue #17907: SPARK-7856 Principal components and variance using compu...

2017-05-11 Thread ghoto
Github user ghoto commented on the issue: https://github.com/apache/spark/pull/17907 I think I need some time to run benchmarks. Originally the driver was set to 3GB, but since I was hitting this OutOfMemory in the driver, I decided to give it a try and increase the size

[GitHub] spark issue #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash...

2017-05-11 Thread ghoto
Github user ghoto commented on the issue: https://github.com/apache/spark/pull/17940 Need to fix a line in the test because it's too long.

[GitHub] spark issue #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze may crash...

2017-05-11 Thread ghoto
Github user ghoto commented on the issue: https://github.com/apache/spark/pull/17940 Sorry about that. I added more context in the description and updated the title.

[GitHub] spark issue #17907: SPARK-7856 Principal components and variance using compu...

2017-05-11 Thread ghoto
Github user ghoto commented on the issue: https://github.com/apache/spark/pull/17907 With classic Spark PCA, an approx. 55Kx15K matrix, and 10GB in the driver, I run out of memory. I chopped the matrix down to 55Kx3K and then I can get the PCA. With the distributed SVD approach I could compute PCA

[GitHub] spark pull request #17940: Bug fix/spark 20687

2017-05-10 Thread ghoto
GitHub user ghoto opened a pull request: https://github.com/apache/spark/pull/17940 Bug fix/spark 20687 ## What changes were proposed in this pull request? Bugfix for https://issues.apache.org/jira/browse/SPARK-20687 Before converting a CSCMatrix to a Matrix

[GitHub] spark issue #17907: SPARK-7856 Principal components and variance using compu...

2017-05-09 Thread ghoto
Github user ghoto commented on the issue: https://github.com/apache/spark/pull/17907 My understanding is that RowMatrix computes the SVD locally when the data is suitable, to improve performance, and in a distributed fashion otherwise. So the suggested implementation does NOT always rely

[GitHub] spark pull request #17907: SPARK-7856 Principal components and variance usin...

2017-05-09 Thread ghoto
Github user ghoto commented on a diff in the pull request: https://github.com/apache/spark/pull/17907#discussion_r115502692 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -384,19 +384,23 @@ class RowMatrix @Since("

[GitHub] spark pull request #17907: SPARK-7856 Principal components and variance usin...

2017-05-08 Thread ghoto
GitHub user ghoto opened a pull request: https://github.com/apache/spark/pull/17907 SPARK-7856 Principal components and variance using computeSVD() ## What changes were proposed in this pull request? The current implementation