[jira] [Commented] (SPARK-6235) Address various 2G limits
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017438#comment-17017438 ]

Samuel Shepard commented on SPARK-6235:
---

[~irashid] I followed your suggestion of looking in the user archive and [found an old PR|https://github.com/apache/spark/pull/17907] that tried to fix the PCA call itself. It was closed, but I linked it back here. [~srowen] is also on the thread. I leave this comment as much to help direct users to a workaround as to encourage a future fix. Thanks for all you guys do.

> Address various 2G limits
> -
>
> Key: SPARK-6235
> URL: https://issues.apache.org/jira/browse/SPARK-6235
> Project: Spark
> Issue Type: Umbrella
> Components: Shuffle, Spark Core
> Reporter: Reynold Xin
> Priority: Major
> Fix For: 2.4.0
>
> Attachments: SPARK-6235_Design_V0.02.pdf
>
>
> An umbrella ticket to track the various 2G limits we have in Spark, due to the
> use of byte arrays and ByteBuffers.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
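Why the ceiling sits at "2G": the ticket attributes it to byte arrays and ByteBuffers. A Java byte[] or java.nio.ByteBuffer is indexed by a signed 32-bit int, so one buffer holds at most 2^31 - 1 bytes (some JVMs cap array length a few elements lower still). The Python below is only a sketch of the arithmetic behind the usual workaround: splitting one logical payload into several buffers that each stay under that ceiling. Spark core has a ChunkedByteBuffer class built around a similar idea, but this is not its implementation; `INT_MAX` and `chunk_ranges` are hypothetical names, not Spark APIs.

```python
from typing import List, Tuple

# Assumed per-buffer ceiling: a Java byte[] / java.nio.ByteBuffer is indexed by a
# signed 32-bit int, so a single buffer holds at most 2**31 - 1 bytes.
INT_MAX = 2**31 - 1


def chunk_ranges(total_bytes: int, max_chunk: int = INT_MAX) -> List[Tuple[int, int]]:
    """Split [0, total_bytes) into half-open (start, end) ranges of at most max_chunk bytes.

    Hypothetical helper: each range could back its own buffer, so no single
    buffer ever needs an index beyond the 32-bit limit.
    """
    if total_bytes < 0 or max_chunk <= 0:
        raise ValueError("total_bytes must be >= 0 and max_chunk must be > 0")
    return [
        (start, min(start + max_chunk, total_bytes))
        for start in range(0, total_bytes, max_chunk)
    ]


if __name__ == "__main__":
    five_gib = 5 * 2**30
    for start, end in chunk_ranges(five_gib):
        print(start, end, end - start)
```

For a 5 GiB payload this yields three ranges: two full chunks of 2^31 - 1 bytes and a final partial one. A payload of that size cannot live in one int-indexed buffer at all, which is why the fixes tracked here replace single buffers with chunked or streamed ones.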
[jira] [Commented] (SPARK-6235) Address various 2G limits
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997798#comment-16997798 ]

Samuel Shepard commented on SPARK-6235:
---

[~irashid] As best I understand the architecture, I meant the former (task result > 2G). Since this affects PCA, is there a different Jira for the ML library that would be more appropriate? Thanks for the suggestions.

Spark is a beautiful system with a lot of kind effort put into it. Computational biology has huge feature spaces all over the place. The two could really work well together, I think. This issue feels like some sort of leftover from 32-bit Java, cramping Spark's style. :(
[jira] [Comment Edited] (SPARK-6235) Address various 2G limits
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995889#comment-16995889 ]

Samuel Shepard edited comment on SPARK-6235 at 12/16/19 4:39 PM:
---

[~tgraves], [~irashid] One use case could be fetching large results to the driver when computing PCA on large square matrices (e.g., distance matrices, similar to Classical MDS). This is very helpful in bioinformatics. Sorry if this is already fixed past 2.4.0...

was (Author: sammysheep):
One use case could be fetching large results to the driver when computing PCA on large square matrices (e.g., distance matrices, similar to Classical MDS). This is very helpful in bioinformatics. Sorry if this is already fixed past 2.4.0...
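Rough arithmetic for the PCA use case, assuming the driver receives the result as one dense n × n matrix of 8-byte doubles held in a single int-indexed byte buffer. That is a simplification: Spark's actual storage layout for this result and its serialization overhead may differ. `dense_square_bytes` and `largest_dense_square` are hypothetical helpers for illustration, not Spark APIs.

```python
import math

# Assumed ceiling: one Java byte[] / java.nio.ByteBuffer, indexed by a signed
# 32-bit int, so at most 2**31 - 1 bytes.
INT_MAX = 2**31 - 1


def dense_square_bytes(n: int, bytes_per_value: int = 8) -> int:
    """Raw size of a dense n x n matrix (8-byte doubles by default).

    Ignores object headers and serialization overhead, so real payloads are larger.
    """
    return n * n * bytes_per_value


def largest_dense_square(limit: int = INT_MAX, bytes_per_value: int = 8) -> int:
    """Largest n such that dense_square_bytes(n, bytes_per_value) <= limit."""
    # n*n*b <= limit  <=>  n*n <= limit // b  (because n*n is an integer),
    # so the answer is the integer square root of limit // b.
    return math.isqrt(limit // bytes_per_value)


if __name__ == "__main__":
    n_max = largest_dense_square()
    print(n_max, dense_square_bytes(n_max), dense_square_bytes(n_max + 1))
```

Running it prints `16383 2147221512 2147483648`: under these assumptions a 16,384 × 16,384 matrix of doubles is exactly 2^31 bytes, one byte over the ceiling, and serialization overhead would likely push the practical cutoff lower.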
[jira] [Commented] (SPARK-6235) Address various 2G limits
[ https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995889#comment-16995889 ]

Samuel Shepard commented on SPARK-6235:
---

One use case could be fetching large results to the driver when computing PCA on large square matrices (e.g., distance matrices, similar to Classical MDS). This is very helpful in bioinformatics. Sorry if this is already fixed past 2.4.0...