[jira] [Commented] (SPARK-6235) Address various 2G limits

2020-01-16 Thread Samuel Shepard (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017438#comment-17017438
 ] 

Samuel Shepard commented on SPARK-6235:
---

[~irashid] I followed your suggestion of looking in the user archive and [found 
an old PR |https://github.com/apache/spark/pull/17907] that tried to fix the 
PCA call itself.  It was closed, but I linked it back here. [~srowen] is also 
on the thread. I am leaving this comment as much to direct users to a 
workaround as to encourage a future fix.

Thanks for all you guys do.

> Address various 2G limits
> -
>
> Key: SPARK-6235
> URL: https://issues.apache.org/jira/browse/SPARK-6235
> Project: Spark
> Issue Type: Umbrella
> Components: Shuffle, Spark Core
> Reporter: Reynold Xin
> Priority: Major
> Fix For: 2.4.0
>
> Attachments: SPARK-6235_Design_V0.02.pdf
>
>
> An umbrella ticket to track the various 2G limits we have in Spark, due to the 
> use of byte arrays and ByteBuffers.
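
For a sense of where the "2G" figure in the description comes from: JVM arrays and java.nio.ByteBuffer are indexed with a signed 32-bit int, so a single buffer can hold at most 2^31 - 1 bytes. The sketch below is plain arithmetic in Python, not Spark code, and `chunks_needed` is a hypothetical helper introduced only to show how many such buffers a larger payload would span.

```python
# Illustrative arithmetic only; nothing here calls Spark or the JVM.

# Upper bound on the length of a single JVM byte[] or ByteBuffer:
# the largest value of a signed 32-bit int.
MAX_JVM_BUFFER_BYTES = 2**31 - 1


def chunks_needed(payload_bytes: int, chunk_bytes: int = MAX_JVM_BUFFER_BYTES) -> int:
    """Hypothetical helper: how many buffers of at most `chunk_bytes`
    are needed to hold `payload_bytes`."""
    if payload_bytes < 0 or chunk_bytes <= 0:
        raise ValueError("payload must be >= 0 and chunk size must be > 0")
    # Ceiling division without floating point.
    return -(-payload_bytes // chunk_bytes)


if __name__ == "__main__":
    for size in (2**31 - 1, 2**31, 5 * 2**30):
        print(size, "bytes ->", chunks_needed(size), "buffer(s)")
```

Anything that must fit in one such buffer (a block, a serialized result) fails as soon as it is one byte over the bound, which is why the limit shows up in several unrelated places.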



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6235) Address various 2G limits

2019-12-16 Thread Samuel Shepard (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997798#comment-16997798
 ] 

Samuel Shepard commented on SPARK-6235:
---

[~irashid] I meant the former (task result > 2G) as best I understand the 
architecture. Is there a different Jira for the ML library, since it affects 
PCA, that would be more appropriate?

Thanks for the suggestions. Spark is a beautiful system with a lot of kind 
effort put into it. Computational biology has huge feature spaces all over the 
place. The two could really work well together, I think. This issue feels like 
some sort of leftover from 32-bit Java, cramping Spark's style. :(




[jira] [Comment Edited] (SPARK-6235) Address various 2G limits

2019-12-16 Thread Samuel Shepard (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995889#comment-16995889
 ] 

Samuel Shepard edited comment on SPARK-6235 at 12/16/19 4:39 PM:
-

[~tgraves], [~irashid]

One use case could be fetching large results to the driver when computing PCA 
on large square matrices (e.g., distance matrices, similar to Classical MDS). 
This is very helpful in bioinformatics. Sorry if this is already fixed past 
2.4.0...
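
To make the use case concrete, here is a rough size estimate, assuming the result sent to the driver is a dense n x n matrix of 8-byte doubles packed into a single buffer with no serialization overhead. That layout is an assumption for illustration; Spark's actual result format may differ. The function names are hypothetical.

```python
import math

# Assumptions for illustration: dense storage, 8-byte doubles, one buffer.
BYTES_PER_DOUBLE = 8
MAX_BUFFER_BYTES = 2**31 - 1  # largest signed 32-bit int


def dense_square_bytes(n: int) -> int:
    """Bytes needed for a dense n x n matrix of doubles."""
    return n * n * BYTES_PER_DOUBLE


def largest_n_under_limit(limit_bytes: int = MAX_BUFFER_BYTES) -> int:
    """Largest n whose dense n x n double matrix fits in `limit_bytes`."""
    return math.isqrt(limit_bytes // BYTES_PER_DOUBLE)


if __name__ == "__main__":
    n = largest_n_under_limit()
    print(n)                          # 16383
    print(dense_square_bytes(n))      # 2147221512, just under the limit
    print(dense_square_bytes(n + 1))  # 2147483648, one byte over the limit
```

Under these assumptions a square matrix with more than 16,383 rows no longer fits, which is small by the standards of distance matrices in bioinformatics.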





[jira] [Commented] (SPARK-6235) Address various 2G limits

2019-12-13 Thread Samuel Shepard (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995889#comment-16995889
 ] 

Samuel Shepard commented on SPARK-6235:
---

One use case could be fetching large results to the driver when computing PCA 
on large square matrices (e.g., distance matrices, similar to Classical MDS). 
This is very helpful in bioinformatics. Sorry if this is already fixed past 
2.4.0...
