[ https://issues.apache.org/jira/browse/SPARK-6190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407729#comment-16407729 ]

sam commented on SPARK-6190:
----------------------------

[~bdolbeare] [~UZiVcbfPXaNrMtT]

I completely agree that it's depressing that many of these basic Spark issues 
are not resolved, while development attention seems to go to features aimed 
purely at making Spark easier to market to non-big-data experts (e.g. SparkSQL, 
or poorly designed non-functional APIs like DataFrames/Datasets).

Nevertheless, when you have a prod job whose input data grows over time, you 
could run a pre-job that determines the size of the input data and then 
calculates the optimal partitioning.  You may even need to consider 
automatically increasing the size of the cluster too.
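A minimal sketch of that pre-job idea (the helper name and the 128 MB target are my assumptions, not anything Spark provides): measure the input, then pick a partition count that keeps each partition well under the 2GB block limit.

```python
import math

def optimal_partitions(input_bytes,
                       target_partition_bytes=128 * 1024 * 1024,
                       min_partitions=8):
    """Hypothetical pre-job helper: choose a partition count so each
    partition stays near a target size, far below the 2GB block limit.

    The 128 MB default and the floor of 8 partitions are illustrative
    assumptions, not Spark defaults.
    """
    # Round up so no partition exceeds the target size.
    return max(min_partitions, math.ceil(input_bytes / target_partition_bytes))

# A 100 GiB input at a 128 MiB target yields 800 partitions; the result
# could then feed rdd.repartition(n) or spark.sql.shuffle.partitions
# before launching the main job.
print(optimal_partitions(100 * 1024 ** 3))  # 800
```

The point is only that the partition count becomes a function of measured input size rather than a hard-coded constant that silently goes stale as the data grows.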

It's very normal for Big Data jobs to work for months and then start falling 
over.  Trust me, it used to be much worse in the Hadoop days.

 

> create LargeByteBuffer abstraction for eliminating 2GB limit on blocks
> ----------------------------------------------------------------------
>
>                 Key: SPARK-6190
>                 URL: https://issues.apache.org/jira/browse/SPARK-6190
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>            Reporter: Imran Rashid
>            Assignee: Imran Rashid
>            Priority: Major
>         Attachments: LargeByteBuffer_v3.pdf
>
>
> A key component in eliminating the 2GB limit on blocks is creating a proper 
> abstraction for storing more than 2GB.  Currently Spark is limited by a 
> reliance on nio ByteBuffer and netty ByteBuf, both of which are limited at 
> 2GB.  This task will introduce the new abstraction and the relevant 
> implementation and utilities, without affecting the existing implementation 
> at all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
