What would good spill settings be?

On Fri, Dec 12, 2014 at 2:45 PM, Sameer Farooqui <same...@databricks.com>
wrote:
>
> You could try repartitioning or coalescing the RDD to a single partition and
> then writing it to disk. Make sure you have good spill settings enabled so
> that the RDD can spill to the local temp dirs if it has to.
>
> On Fri, Dec 12, 2014 at 2:39 PM, Steve Lewis <lordjoe2...@gmail.com>
> wrote:
>>
>> The objective is to let the Spark application generate a file in a format
>> which can be consumed by other programs. As I said, I am willing to give up
>> parallelism at this stage (all the expensive steps were earlier), but I do
>> want an efficient way to pass once through an RDD without the requirement
>> to hold it in memory as a list.
>>
>> On Fri, Dec 12, 2014 at 12:22 PM, Sameer Farooqui <same...@databricks.com>
>> wrote:
>>
>>> Instead of doing this on the compute side, I would just write out the
>>> file with different blocks initially into HDFS and then use "hadoop fs
>>> -getmerge" or HDFSConcat to get one final output file.
>>>
>>>
>>> - SF
>>>
>>> On Fri, Dec 12, 2014 at 11:19 AM, Steve Lewis <lordjoe2...@gmail.com>
>>> wrote:
>>>>
>>>>
>>>> I have an RDD which is potentially too large to store in memory with
>>>> collect. I want a single task to write the contents as a file to HDFS. Time
>>>> is not a large issue, but memory is.
>>>> I use the following code, converting my RDD (scans) to a local Iterator.
>>>> This works, but hasNext shows up as a separate task and takes on the order
>>>> of 20 sec for a medium-sized job.
>>>> Is toLocalIterator a bad function to call in this case, and is there a
>>>> better one?
>>>>
>>>> public void writeScores(final Appendable out, JavaRDD<IScoredScan> scans) {
>>>>     writer.appendHeader(out, getApplication());
>>>>     Iterator<IScoredScan> scanIterator = scans.toLocalIterator();
>>>>     while (scanIterator.hasNext()) {
>>>>         IScoredScan scan = scanIterator.next();
>>>>         writer.appendScan(out, getApplication(), scan);
>>>>     }
>>>>     writer.appendFooter(out, getApplication());
>>>> }
>>>>
>>>
>>
>>
>> --
>> Steven M. Lewis PhD
>> 4221 105th Ave NE
>> Kirkland, WA 98033
>> 206-384-1340 (cell)
>> Skype lordjoe_com
>>
>>
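
On the hasNext timing quoted above: as far as I understand it,
toLocalIterator brings one partition at a time to the driver and runs a job
for each partition, which would explain why hasNext shows up as a separate
task when the iterator crosses a partition boundary. The sketch below is a
plain-Java simulation of that behavior, not Spark code. The names
PartitionStreamSketch, PartitionIterator, writeAll, and the HEADER/FOOTER
strings are hypothetical stand-ins for the real RDD and writer, and each
Supplier plays the role of one partition fetch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

public class PartitionStreamSketch {

    /** Iterates over all records while holding at most one partition in memory. */
    static final class PartitionIterator<T> implements Iterator<T> {
        private final Iterator<Supplier<List<T>>> partitions;
        private Iterator<T> current = Collections.emptyIterator();
        int fetches = 0; // number of partitions materialized so far

        PartitionIterator(List<Supplier<List<T>>> partitions) {
            this.partitions = partitions.iterator();
        }

        @Override
        public boolean hasNext() {
            // Crossing a partition boundary is where the (possibly slow) fetch happens.
            while (!current.hasNext() && partitions.hasNext()) {
                current = partitions.next().get().iterator();
                fetches++;
            }
            return current.hasNext();
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    /** Writes header, every record, then footer in a single pass over the iterator. */
    static void writeAll(Appendable out, Iterator<String> records) {
        try {
            out.append("HEADER\n");
            while (records.hasNext()) {
                out.append(records.next()).append('\n');
            }
            out.append("FOOTER\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two simulated partitions; each Supplier stands in for one fetch.
        List<Supplier<List<String>>> parts = new ArrayList<>();
        parts.add(() -> Arrays.asList("scan-1", "scan-2"));
        parts.add(() -> Arrays.asList("scan-3"));

        StringBuilder out = new StringBuilder();
        writeAll(out, new PartitionIterator<>(parts));
        System.out.print(out);
    }
}
```

In this model the driver never holds more than one partition, so peak
memory is bounded by the largest partition, and the cost paid inside
hasNext is the per-partition fetch. Fewer, larger partitions would mean
fewer fetches but more memory per fetch.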

