Just copy the files? It shouldn't matter much where they are, since you
can find them easily. Or consider sending the batches of data straight
into Redshift; I haven't done that myself, but I imagine it's doable.
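
For reading, note that textFile accepts glob patterns, so you can read
all of the per-batch directories back as one RDD without moving
anything. An untested sketch; the bucket and prefix names are made up:

import org.apache.spark.{SparkConf, SparkContext}

object ReadAllBatches {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ReadAllBatches"))
    // Match every part file across all per-batch directories under the prefix.
    val allBatches = sc.textFile("s3n://my-bucket/streaming-output/*/part-*")
    println(s"Total records across batches: ${allBatches.count()}")
    sc.stop()
  }
}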
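
As for Redshift: its COPY command can load everything under an S3
prefix, so one approach is to issue a COPY over JDBC after each batch
lands. A rough, untested sketch; the cluster endpoint, table name, and
credentials are placeholders, and it assumes the Redshift JDBC driver
is on the classpath:

import java.sql.DriverManager

object LoadBatchIntoRedshift {
  def main(args: Array[String]): Unit = {
    // The directory a single batch was saved to (placeholder path),
    // e.g. s3://my-bucket/streaming-output/batch-1429204680000/
    val batchPrefix = args(0)
    val conn = DriverManager.getConnection(
      "jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev",
      "dbuser", "dbpassword")
    try {
      val stmt = conn.createStatement()
      // COPY loads every object under the prefix; the empty _SUCCESS
      // marker contributes zero rows, so it is harmless to include.
      stmt.execute(
        s"COPY events FROM '$batchPrefix' " +
          "CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>' " +
          "DELIMITER '\\t'")
      stmt.close()
    } finally {
      conn.close()
    }
  }
}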

On Thu, Apr 16, 2015 at 6:38 PM, Vadim Bichutskiy
<vadim.bichuts...@gmail.com> wrote:
> Thanks Sean. I want to load each batch into Redshift. What's the best/most 
> efficient way to do that?
>
> Vadim
>
>
>> On Apr 16, 2015, at 1:35 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> You can't, since that's how it's designed to work. Batches are saved
>> in different "files", which are really directories containing
>> partitions, as is common in Hadoop. You can move them later, or just
>> read them where they are.
>>
>> On Thu, Apr 16, 2015 at 6:32 PM, Vadim Bichutskiy
>> <vadim.bichuts...@gmail.com> wrote:
>>> I am using Spark Streaming, and during each micro-batch I output data
>>> to S3 using saveAsTextFile. Right now each batch of data is put into
>>> its own directory containing two objects, "_SUCCESS" and "part-00000".
>>>
>>> How do I output each batch into a common directory?
>>>
>>> Thanks,
>>> Vadim
