Re: Append In-Place to S3

2018-06-03 Thread Tayler Lawrence Jones
Sorry, actually my last message is not true for anti join; I was thinking of semi join. -TJ
On Sun, Jun 3, 2018 at 14:57 Tayler Lawrence Jones wrote:
> A left join with null filter is only the same as a left anti join if the join keys can be guaranteed unique in the existing data. Sinc
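The correction above can be checked with a small sketch. Plain Python lists of dicts stand in for Spark DataFrames, and the `existing`/`incoming` data and helper names are hypothetical. The sketch shows that a left join followed by a "right side is null" filter gives the same rows as a left anti join even when the existing data has duplicate keys, whereas an inner join only matches a left semi join when those keys are unique.

```python
# Hypothetical toy data: "existing" rows already stored, "incoming" new batch.
# Key 2 is deliberately duplicated in the existing data.
existing = [{"id": 1}, {"id": 2}, {"id": 2}]
incoming = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]


def left_join(left, right, key):
    """Left outer join: one (l, r) pair per matching right row, or (l, None)."""
    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))
    return out


def left_anti_join(left, right, key):
    """Keep left rows whose key never appears on the right."""
    right_keys = {r[key] for r in right}
    return [l for l in left if l[key] not in right_keys]


def left_semi_join(left, right, key):
    """Keep left rows whose key appears on the right, each at most once."""
    right_keys = {r[key] for r in right}
    return [l for l in left if l[key] in right_keys]


# Left join + null filter: rows with any match are dropped, unmatched rows
# appear exactly once, so duplicates on the right do not matter.
left_join_null_filter = [l for l, r in left_join(incoming, existing, "id") if r is None]
anti = left_anti_join(incoming, existing, "id")

# Inner join projected to left columns: duplicated right keys duplicate rows.
inner = [l for l, r in left_join(incoming, existing, "id") if r is not None]
semi = left_semi_join(incoming, existing, "id")

print(left_join_null_filter == anti)  # True
print(len(inner), len(semi))          # 3 2
```

The row with `id` 2 appears twice in the inner-join result because it matches two existing rows, which is exactly the case where the uniqueness caveat applies to semi joins.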

Re: Append In-Place to S3

2018-06-03 Thread Tayler Lawrence Jones
On Mon, 4 Jun 2018 at 6:42 am, Tayler Lawrence Jones <t.jonesd...@gmail.com> wrote:
> The issue is not append vs. overwrite; perhaps those responders do not know anti join semantics. Further, overwriting on S3 is a bad pattern due to S3 eventual consiste

Re: Append In-Place to S3

2018-06-03 Thread Tayler Lawrence Jones
The issue is not append vs. overwrite; perhaps those responders do not know anti join semantics. Further, overwriting on S3 is a bad pattern because of S3's eventual consistency issues. First, your SQL query is wrong, as you don't close the parenthesis of the CTE (the "WITH" part). In fact, it looks like
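A minimal sketch of the append-only pattern, assuming hypothetical `existing` and `incoming` tables keyed by `id`. It uses Python's built-in `sqlite3` so it runs anywhere; SQLite has no `LEFT ANTI JOIN` keyword, so the anti join is written with `NOT EXISTS` instead (in Spark SQL, `LEFT ANTI JOIN` expresses the same thing). Note that the CTE's parentheses are closed before the main `SELECT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables: rows already stored, and a new batch that overlaps them.
cur.execute("CREATE TABLE existing (id INTEGER, v TEXT)")
cur.execute("CREATE TABLE incoming (id INTEGER, v TEXT)")
cur.executemany("INSERT INTO existing VALUES (?, ?)", [(1, "a"), (2, "b")])
cur.executemany("INSERT INTO incoming VALUES (?, ?)", [(2, "b"), (3, "c"), (4, "d")])

# The CTE body is wrapped in parentheses that are opened AND closed
# before the final SELECT. NOT EXISTS acts as the anti join here.
NEW_ROWS_SQL = """
WITH new_rows AS (
    SELECT i.id, i.v
    FROM incoming AS i
    WHERE NOT EXISTS (
        SELECT 1 FROM existing AS e WHERE e.id = i.id
    )
)
SELECT id, v FROM new_rows ORDER BY id
"""

new_rows = cur.execute(NEW_ROWS_SQL).fetchall()

# Append only the rows not already present; nothing is overwritten.
cur.executemany("INSERT INTO existing VALUES (?, ?)", new_rows)
conn.commit()

all_rows = cur.execute("SELECT id, v FROM existing ORDER BY id").fetchall()
print(new_rows)  # [(3, 'c'), (4, 'd')]
```

Because only the anti-joined rows are appended, the existing data is never rewritten, which is the property that makes append preferable to overwrite here.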

Re: Writing files to s3 with out temporary directory

2017-11-20 Thread Tayler Lawrence Jones
It is an open issue with the Hadoop file output committer, not Spark. The simple workaround is to write to HDFS, then copy to S3. Netflix gave a talk about their custom output committer at the last Spark Summit; it is a clever, efficient way of doing that, and I'd check it out on YouTube. They have open sourced
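The "write to HDFS, then copy to S3" workaround can be sketched with the local filesystem standing in for both stores: the job writes every part file into a staging directory first, and only the finished output is copied to the destination in one pass. The function name `write_then_copy` and the part-file naming are hypothetical; on a real cluster the copy step would be a tool such as `hadoop distcp`, and Netflix's committer is not shown here.

```python
import os
import shutil
import tempfile


def write_then_copy(chunks, final_dir):
    """Hypothetical helper: write all part files to a staging dir, then copy
    the completed output to final_dir (which must not exist yet)."""
    with tempfile.TemporaryDirectory() as staging:
        # Stand-in for the job writing its part files to HDFS.
        for i, lines in enumerate(chunks):
            with open(os.path.join(staging, f"part-{i:05d}"), "w") as f:
                f.write("\n".join(lines) + "\n")
        # Stand-in for the single copy step to S3 (e.g. `hadoop distcp`).
        shutil.copytree(staging, final_dir)
    return sorted(os.listdir(final_dir))


# Usage: the destination only ever receives complete output.
dest_root = tempfile.mkdtemp()
dest = os.path.join(dest_root, "output")
files = write_then_copy([["a", "b"], ["c"]], dest)
print(files)  # ['part-00000', 'part-00001']
```

The design point is that the slow, rename-heavy commit work happens on the staging filesystem, and the destination sees a single bulk copy rather than temporary directories.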