Re: [Structured Streaming SPARK-23966] Why non-atomic rename is problem in State Store ?

2018-10-02 Thread Jungtaek Lim
Thanks, Steve, for answering in detail. I had the same impression as Chandan from that line: it went against my understanding, since the rename operation itself is atomic in HDFS, and I didn't imagine it was meant to address object stores. I learned a lot about object stores from your answer. Thanks again.

Re: [Structured Streaming SPARK-23966] Why non-atomic rename is problem in State Store ?

2018-10-02 Thread chandan prakash
Thanks a lot, Steve and Jungtaek, for your answers. Steve, you explained it really well and in depth. I understand now that the old implementation was not correct for object stores like S3, and that the new implementation will address that. And for better performance we should choose a Direct Write

Re: [DISCUSS] Syntax for table DDL

2018-10-02 Thread Ryan Blue
I'd say that it was important to be compatible with Hive in the past, but that's becoming less important over time. Spark is well established with Hadoop users and I think the focus moving forward should be to make Spark more predictable as a SQL engine for people coming from more traditional

Re: [DISCUSS] Syntax for table DDL

2018-10-02 Thread Felix Cheung
I think it has been an important “selling point” that Spark is “mostly compatible” with Hive DDL. I have seen a lot of teams suffer from switching between Presto and Hive dialects. So one question I have is: are we at the point of switching from Hive-compatible to ANSI SQL, say? Perhaps a more

Re: [Discuss] Datasource v2 support for Kerberos

2018-10-02 Thread Steve Loughran
On 2 Oct 2018, at 04:44, tigerquoll <tigerqu...@outlook.com> wrote: Hi Steve, I think that passing a Kerberos keytab around is one of those bad ideas that is entirely appropriate to re-question every single time you come across it. It has been used already in Spark when interacting with

Re: [DISCUSS] Syntax for table DDL

2018-10-02 Thread Alessandro Solimando
I agree with Ryan: a "standard" and more widely adopted syntax is usually a good idea, possibly with some slight improvements like "bulk deletion" of columns (especially because both the syntax and the semantics are clear), rather than staying with Hive syntax at any cost. I am personally following