Contribution help needed for sub-tasks of an umbrella JIRA - port *.sql tests to improve coverage of Python, Pandas, Scala UDF cases

2019-07-08 Thread Hyukjin Kwon
Hi all, I am currently working to improve the Python, Pandas, and Scala UDF test cases by integrating our existing *.sql files, tracked at https://issues.apache.org/jira/browse/SPARK-27921. I would appreciate it if anyone interested in contributing to Spark took on some of the sub-tasks. It's too many for me to do

disable checkpointing in structured streaming

2019-07-08 Thread Charles vinodh
Hi, is it possible to disable checkpointing in Structured Streaming and replace it with our own checkpointing implementation, where the offsets are saved in an external database? I looked through the docs, and it seems this is supported in Spark DStreams but not in Structured Streaming

Re: Opinions wanted: how much to match PostgreSQL semantics?

2019-07-08 Thread Marco Gaido
Hi Sean, Thanks for bringing this up. Honestly, my opinion is that Spark should be fully ANSI SQL compliant. Where ANSI SQL compliance is not an issue, I am fine with following any other DB. IMHO, we won't get 100% compliance with any DB anyway - PostgreSQL in this case (e.g. for decimal operations, we

Opinions wanted: how much to match PostgreSQL semantics?

2019-07-08 Thread Sean Owen
See the particular issue / question at https://github.com/apache/spark/pull/24872#issuecomment-509108532 and the larger umbrella at https://issues.apache.org/jira/browse/SPARK-27764 -- Dongjoon rightly suggests this is a broader question.