Re: [DISCUSS] Deprecate Python < 3.6 in Spark 3.0

2019-11-08 Thread Shane Knapp
so... i've updated the python testing framework to run on python 3.6, and dropped support for testing against python 2.7 (and pypy 2.5.1). https://github.com/apache/spark/pull/26330 anyways: this is ready to merge, but i'm more than happy to hold off for the time being if we're still a little shy …

Re: dev/merge_spark_pr.py broken on python 2

2019-11-08 Thread Marcelo Vanzin
I remember merging PRs with non-ASCII chars in the past... Anyway, for these scripts, it might be easier to just use python3 for everything, instead of trying to keep them working on two different versions. On Fri, Nov 8, 2019 at 10:28 AM Sean Owen wrote: > > Ah OK. I think it's the same type of …

Re: dev/merge_spark_pr.py broken on python 2

2019-11-08 Thread Sean Owen
Ah OK. I think it's the same type of issue that the last change was actually trying to fix for Python 2. Here it seems like the author name might have non-ASCII chars? I don't immediately know enough to know how to resolve that for Python 2. Something with how raw_input works, I take it. You could …

Re: dev/merge_spark_pr.py broken on python 2

2019-11-08 Thread Marcelo Vanzin
Something related to non-ASCII characters. Worked fine with python 3.

git branch -D PR_TOOL_MERGE_PR_26426_MASTER
Traceback (most recent call last):
  File "./dev/merge_spark_pr.py", line 577, in <module>
    main()
  File "./dev/merge_spark_pr.py", line 552, in main
    merge_hash = merge_pr(pr_num, …
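A minimal sketch of the failure mode being discussed: under Python 3 strings are Unicode throughout, while Python 2's raw_input returns bytes, and mixing those bytes into a unicode template triggers an implicit ASCII decode. The author name below is hypothetical, not taken from the actual PR; this is plain Python, not the merge_spark_pr.py code.

```python
# Hypothetical sketch of why a non-ASCII author name breaks the script
# under Python 2 but works under Python 3.
author = "José Vanzin"  # illustrative name with a non-ASCII character

# Python 3: input() (formerly raw_input) returns a Unicode str, so
# string formatting with non-ASCII text just works.
message = "Merged PR authored by %s" % author

# Python 2: raw_input() returns bytes. Combining those bytes with a
# unicode template forces an implicit ASCII decode, which is what
# raises UnicodeDecodeError. We simulate that decode explicitly:
raw_bytes = author.encode("utf-8")  # what Python 2's raw_input would return
try:
    raw_bytes.decode("ascii")
    failed = False
except UnicodeDecodeError:
    failed = True
```

Running the decode under Python 2 semantics fails on the UTF-8 bytes for 'é', which matches the traceback above ending inside merge_pr.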

Re: dev/merge_spark_pr.py broken on python 2

2019-11-08 Thread Sean Owen
Hm, the last change was on Oct 1, and should have actually helped it still work with Python 2: https://github.com/apache/spark/commit/2ec3265ae76fc1e136e44c240c476ce572b679df#diff-c321b6c82ebb21d8fd225abea9b7b74c Hasn't otherwise changed in a while. What's the error? On Fri, Nov 8, 2019 at 11:37 …

dev/merge_spark_pr.py broken on python 2

2019-11-08 Thread Marcelo Vanzin
Hey all, Something broke that script when running with python 2. I know we want to deprecate python 2, but in that case, scripts should at least be changed to use "python3" in the shebang line... -- Marcelo

Re: Build customized resource manager

2019-11-08 Thread Tom Graves
I don't know if it all works, but some work was done to make the cluster manager pluggable; see SPARK-13904. Tom On Wednesday, November 6, 2019, 07:22:59 PM CST, Klaus Ma wrote: Any suggestions? - Klaus On Mon, Nov 4, 2019 at 5:04 PM Klaus Ma wrote: Hi team, AFAIK, we built …

Re: [DISCUSS] writing structured streaming dataframe to custom S3 buckets?

2019-11-08 Thread Steve Loughran
> spark.sparkContext.hadoopConfiguration.set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") This is some superstition which seems to get carried through Stack Overflow articles. You do not need to declare the implementation class for s3a:// any more than you have to do for …
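A sketch of Steve's point as it would apply to a Spark config: hadoop-aws already binds the s3a:// scheme to S3AFileSystem, so fs.s3a.impl never needs to be set; only things like credentials actually matter. The key names follow the spark.hadoop.* prefixing convention shown in the quoted snippet, and the values are placeholders, not real credentials.

```python
# Sketch of an s3a configuration WITHOUT the redundant impl setting.
# hadoop-aws maps s3a:// to S3AFileSystem on its own; the values below
# are placeholders for illustration only.
s3a_conf = {
    "spark.hadoop.fs.s3a.access.key": "PLACEHOLDER_ACCESS_KEY",
    "spark.hadoop.fs.s3a.secret.key": "PLACEHOLDER_SECRET_KEY",
    # "spark.hadoop.fs.s3a.impl": ...  <- deliberately omitted; not needed
}
```

These entries would typically be passed to a SparkSession builder or spark-defaults.conf; the point is simply what can be left out.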

Re: [DISCUSS] Expensive deterministic UDFs

2019-11-08 Thread Enrico Minack
I agree that 'non-deterministic' is the right term for what it currently does: mark an expression as non-deterministic (returns different values for the same input, e.g. rand()), and the optimizer does its best to not break semantics when moving expressions around. In case of expensive …
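An illustration of the cost this thread is about: if a plan ends up evaluating a deterministic expression more than once per input, an expensive function pays that cost each time, and memoizing by input is one mitigation available to the user. Everything below is plain Python for illustration, not Spark's optimizer or UDF API.

```python
# Illustration: repeated evaluation of a deterministic, expensive
# function, and how caching by input limits the real work done.
from functools import lru_cache

call_count = 0

@lru_cache(maxsize=None)
def expensive_lookup(key):
    global call_count
    call_count += 1      # counts how often the real work actually runs
    return key.upper()   # stand-in for an expensive computation

# The repeated inputs stand in for a plan that evaluates the same
# expression on the same value more than once.
rows = ["a", "b", "a", "b"]
results = [expensive_lookup(k) for k in rows]
```

With the cache, four evaluations trigger only two invocations of the underlying work, one per distinct input.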