[PySpark] Revisiting PySpark type annotations

2019-01-25 Thread Maciej Szymkiewicz
Hello everyone, I'd like to revisit the topic of adding PySpark type annotations in 3.0. It has been discussed before ( http://apache-spark-developers-list.1001551.n3.nabble.com/Python-friendly-API-for-Spark-3-0-td25016.html and

[PYTHON] PySpark typing hints

2017-05-14 Thread Maciej Szymkiewicz
Hi everyone, For the last few months I've been working on static type annotations for PySpark. For those of you who are not familiar with the idea, type hints were introduced by PEP 484 (https://www.python.org/dev/peps/pep-0484/) and further extended by PEP 526
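
For readers unfamiliar with the PEPs mentioned above, a minimal sketch of what such hints look like; this is plain illustrative Python, not the actual PySpark stubs:

    from typing import Iterable, List

    # PEP 526 variable annotation
    default_threshold: float = 0.5

    # PEP 484 function annotations; a checker such as mypy can now
    # verify both the callers and the body of this function
    def keep_above(values: Iterable[float],
                   threshold: float = default_threshold) -> List[float]:
        return [v for v in values if v > threshold]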

Re: Porting LIBSVM models to Spark

2016-11-29 Thread Maciej Szymkiewicz
Hi, Not directly. You could try a workaround: convert the model to PMML and import it with JPMML-Spark (but you'd have to create your own Python wrapper). On a side note, please avoid cross-posting between Stack Overflow and the user list, and be sure to read the guidelines

Re: Computing hamming distance over large data set

2016-02-12 Thread Maciej Szymkiewicz

Re: PySpark Broadcast of User Defined Class No Work?

2016-01-18 Thread Maciej Szymkiewicz
Python can pickle only objects, not classes. This means that SimpleClass has to be importable on every worker node to enable correct deserialization. Typically that means keeping class definitions in a separate module and distributing them with, for example, --py-files. On 01/19/2016 12:34 AM, efwalkermit
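
A minimal sketch of the pattern described above; the module and file names are hypothetical:

    # simpleclass.py -- hypothetical module; ship it to the workers with
    # e.g. spark-submit --py-files simpleclass.py so it is importable everywhere
    class SimpleClass:
        def __init__(self, value):
            self.value = value

    # driver.py -- minimal sketch assuming the SparkContext API of that era
    from pyspark import SparkContext
    from simpleclass import SimpleClass

    sc = SparkContext(appName="broadcast-example")
    # instances pickle fine once the class itself is importable on workers
    bc = sc.broadcast(SimpleClass(42))
    print(sc.parallelize(range(4)).map(lambda x: x * bc.value.value).collect())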

Re: pyspark: conditionals inside functions

2016-01-09 Thread Maciej Szymkiewicz
On 01/09/2016 04:45 AM, Franc Carter wrote: > Hi, I'm trying to write a short function that returns the last Sunday of the week of a given date, code below: > def getSunday(day): > day = day.cast("date") > sun = next_day(day, "Sunday")
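
The preview is truncated, but a hedged sketch of how such a helper can be completed with Column expressions follows; the semantics of "last Sunday" are assumed here to be the Sunday on or before the given date, and note that Python if/else does not work on Columns, so any branching must use when()/otherwise():

    from pyspark.sql import functions as F

    def get_sunday(day):
        # next_day() returns the first Sunday strictly after `day`;
        # stepping back 7 days yields the Sunday on or before `day`
        day = day.cast("date")
        return F.date_sub(F.next_day(day, "Sunday"), 7)

    # usage: df.withColumn("sunday", get_sunday(F.col("d")))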

PySpark order-only window function issue

2015-08-11 Thread Maciej Szymkiewicz
Hello everyone, I am trying to use the PySpark API with window functions without specifying a partition clause, i.e. something equivalent to this SQL: SELECT v, row_number() OVER (ORDER BY v) AS rn FROM df. I am not sure if I am doing something wrong or it is a bug, but the results are far from what I
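
A minimal sketch of the PySpark equivalent, written against the current DataFrame API rather than the 1.x API of the original thread; note that an ORDER BY-only window moves all rows into a single partition, which is why Spark warns about it on large data:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(3,), (1,), (2,)], ["v"])

    w = Window.orderBy("v")  # no partitionBy -- single-partition window
    df.select("v", F.row_number().over(w).alias("rn")).show()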