Re: Run a specific PySpark test or group of tests

2017-08-16 Thread Nicholas Chammas
Looks like it doesn’t take too much work to get pytest working on our code base, since it knows how to run unittest tests. https://github.com/apache/spark/compare/master...nchammas:pytest For example, I was able to do this from that

Re: Questions about the future of UDTs and Encoders

2017-08-16 Thread Patrick GRANDJEAN
Hi Katherine, I am also interested in UDTs in order to support serialization of some legacy third-party types. I have been monitoring the following JIRA issue: [SPARK-7768] Make user-defined type (UDT) API public - ASF JIRA

Timestamp interoperability design doc available for review

2017-08-16 Thread Zoltan Ivanfi
Dear Spark Community, Based on earlier feedback from the Spark community, we would like to suggest a short-term fix for the timestamp interoperability problem[1] between different SQL-on-Hadoop engines. I created a design document[2] and would like to ask you to review it and let me know of any

Re: SPIP: Spark on Kubernetes

2017-08-16 Thread Alexander Bezzubov
+1 (non-binding) Looking forward to using it as part of an Apache Spark release, instead of a Standalone cluster deployed on top of k8s. -- Alex On Wed, Aug 16, 2017 at 11:11 AM, Ismaël Mejía wrote: > +1 (non-binding) > > This is something really great to have. More schedulers

Re: SPIP: Spark on Kubernetes

2017-08-16 Thread Jean-Baptiste Onofré
+1 as well. Regards JB On Aug 16, 2017, at 10:12, "Ismaël Mejía" wrote: >+1 (non-binding) > >This is something really great to have. More schedulers and runtime >environments are a HUGE win for the Spark ecosystem. >Amazing work, Big kudos for the guys who created and

Re: Questions about the future of UDTs and Encoders

2017-08-16 Thread Erik Erlandson
I've been working on packaging some UDTs as well. I have them working in Scala and PySpark, although I haven't been able to get them to serialize to Parquet, which puzzles me. Although it works, I have to define UDTs under the org.apache.spark scope due to the privatization, which is a bit
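
For reference, the workaround Erik describes looks roughly like the sketch below: since UserDefinedType was made private[spark] in Spark 2.x (the restriction SPARK-7768 asks to lift), a UDT only compiles if it is declared in a package under org.apache.spark. The LegacyPoint / LegacyPointUDT names and the org.apache.spark.examples.udt package are hypothetical, and the sketch does not address the Parquet serialization issue he raises.

    // A minimal sketch, assuming Spark 2.x: the UDT lives inside an
    // org.apache.spark.* package so it can see the private[spark]
    // UserDefinedType class. Names below are hypothetical.
    package org.apache.spark.examples.udt

    import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
    import org.apache.spark.sql.types._

    // The user-facing class, tied to its UDT via the annotation.
    @SQLUserDefinedType(udt = classOf[LegacyPointUDT])
    class LegacyPoint(val x: Double, val y: Double) extends Serializable

    // The UDT maps LegacyPoint to and from a Catalyst array of doubles.
    class LegacyPointUDT extends UserDefinedType[LegacyPoint] {
      override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)

      override def serialize(p: LegacyPoint): GenericArrayData =
        new GenericArrayData(Array[Any](p.x, p.y))

      override def deserialize(datum: Any): LegacyPoint = datum match {
        case values: ArrayData => new LegacyPoint(values.getDouble(0), values.getDouble(1))
      }

      override def userClass: Class[LegacyPoint] = classOf[LegacyPoint]
    }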

Re: Questions about the future of UDTs and Encoders

2017-08-16 Thread Katherine Prevost
I'd say the quick summary of the problem is this: The encoder mechanism does not deal well with the fields of case classes (you must use built-in types, including other case classes, for case class fields), and UDTs are not currently available (and never integrated well with built-in operations).
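
To make the first point concrete, here is a minimal Scala sketch (hypothetical names, Spark 2.x assumed): a case class whose fields are all built-in types gets an encoder derived automatically, while a case class with a field of an arbitrary third-party class does not, and there is currently no public UDT hook to register one.

    import org.apache.spark.sql.SparkSession

    // A hypothetical third-party class with no encoder of its own.
    class ThirdPartyInterval(val start: Long, val end: Long)

    // Fields are all built-in types, so an encoder can be derived.
    case class Ok(id: Long, label: String)

    // A field of an arbitrary class: encoder derivation cannot handle this.
    case class Broken(id: Long, interval: ThirdPartyInterval)

    object EncoderLimitation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("encoder-demo").getOrCreate()
        import spark.implicits._

        val ok = Seq(Ok(1L, "a"), Ok(2L, "b")).toDS()  // works: encoder is derived
        ok.show()

        // Uncommenting the next line fails when the encoder is materialized,
        // with an error along the lines of "No Encoder found for
        // ThirdPartyInterval": derivation only understands built-in types and
        // nested case classes, and the UDT registration path is private.
        // val broken = Seq(Broken(1L, new ThirdPartyInterval(0L, 10L))).toDS()

        spark.stop()
      }
    }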

Re: SPIP: Spark on Kubernetes

2017-08-16 Thread Ismaël Mejía
+1 (non-binding) This is something really great to have. More schedulers and runtime environments are a HUGE win for the Spark ecosystem. Amazing work, and big kudos to the people who created and continue working on this. On Wed, Aug 16, 2017 at 2:07 AM, lucas.g...@gmail.com