Re: Live Streamed Code Review today at 11am Pacific

2018-06-07 Thread Holden Karau
I'll be doing another one tomorrow morning at 9am pacific focused on Python + K8s support & improved JSON support - https://www.youtube.com/watch?v=Z7ZEkvNwneU & https://www.twitch.tv/events/xU90q9RGRGSOgp2LoNsf6A :) On Fri, Mar 9, 2018 at 3:54 PM, Holden Karau wrote: > If anyone wants to watch

Re: [announce] BeakerX supports Scala+Spark in Jupyter

2018-06-07 Thread Scott Draves
In Jupyter, notebooks have just one kernel at a time so I am not aware of any conflicts. On Jun 7, 2018 9:09 PM, "Irving Duran" wrote: So would you recommend not to have Toree and BeakerX installed to avoid conflicts? Thank you, Irving Duran On 06/07/2018 07:55 PM, s...@draves.org wrote:

Re: [announce] BeakerX supports Scala+Spark in Jupyter

2018-06-07 Thread Irving Duran
So would you recommend not having both Toree and BeakerX installed, to avoid conflicts? Thank you, Irving Duran On 06/07/2018 07:55 PM, s...@draves.org wrote: > The %%spark magic comes with BeakerX's Scala kernel, not related to Toree. > > On Thu, Jun 7, 2018, 8:51 PM Stephen Boesch

how to call database specific function when reading writing thru jdbc

2018-06-07 Thread Kyunam Kim
For example, in SQL Server, when reading, I want to call a built-in function: STAsText()
    SELECT id, shape.STAsText() FROM SpatialTable
    val df = _sparkSession
      .read
      .jdbc(url, "dbo.SpatialTable", props)
      .select("shape.STAsText()") // No, this doesn't work.
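One commonly used approach to this kind of question, offered only as a minimal sketch and not as an answer given in the thread: Spark's JDBC reader accepts a parenthesized subquery with an alias wherever it expects a table name, so the database-specific call can run inside SQL Server itself. The helper `JdbcPushdown.asDerivedTable` and the names `shape_text` and `spatial_wkt` are hypothetical and are not part of Spark.

```scala
// Hypothetical helper (not part of Spark): wraps a SELECT statement in
// parentheses and gives it an alias, so the result can be passed as the
// "table" argument of a JDBC read and evaluated by the database.
object JdbcPushdown {
  def asDerivedTable(select: String, alias: String): String = {
    // Drop surrounding whitespace and one trailing semicolon, which is not
    // valid inside a derived table.
    val body = select.trim.stripSuffix(";")
    s"($body) AS $alias"
  }
}
```

Under these assumptions, the read from the question could become `_sparkSession.read.jdbc(url, JdbcPushdown.asDerivedTable("SELECT id, shape.STAsText() AS shape_text FROM dbo.SpatialTable", "spatial_wkt"), props)`. The computed column needs an explicit name (here `shape_text`) because SQL Server requires every column of a derived table to be named.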

Re: If there is timestamp type data in DF, Spark 2.3 toPandas is much slower than spark 2.2.

2018-06-07 Thread Irving Duran
I haven't noticed or seen this behavior. Have you confirmed it by testing the same dataset on both versions? Thank you, Irving Duran On 06/06/2018 11:22 PM, 李斌松 wrote: > If there is timestamp type data in DF, Spark 2.3 toPandas is much > slower than spark 2.2.

Re: [announce] BeakerX supports Scala+Spark in Jupyter

2018-06-07 Thread s...@draves.org
The %%spark magic comes with BeakerX's Scala kernel, not related to Toree. On Thu, Jun 7, 2018, 8:51 PM Stephen Boesch wrote: > Assuming that the spark 2.X kernel (e.g. toree) were chosen for a given > jupyter notebook and there is a Cell 3 that contains some Spark DataFrame > operations ..

Re: [announce] BeakerX supports Scala+Spark in Jupyter

2018-06-07 Thread Stephen Boesch
Assuming that the Spark 2.X kernel (e.g. Toree) were chosen for a given Jupyter notebook and there is a Cell 3 that contains some Spark DataFrame operations: - what is the relationship between the %%spark magic and the Toree kernel? - how does the %%spark magic get applied to that

[announce] BeakerX supports Scala+Spark in Jupyter

2018-06-07 Thread s...@draves.org
We are pleased to announce release 0.19.0 of BeakerX, a collection of extensions and kernels for Jupyter and Jupyter Lab. BeakerX now features Scala+Spark integration including GUI configuration, status, progress, interrupt, and interactive tables. We are very interested in

Reset the offsets, Kafka 0.10 and Spark

2018-06-07 Thread Guillermo Ortiz Fernández
I'm consuming data from Kafka with createDirectStream and storing the offsets in Kafka (https://spark.apache.org/docs/2.1.0/streaming-kafka-0-10-integration.html#kafka-itself) val stream = KafkaUtils.createDirectStream[String, String]( streamingContext, PreferConsistent, Subscribe[String,

Re: Append In-Place to S3

2018-06-07 Thread Benjamin Kim
I tried a different tactic. I still append based on the query below, but I add another deduping step afterwards, writing to a staging directory then overwriting back. Luckily, the data is small enough for this to happen fast. Cheers, Ben > On Jun 3, 2018, at 3:02 PM, Tayler Lawrence Jones >

Register UDF during runtime

2018-06-07 Thread 杜斌
Hi, I am running into an issue when I try to read a string coming from a web service as a UDF and register it. Here is the exception: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field

Fundamental Question on Spark's distribution

2018-06-07 Thread Aakash Basu
Hi all, *Query 1)* I need some serious help! I'm running different types of feature engineering on a dataset and trying to benchmark by tweaking various Spark properties. I can't tell what is going wrong: a single machine is running faster than a 3-node cluster, even though,

[ANNOUNCE] Apache Bahir 2.1.2 Released

2018-06-07 Thread Luciano Resende
Apache Bahir provides extensions to multiple distributed analytic platforms, extending their reach with a diversity of streaming connectors and SQL data sources. The Apache Bahir community is pleased to announce the release of Apache Bahir 2.1.2 which provides the following extensions for Apache

Re: Strange codegen error for SortMergeJoin in Spark 2.2.1

2018-06-07 Thread Kazuaki Ishizaki
Thank you for reporting a problem. Would it be possible to create a JIRA entry with a small program that can reproduce this problem? Best Regards, Kazuaki Ishizaki From: Rico Bergmann To: "user@spark.apache.org" Date: 2018/06/05 19:58 Subject: Strange codegen error for