Re: Append In-Place to S3

2018-06-07 Thread Benjamin Kim
I tried a different tactic. I still append based on the query below, but I add another deduping step afterwards, writing to a staging directory then overwriting back. Luckily, the data is small enough for this to happen fast. Cheers, Ben > On Jun 3, 2018, at 3:02 PM, Tayler Lawrence Jones >
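A minimal sketch of the append-then-dedupe approach described above, under stated assumptions: the data is stored as Parquet, and a set of key columns identifies duplicates. Neither is given in the message. The object name `DedupeViaStaging`, the method `dedupe`, and all of its parameters are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical helper: none of these names come from the original thread.
object DedupeViaStaging {
  def dedupe(spark: SparkSession,
             targetPath: String,   // assumed: directory that was appended to
             stagingPath: String,  // assumed: scratch directory for the deduped copy
             keyCols: Seq[String]  // assumed: columns that define a duplicate
            ): Unit = {
    // Step 1: read the appended data, drop duplicates, write to staging.
    spark.read.parquet(targetPath)
      .dropDuplicates(keyCols)
      .write.mode(SaveMode.Overwrite)
      .parquet(stagingPath)

    // Step 2: read back from staging and overwrite the target.
    // Going through staging avoids reading and overwriting the same path in one job.
    spark.read.parquet(stagingPath)
      .write.mode(SaveMode.Overwrite)
      .parquet(targetPath)
  }
}
```

The data is written twice, so this is only practical when the dataset is small, as the message notes.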

Reset the offsets, Kafka 0.10 and Spark

2018-06-07 Thread Guillermo Ortiz Fernández
I'm consuming data from Kafka with createDirectStream and storing the offsets in Kafka (https://spark.apache.org/docs/2.1.0/streaming-kafka-0-10-integration.html#kafka-itself) val stream = KafkaUtils.createDirectStream[String, String]( streamingContext, PreferConsistent, Subscribe[String,

[announce] BeakerX supports Scala+Spark in Jupyter

2018-06-07 Thread s...@draves.org
We are pleased to announce release 0.19.0 of BeakerX, a collection of extensions and kernels for Jupyter and Jupyter Lab. BeakerX now features Scala+Spark integration including GUI configuration, status, progress, interrupt, and interactive tables. We are very interested in

Re: [announce] BeakerX supports Scala+Spark in Jupyter

2018-06-07 Thread s...@draves.org
The %%spark magic comes with BeakerX's Scala kernel, not related to Toree. On Thu, Jun 7, 2018, 8:51 PM Stephen Boesch wrote: > Assuming that the spark 2.X kernel (e.g. toree) were chosen for a given > jupyter notebook and there is a Cell 3 that contains some Spark DataFrame > operations ..

Re: If there is timestamp type data in DF, Spark 2.3 toPandas is much slower than spark 2.2.

2018-06-07 Thread Irving Duran
I haven't noticed or seen this behavior. Have you confirmed it by testing the same dataset on both versions? Thank you, Irving Duran On 06/06/2018 11:22 PM, 李斌松 wrote: > If there is timestamp type data in DF, Spark 2.3 toPandas is much > slower than spark 2.2.

Re: [announce] BeakerX supports Scala+Spark in Jupyter

2018-06-07 Thread Irving Duran
So would you recommend not to have Toree and BeakerX installed to avoid conflicts? Thank you, Irving Duran On 06/07/2018 07:55 PM, s...@draves.org wrote: > The %%spark magic comes with BeakerX's Scala kernel, not related to Toree. > > On Thu, Jun 7, 2018, 8:51 PM Stephen Boesch

Re: [announce] BeakerX supports Scala+Spark in Jupyter

2018-06-07 Thread Stephen Boesch
Assuming that the spark 2.X kernel (e.g. toree) were chosen for a given jupyter notebook and there is a Cell 3 that contains some Spark DataFrame operations .. Then: - what is the relationship between the %%spark magic and the toree kernel? - how does the %%spark magic get applied to that

how to call database specific function when reading writing thru jdbc

2018-06-07 Thread Kyunam Kim
For example, in SQL Server, when reading, I want to call a built-in function: STAsText() SELECT id, shape.STAsText() FROM SpatialTable val df = _sparkSession .read .jdbc(url, "dbo.SpatialTable", props) .select("shape.STAsText()") // No, this doesn't work.
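One common way to have the database evaluate a function such as STAsText() is to pass a parenthesized subquery in place of the table name. Spark's JDBC reader accepts anything that is valid in a FROM clause there. The sketch below assumes that approach; the object `JdbcPushdown`, the helper `asDerivedTable`, and the aliases `shape_wkt` and `spatial_wkt` are hypothetical names, not part of the original question.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: names are illustrative only.
object JdbcPushdown {
  // Wrap a query as an aliased derived table so it can stand in for a table name.
  def asDerivedTable(query: String, alias: String): String =
    s"($query) AS $alias"

  def readSpatial(spark: SparkSession, url: String, props: Properties): DataFrame = {
    // The SELECT runs on SQL Server, so STAsText() is evaluated there
    // and Spark only sees an ordinary string column.
    val table = asDerivedTable(
      "SELECT id, shape.STAsText() AS shape_wkt FROM dbo.SpatialTable",
      "spatial_wkt")
    spark.read.jdbc(url, table, props)
  }
}
```

The resulting DataFrame has the columns `id` and `shape_wkt`. The `.select("shape.STAsText()")` call in the question fails because Spark tries to resolve that string as a column name rather than sending it to the database.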

Re: [announce] BeakerX supports Scala+Spark in Jupyter

2018-06-07 Thread Scott Draves
In Jupyter, notebooks have just one kernel at a time so I am not aware of any conflicts. On Jun 7, 2018 9:09 PM, "Irving Duran" wrote: So would you recommend not to have Toree and BeakerX installed to avoid conflicts? Thank you, Irving Duran On 06/07/2018 07:55 PM, s...@draves.org wrote:

Re: Live Streamed Code Review today at 11am Pacific

2018-06-07 Thread Holden Karau
I'll be doing another one tomorrow morning at 9am Pacific focused on Python + K8s support & improved JSON support - https://www.youtube.com/watch?v=Z7ZEkvNwneU & https://www.twitch.tv/events/xU90q9RGRGSOgp2LoNsf6A :) On Fri, Mar 9, 2018 at 3:54 PM, Holden Karau wrote: > If anyone wants to watch

Re: Strange codegen error for SortMergeJoin in Spark 2.2.1

2018-06-07 Thread Kazuaki Ishizaki
Thank you for reporting a problem. Would it be possible to create a JIRA entry with a small program that can reproduce this problem? Best Regards, Kazuaki Ishizaki From: Rico Bergmann To: "user@spark.apache.org" Date: 2018/06/05 19:58 Subject: Strange codegen error for

[ANNOUNCE] Apache Bahir 2.1.2 Released

2018-06-07 Thread Luciano Resende
Apache Bahir provides extensions to multiple distributed analytic platforms, extending their reach with a diversity of streaming connectors and SQL data sources. The Apache Bahir community is pleased to announce the release of Apache Bahir 2.1.2 which provides the following extensions for Apache

Fundamental Question on Spark's distribution

2018-06-07 Thread Aakash Basu
Hi all, *Query 1)* I need some serious help! I'm running feature engineering of different types on a dataset and trying to benchmark it by tweaking various Spark properties. I don't know what is going wrong, but a single machine is working faster than a 3-node cluster, even though,

Register UDF duration runtime

2018-06-07 Thread 杜斌
Hi, I am running into an issue when I try to read a string coming from a web service as a UDF and register it. Here is the exception: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field