Re: SparkR + binary type + how to get value

2019-02-19 Thread Felix Cheung
From the second image, it looks like there is a protocol mismatch. I’d check whether the SparkR package running on the Livy machine matches the Spark Java release. In any case, this seems more like an issue with the Livy config. I’d suggest checking with the community there:

Losing system properties on executor side, if context is checkpointed

2019-02-19 Thread Dmitry Goldenberg
Hi all, I'm seeing odd behavior: if I switch the context from regular to checkpointed, the system properties are no longer automatically carried over to the workers / executors and turn out to be null there. This is in Java, using spark-streaming_2.10, version 1.5.0. I'm placing

Re: Difference between dataset and dataframe

2019-02-19 Thread Vadim Semenov
> 1) Is there any difference in terms of performance when we use datasets over
> dataframes? Is it significant to choose one over the other? I do realise there
> would be some overhead due to case classes, but how significant is that? Are
> there any other implications?

As long as you use the DataFrame

Re: Difference between dataset and dataframe

2019-02-19 Thread Koert Kuipers
DataFrame operations are expressed as transformations on columns, basically on locations inside the row objects. This specificity can be exploited by Catalyst to optimize these operations, since Catalyst knows exactly what positions in the row object you modified or not at any point and often also
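
The point about column-level transformations can be illustrated with a toy sketch in plain Python. This is not Spark's API or Catalyst's implementation; `Col`, `Add`, and `columns_needed` are hypothetical names invented for this example. It only shows the general idea: an operation built from column expressions can be inspected to see which row positions it reads, while an arbitrary function is opaque, so an optimizer has to assume it may touch the whole row.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, Optional, Union

Row = Dict[str, Any]


@dataclass(frozen=True)
class Col:
    """Hypothetical column reference: it knows which field it reads."""
    name: str

    def columns(self) -> FrozenSet[str]:
        return frozenset({self.name})

    def evaluate(self, row: Row) -> Any:
        return row[self.name]


@dataclass(frozen=True)
class Add:
    """Hypothetical expression node: the sum of two sub-expressions."""
    left: "Expr"
    right: "Expr"

    def columns(self) -> FrozenSet[str]:
        return self.left.columns() | self.right.columns()

    def evaluate(self, row: Row) -> Any:
        return self.left.evaluate(row) + self.right.evaluate(row)


Expr = Union[Col, Add]


def columns_needed(op: Union[Expr, Callable[[Row], Any]]) -> Optional[FrozenSet[str]]:
    """Return the columns an operation reads, or None if that is unknowable."""
    if isinstance(op, (Col, Add)):
        return op.columns()
    # An arbitrary function is a black box: assume it may read every column.
    return None


# The same computation, written as an inspectable expression and as a lambda.
column_op = Add(Col("price"), Col("tax"))
opaque_op = lambda row: row["price"] + row["tax"]

row = {"price": 10, "tax": 2, "description": "unused wide column"}
```

In this sketch, `columns_needed(column_op)` reports only `price` and `tax`, so a planner could skip reading `description`, whereas `columns_needed(opaque_op)` returns `None` even though both operations compute the same value for `row`.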

Re: Looking for an apache spark mentor

2019-02-19 Thread Robert Kaye
> On Feb 19, 2019, at 2:26 PM, Shyam P wrote:
>
> What IRC channel we should join?

I should’ve included info in the first place, heh. Sorry: #metabrainz on freenode, please. I am ruaok, but pristine and iliekcomputers are also very much interested in learning more about Spark. Thanks!

Re: SparkR + binary type + how to get value

2019-02-19 Thread Thijs Haarhuis
Hi Felix, Thanks. I got it working now by using the unlist function. I have another question that maybe you can help me with, since I saw your name popping up regarding the spark.lapply function. I am using Apache Livy and am having trouble using this function; I even reported a Jira ticket

Re: Looking for an apache spark mentor

2019-02-19 Thread Shyam P
What IRC channel should we join?

On Tue, 19 Feb 2019, 17:56 Robert Kaye, wrote:
> Hello!
>
> I’m Robert Kaye from the MetaBrainz Foundation — we’re the people behind
> MusicBrainz ( https://musicbrainz.org ) and more recently ListenBrainz (
> https://listenbrainz.org ). ListenBrainz is aiming

Spark on Kubernetes with persistent local storage

2019-02-19 Thread Arne Zachlod
Hello, I'm trying to host Spark applications on a Kubernetes cluster and want to provide localized persistent storage to the Spark workers in a small research project I'm currently doing. I googled around a bit and found that HDFS seems to be pretty well supported with Spark, but there arise some

Looking for an apache spark mentor

2019-02-19 Thread Robert Kaye
Hello! I’m Robert Kaye from the MetaBrainz Foundation — we’re the people behind MusicBrainz ( https://musicbrainz.org ) and more recently ListenBrainz ( https://listenbrainz.org ). ListenBrainz is aiming to re-create what last.fm