Re: Has anyone installed the scala kernel for Jupyter notebook

2016-09-22 Thread andy petrella
heya, I'd say if you wanna go the Spark and Scala way, treat yourself and go for the Spark Notebook (check http://spark-notebook.io/ for a pre-built distro or build your own). HTH. On Thu, Sep 22, 2016 at 12:45 AM Arif,Mubaraka

Re: Using Zeppelin with Spark FP

2016-09-11 Thread andy petrella
Heya, probably worth giving the Spark Notebook a go then. It can plot any scala data (collection, rdd, df, ds, custom, ...), all are reactive so they can deal with any sort of incoming data. You can ask on the gitter

Re: Scala Vs Python

2016-09-02 Thread andy petrella
looking at the examples, indeed they make no sense :D On Fri, 2 Sep 2016 16:48 Mich Talebzadeh, wrote: > Right so. We are back into religious arguments. Best of luck > > > > Dr Mich Talebzadeh > > > > LinkedIn * >

Re: spark and plot data

2016-07-23 Thread andy petrella
Heya, Might be worth checking the spark-notebook I guess, it offers custom and reactive dynamic charts (scatter, line, bar, pie, graph, radar, parallel, pivot, …) for any kind of data from an intuitive and easy Scala API (with server side, incl. spark based, sampling

Re: Spark 2.0 release date

2016-06-15 Thread andy petrella
; You should sense the tone in Mich's response. > > To my knowledge, there hasn't been an RC for the 2.0 release yet. > Once we have an RC, it goes through the normal voting process. > > FYI > > On Wed, Jun 15, 2016 at 7:38 AM, andy petrella <andy.petre...@gmail.com> > wro

Re: Spark 2.0 release date

2016-06-15 Thread andy petrella
> tomorrow lunch time Which TZ? :-) → I'm working on the update of some materials that Dean Wampler and I will give tomorrow at Scala Days (well, tomorrow CEST). Hence, I'm upgrading the materials on spark

Re: [Spark 2.0.0] Structured Stream on Kafka

2016-06-14 Thread andy petrella
browse/SPARK-15406 > > > > On Tue, Jun 14, 2016 at 9:21 AM, andy petrella <andy.petre...@gmail.com> > wrote: > > Heya folks, > > > > Just wondering if there are some doc regarding using kafka directly from > the > > reader.stream? > > Has it been int

[Spark 2.0.0] Structured Stream on Kafka

2016-06-14 Thread andy petrella
Heya folks, Just wondering if there is some doc regarding using Kafka directly from the reader.stream? Has it been integrated already (I mean the source)? Sorry if the answer is RTFM (but then I'd appreciate a pointer anyway^^) thanks, cheers andy -- andy

Re: Apache Flink

2016-04-17 Thread andy petrella
Just adding one thing to the mix: `that the latency for streaming data is eliminated` is insane :-D On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh wrote: > It seems that Flink argues that the latency for streaming data is > eliminated whereas with Spark RDD there

Re: Scala from Jupyter

2016-02-16 Thread andy petrella
> On Tue, Feb 16, 2016 at 3:37 PM, andy petrella <andy.petre...@gmail.com> > wrote: > >> Hello Alex! >> >> Rajeev is right, come over the spark notebook gitter room, you'll be >> helped by many experienced people if you have some troubles: >> https://gitter

Re: Scala from Jupyter

2016-02-16 Thread andy petrella
Hello Alex! Rajeev is right, come over the spark notebook gitter room, you'll be helped by many experienced people if you have some troubles: https://gitter.im/andypetrella/spark-notebook The spark notebook has many integrated, reactive (scala) and extendable (scala) plotting capabilities.

Re: Graph visualization tool for GraphX

2015-12-08 Thread andy petrella
Hello Lin, This is indeed a tough scenario when you have many vertices (and even worse) many edges... So two-fold answer: First, technically, there is graph plotting support in the spark notebook (https://github.com/andypetrella/spark-notebook/ → check this notebook:

Re: StructType for oracle.sql.STRUCT

2015-11-28 Thread andy petrella
Warf... such a heavy task, man! I'd love to follow your work on that (I've a long XP in geospatial too), is there a repo available already for that? The hard part will be to support all descendant types I guess (line, multilines, and so on), then creating the spatial operators. The only

Re: Spark + Jupyter (IPython Notebook)

2015-08-18 Thread andy petrella
Hey, Actually, for Scala, I'd rather use https://github.com/andypetrella/spark-notebook/ It's deployed at several places like *Alibaba*, *EBI*, *Cray* and is supported by both the Scala community and the company Data Fellas. For instance, it was part of the Big Scala Pipeline training given

Re: Spark is in-memory processing, how then can Tachyon make Spark faster?

2015-08-07 Thread andy petrella
Exactly! The sharing part is used in the Spark Notebook (this one https://github.com/andypetrella/spark-notebook/blob/master/notebooks/Tachyon%20Test.snb) so we can share stuff between notebooks, which are different SparkContexts (in different JVMs). OTOH, we have a project that creates micro services

Re: Re: Real-time data visualization with Zeppelin

2015-08-06 Thread andy petrella
-13 02:45:57, andy petrella andy.petre...@gmail.com wrote: Heya, You might be looking for something like this I guess: https://www.youtube.com/watch?v=kB4kRQRFAVc. The Spark-Notebook (https://github.com/andypetrella/spark-notebook/) can bring that to you actually, it uses fully reactive

Re: Real-time data visualization with Zeppelin

2015-07-12 Thread andy petrella
Heya, You might be looking for something like this I guess: https://www.youtube.com/watch?v=kB4kRQRFAVc. The Spark-Notebook (https://github.com/andypetrella/spark-notebook/) can bring that to you actually, it uses fully reactive bilateral communication streams to update data and viz, plus it

Re: Machine Learning on GraphX

2015-06-18 Thread andy petrella
I guess that belief propagation could help here (at least, I find the ideas enough similar), thus this article might be a good start : http://arxiv.org/pdf/1004.1003.pdf (it's on my todo list, hence cannot really help further ^^) On Thu, Jun 18, 2015 at 11:44 AM Timothée Rebours

Re: [Streaming] Configure executor logging on Mesos

2015-05-30 Thread andy petrella
ourselves and relying on the mesos scheduler backend only. Unless the spark.executor.uri (or another one) can take more than one downloadable path. my.2¢ andy On Fri, May 29, 2015 at 5:09 PM Gerard Maas gerard.m...@gmail.com wrote: Hi Tim, Thanks for the info. We (Andy Petrella

Re: Running Javascript from scala spark

2015-05-26 Thread andy petrella
Yop, why not use, like you said, a JS engine like Rhino? But then I would suggest using mapPartitions instead, so there's only one engine per partition. Probably broadcasting the script is also a good thing to do. I guess it's for ad hoc transformations passed by a remote client, otherwise you could simply
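A minimal sketch of that suggestion, assuming the JDK's built-in javax.script engine (Nashorn on Java 8; Rhino would be used the same way) and a made-up script and dataset:
```
import javax.script.{Invocable, ScriptEngineManager}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("js-transform"))

// Hypothetical ad hoc script sent by a remote client.
val script = """function transform(s) { return s.toUpperCase(); }"""
val scriptBc = sc.broadcast(script)

val input = sc.parallelize(Seq("a", "b", "c"))

// One JS engine per partition instead of one per record.
val out = input.mapPartitions { it =>
  val engine = new ScriptEngineManager().getEngineByName("JavaScript")
  engine.eval(scriptBc.value)
  val invocable = engine.asInstanceOf[Invocable]
  it.map(s => invocable.invokeFunction("transform", s).toString)
}
out.collect().foreach(println)
```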

Re: solr in spark

2015-04-28 Thread andy petrella
AFAIK Datastax is heavily looking at it. They have a good integration of Cassandra with it. The next step was clearly to have a strong combination of the three in one of the coming releases. On Tue, Apr 28, 2015 at 18:28, Jeetendra Gangele gangele...@gmail.com wrote: Has anyone tried using solr

Re: Spark and accumulo

2015-04-21 Thread andy petrella
Hello Madvi, Some work has been done by @pomadchin using the spark notebook, maybe you should come on https://gitter.im/andypetrella/spark-notebook and poke him? There are some discoveries he made that might be helpful to know. Also you can poke @lossyrob from Azavea, he did that for geotrellis

Re: Spark 1.2.0 with Play/Activator

2015-04-07 Thread andy petrella
: not found Any suggestions? Thanks *From:* Manish Gupta 8 [mailto:mgupt...@sapient.com] *Sent:* Tuesday, April 07, 2015 12:04 PM *To:* andy petrella; user@spark.apache.org *Subject:* RE: Spark 1.2.0 with Play/Activator Thanks for the information Andy. I will go through the versions

Re: Processing Large Images in Spark?

2015-04-07 Thread andy petrella
Heya, You might be interested in looking at GeoTrellis. They use RDDs of Tiles to process big images like Landsat ones can be (especially Landsat 8). However, I see you have only 1G per file, so I guess you only care about a single band? Or is it a reboxed pic? Note: I think the GeoTrellis image format is

Re: Spark 1.2.0 with Play/Activator

2015-04-06 Thread andy petrella
Hello Manish, you can take a look at the spark-notebook build, it's a bit tricky to get rid of some clashes but at least you can refer to this build to have ideas. LSS, I have stripped out akka from play deps. ref: https://github.com/andypetrella/spark-notebook/blob/master/build.sbt
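For the "stripped out akka from play deps" part, a hedged build.sbt sketch (versions and module names here are illustrative, not the actual spark-notebook build): exclude Play's transitive Akka so it doesn't clash with the one Spark brings.
```
// build.sbt (illustrative): keep the Akka that Spark brings, drop the one Play pulls in.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.2.0",
  ("com.typesafe.play" %% "play" % "2.3.7")
    .excludeAll(ExclusionRule(organization = "com.typesafe.akka"))
)
```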

Re: iPython Notebook + Spark + Accumulo -- best practice?

2015-03-26 Thread andy petrella
That's purely awesome! Don't hesitate to contribute your notebook back to the spark notebook repo, even rough, I'll help clean it up if needed. The vagrant is also appealing. Congrats! On Thu, Mar 26, 2015 at 22:22, David Holiday dav...@annaisystems.com wrote: w0t! that did it!

Re: Supported Notebooks (and other viz tools) for Spark 0.9.1?

2015-02-03 Thread andy petrella
Hello Adamantios, Thanks for the poke and the interest. Actually, you're the second person asking about backporting it. Yesterday (late), I created a branch for it... and the simple local spark test worked! \o/. However, it'll be the 'old' UI :-/. Since I didn't port the code using play 2.2.6 to the

Re: Supported Notebooks (and other viz tools) for Spark 0.9.1?

2015-02-03 Thread andy petrella
in this branch and launch `sbt run`. HTH, andy On Tue Feb 03 2015 at 2:45:43 PM andy petrella andy.petre...@gmail.com wrote: Hello Adamantios, Thanks for the poke and the interest. Actually, you're the second asking about backporting it. Yesterday (late), I created a branch for it... and the simple

Re: Using TF-IDF from MLlib

2014-12-29 Thread andy petrella
Here is what I did for this case: https://github.com/andypetrella/tf-idf On Mon, Dec 29, 2014 at 11:31, Sean Owen so...@cloudera.com wrote: Given (label, terms) you can just transform the values to a TF vector, then a TF-IDF vector, with HashingTF and IDF / IDFModel. Then you can make a
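A small sketch of the (label, terms) → TF-IDF flow Sean describes, using MLlib's HashingTF and IDF. It assumes an in-scope SparkContext `sc`, the sample data is made up, and the final zip relies on the ordering caveat discussed elsewhere in this thread:
```
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Hypothetical input: (label, terms) pairs.
val labeledTerms: RDD[(String, Seq[String])] = sc.parallelize(Seq(
  ("spark", Seq("fast", "cluster", "computing")),
  ("scala", Seq("functional", "jvm", "language"))))

val hashingTF = new HashingTF()
val tf: RDD[Vector] = hashingTF.transform(labeledTerms.map(_._2))
tf.cache()  // IDF needs two passes: one to fit, one to transform

val idfModel = new IDF().fit(tf)
val tfidf: RDD[Vector] = idfModel.transform(tf)

// Re-attach labels; zip requires both RDDs to have the same partitioning
// and per-partition element counts.
val labeledTfIdf: RDD[(String, Vector)] = labeledTerms.map(_._1).zip(tfidf)
```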

Re: Discourse: A proposed alternative to the Spark User list

2014-12-25 Thread andy petrella
Nice idea, although it needs a plan on their hosting, or spark to host it if I'm not wrong. I've been using Slack for discussions, it's not exactly the same of discourse, the ML or SO but offers interesting features. It's more in the mood of IRC integrated with external services. my2c On Wed

Re: spark-repl_1.2.0 was not uploaded to central maven repository.

2014-12-21 Thread andy petrella
Actually yes, things like interactive notebooks f.i. On Sun Dec 21 2014 at 11:35:18 AM Sean Owen so...@cloudera.com wrote: I'm only speculating, but I wonder if it was on purpose? would people ever build an app against the REPL? On Sun, Dec 21, 2014 at 5:50 AM, Peng Cheng pc...@uow.edu.au

Re: Implementing a spark version of Haskell's partition

2014-12-18 Thread andy petrella
. But I think there is currently no RDD method that returns more than one RDD for a single input RDD, so maybe there is some design limitation on Spark that prevents this? Again, thanks for your answer. Greetings, Juan On 17/12/2014 at 18:15, andy petrella andy.petre...@gmail.com wrote: yo

Re: Implementing a spark version of Haskell's partition

2014-12-17 Thread andy petrella
yo, First, here is the Scala version: http://www.scala-lang.org/api/current/index.html#scala.collection.Seq@partition(p:A=>Boolean):(Repr,Repr) Second: RDD is distributed, so what you'll have to do is to partition each partition (:-D) or create two RDDs by filtering twice →
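A tiny sketch of the "filter twice" option (predicate and data are made up, SparkContext `sc` assumed); note the predicate is evaluated twice, so caching the input first usually pays off:
```
import org.apache.spark.rdd.RDD

// Haskell's partition, Spark-style: two filters over a cached RDD.
def partitionRDD[A](rdd: RDD[A])(p: A => Boolean): (RDD[A], RDD[A]) = {
  rdd.cache()
  (rdd.filter(p), rdd.filter(a => !p(a)))
}

val nums = sc.parallelize(1 to 10)
val (evens, odds) = partitionRDD(nums)(_ % 2 == 0)
```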

Re: Is Spark the right tool for me?

2014-12-02 Thread andy petrella
this is ok? ~Ben From: andy petrella andy.petre...@gmail.com Date: Monday, December 1, 2014 15:48 To: Benjamin Stadin benjamin.sta...@heidelberg-mobil.com, user@spark.apache.org Subject: Re: Is Spark the right tool for me? Indeed. However, I guess the important load

Re: Is Spark the right tool for me?

2014-12-02 Thread andy petrella
of the time (inserting all data in one transaction, taking somewhere between sub-second and 10 seconds for very large projects). It’s currently not a concern. (searching for a Kafka+Spark example now) Cheers Ben From: andy petrella andy.petre...@gmail.com Date: Tuesday, December 2

Re: Is Spark the right tool for me?

2014-12-01 Thread andy petrella
Not quite sure which geo processing you're doing: is it raster or vector? More info would be appreciated so I can help you further. Meanwhile I can try to give some hints; for instance, did you consider GeoMesa http://www.geomesa.org/2014/08/05/spark/? Since you need a WMS (or alike), did you

Re: Is Spark the right tool for me?

2014-12-01 Thread andy petrella
://www.deep-map.com From: andy petrella andy.petre...@gmail.com Date: Monday, December 1, 2014 15:07 To: Benjamin Stadin benjamin.sta...@heidelberg-mobil.com, user@spark.apache.org Subject: Re: Is Spark the right tool for me? Not quite sure which geo processing you're

Re: Parsing a large XML file using Spark

2014-11-21 Thread andy petrella
Actually, it's a real On Tue Nov 18 2014 at 2:52:00 AM Tobias Pfeiffer t...@preferred.jp wrote: Hi, see https://www.mail-archive.com/dev@spark.apache.org/msg03520.html for one solution. One issue with those XML files is that they cannot be processed line by line in parallel; plus you

Re: Parsing a large XML file using Spark

2014-11-21 Thread andy petrella
(sorry about the previous spam... Google Inbox didn't allow me to cancel the miserable send action :-/) So what I was about to say: it's a real PAIN in the ass to parse the wikipedia articles in the dump due to these multiline articles... However, there is a way to manage that quite easily,
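The message is cut off before the actual recipe; one common way to handle such multi-line records (not necessarily the one meant here) is to set a custom record delimiter on Hadoop's TextInputFormat so a whole <page>…</page> block arrives as one record. A sketch, assuming a recent Hadoop, an in-scope SparkContext `sc`, and a hypothetical dump path:
```
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Split records on the closing page tag instead of newlines.
val hadoopConf = new Configuration(sc.hadoopConfiguration)
hadoopConf.set("textinputformat.record.delimiter", "</page>")

val pages = sc.newAPIHadoopFile(
    "hdfs:///data/enwiki-latest-pages-articles.xml",
    classOf[TextInputFormat], classOf[LongWritable], classOf[Text], hadoopConf)
  .map(_._2.toString)
  .filter(_.contains("<page>"))
```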

Re: Using TF-IDF from MLlib

2014-11-21 Thread andy petrella
Yeah, I initially used zip but I was wondering how reliable it is. I mean, is the order guaranteed? What if some node fails, and the data is pulled out from different nodes? And even if it can work, I find this implicit semantic quite uncomfortable, don't you? My 0.2c On Fri, Nov 21, 2014 at 15:26,

Re: Getting spark job progress programmatically

2014-11-20 Thread andy petrella
, Aniket Bhatnagar aniket.bhatna...@gmail.com wrote: I have for now submitted a JIRA ticket @ https://issues.apache.org/jira/browse/SPARK-4473. I will collate all my experiences ( hacks) and submit them as a feature request for public API. On Tue Nov 18 2014 at 20:35:00 andy petrella

[GraphX] Mining GeoData (OSM)

2014-11-20 Thread andy petrella
Guys, After talking with Ankur, it turned out that sharing the talk we gave at ScalaIO (France) would be worthwhile. So there you go, and don't hesitate to share your thoughts ;-) http://www.slideshare.net/noootsab/machine-learning-and-graphx Greetz, andy

Re: Using TF-IDF from MLlib

2014-11-20 Thread andy petrella
/Someone will correct me if I'm wrong./ Actually, TF-IDF scores terms for a given document, and specifically TF. Internally, these things are holding a Vector (hopefully sparse) representing all the possible words (up to 2²⁰) per document. So each document, after applying TF, will be transformed in
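Concretely (a small sketch of what that Vector looks like; HashingTF's default dimensionality is 2^20, and hash collisions are ignored here):
```
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.linalg.SparseVector

val hashingTF = new HashingTF()              // default numFeatures = 1 << 20
val doc = Seq("spark", "makes", "spark", "fast")
val tfVector = hashingTF.transform(doc)      // local Iterable => local Vector

println(tfVector.size)                                         // 1048576 possible term slots
println(tfVector.asInstanceOf[SparseVector].indices.length)    // 3 non-zero slots (barring collisions)
```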

Re: Getting spark job progress programmatically

2014-11-18 Thread andy petrella
I started some quick hack for that in the notebook, you can head to: https://github.com/andypetrella/spark-notebook/blob/master/common/src/main/scala/notebook/front/widgets/SparkInfo.scala On Tue Nov 18 2014 at 2:44:48 PM Aniket Bhatnagar aniket.bhatna...@gmail.com wrote: I am writing yet

Re: Getting spark job progress programmatically

2014-11-18 Thread andy petrella
to group Id mapping. I will submit a JIRA ticket and seek spark dev's opinion on this. Many thanks for your prompt help Andy. Thanks, Aniket On Tue Nov 18 2014 at 19:40:06 andy petrella andy.petre...@gmail.com wrote: I started some quick hack for that in the notebook, you can head to: https

Re: Scala Spark IDE help

2014-10-28 Thread andy petrella
Also, I'm following two master students at the University of Liège (one computing prob conditional density on massive data and the other implementing a Markov Chain method on georasters). I proposed that they use the Spark-Notebook to learn the framework, and they're quite happy with it (so far at

Re: Interactive interface tool for spark

2014-10-12 Thread andy petrella
, 2014 at 4:57 PM, Michael Allman mich...@videoamp.com wrote: Hi Andy, This sounds awesome. Please keep us posted. Meanwhile, can you share a link to your project? I wasn't able to find it. Cheers, Michael On Oct 8, 2014, at 3:38 AM, andy petrella andy.petre...@gmail.com wrote: Heya You

Re: Interactive interface tool for spark

2014-10-12 Thread andy petrella
: And what about Hue http://gethue.com ? On Sun, Oct 12, 2014 at 1:26 PM, andy petrella andy.petre...@gmail.com wrote: Dear Sparkers, As promised, I've just updated the repo with a new name (for the sake of clarity), default branch but specially with a dedicated README containing: * explanations

Re: Interactive interface tool for spark

2014-10-09 Thread andy petrella
approaches for visualization there? Thanks. Kelvin On Wed, Oct 8, 2014 at 9:14 AM, andy petrella andy.petre...@gmail.com wrote: Sure! I'll post updates as well in the ML :-) I'm doing it on twitter for now (until doc is ready). The repo is there (branch spark) : https://github.com/andypetrella

Re: Interactive interface tool for spark

2014-10-08 Thread andy petrella
. Please keep us posted. Meanwhile, can you share a link to your project? I wasn't able to find it. Cheers, Michael On Oct 8, 2014, at 3:38 AM, andy petrella andy.petre...@gmail.com wrote: Heya You can check Zeppelin or my fork of the Scala notebook. I'm going this weekend to push some

Re: GraphX: Types for the Nodes and Edges

2014-10-01 Thread andy petrella
I'll try my best ;-). 1/ you could create an abstract type for the types (one on top of the Vs, another on top of the Es types) then use the subclasses as payload in your VertexRDD or in your Edge. Regarding storage and files, it doesn't really matter (unless you want to use the OOTB loading method, thus
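A sketch of option 1/ with made-up payload types: one sealed abstract top for vertex payloads, one for edge payloads, and a Graph typed on both (SparkContext `sc` assumed):
```
import org.apache.spark.graphx.{Edge, Graph}

sealed trait VPayload
case class Person(name: String) extends VPayload
case class Company(name: String) extends VPayload

sealed trait EPayload
case class WorksAt(since: Int) extends EPayload
case class Knows(weight: Double) extends EPayload

val vertices = sc.parallelize(Seq(
  (1L, Person("Ada"): VPayload),
  (2L, Company("Acme"): VPayload)))
val edges = sc.parallelize(Seq(Edge(1L, 2L, WorksAt(2014): EPayload)))

val graph: Graph[VPayload, EPayload] = Graph(vertices, edges)
```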

Re: REPL like interface for Spark

2014-09-29 Thread andy petrella
Heya, I started to port the scala-notebook to Spark some weeks ago (but doing it in my sparse time and for my Spark talks ^^). It's a WIP but works quite fine ftm, you can check my fork and branch over here: https://github.com/andypetrella/scala-notebook/tree/spark Feel free to ask any

Re: REPL like interface for Spark

2014-09-29 Thread andy petrella
, Sep 29, 2014 at 5:27 PM, andy petrella andy.petre...@gmail.com wrote: Heya, I started to port the scala-notebook to Spark some weeks ago (but doing it in my sparse time and for my Spark talks ^^). It's a WIP but works quite fine ftm, you can check my fork and branch over here: https

Re: REPL like interface for Spark

2014-09-29 Thread andy petrella
However (I must say ^^) it's funny that it has been built using usual plain old Java stuff :-D. aℕdy ℙetrella about.me/noootsab http://about.me/noootsab On Mon, Sep 29, 2014 at 10:51 AM, andy petrella andy.petre...@gmail.com wrote: Cool!!! I'll give

Re: Example of Geoprocessing with Spark

2014-09-20 Thread andy petrella
It's probably slow as you say because it's actually also doing the map phase that will do the RTree search and so on, and only then saving to hdfs on 60 partitions. If you want to see the time spent in saving to hdfs, you could do a count for instance before saving. Also saving from 60 partitions
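A hedged sketch of that timing trick (paths and the geomatch stand-in are hypothetical, SparkContext `sc` assumed): cache the mapped RDD, force it with count, then time the save on its own.
```
// Hypothetical stand-ins for the real pipeline.
val input = sc.textFile("hdfs:///in/points")
def geomatch(line: String): String = line    // placeholder for the RTree lookup

val processed = input.map(geomatch).cache()

val t0 = System.nanoTime
processed.count()                            // forces only the map phase
val t1 = System.nanoTime
processed.saveAsTextFile("hdfs:///out/matched")
val t2 = System.nanoTime
println(s"map phase: ${(t1 - t0) / 1e9}s, save: ${(t2 - t1) / 1e9}s")
```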

Re: Serving data

2014-09-15 Thread andy petrella
not quick enough I’ll go the usual route with either read-only or normal database. On 13.09.2014, at 12:45, andy petrella andy.petre...@gmail.com wrote: however, the cache is not guaranteed to remain, if other jobs are launched in the cluster and require more memory than what's left

Re: Serving data

2014-09-13 Thread andy petrella
however, the cache is not guaranteed to remain, if other jobs are launched in the cluster and require more memory than what's left in the overall caching memory, previous RDDs will be discarded. Using an off heap cache like tachyon as a dump repo can help. In general, I'd say that using a
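A small sketch of the two options mentioned (data made up, SparkContext `sc` assumed; the Tachyon URI is hypothetical and presumes a Spark 1.x cluster already wired to Tachyon):
```
import org.apache.spark.storage.StorageLevel

val results = sc.parallelize(1 to 1000).map(i => (i, i * i))

// In-memory cache: fast to serve, but blocks may be evicted if other jobs
// need the caching memory.
results.persist(StorageLevel.MEMORY_AND_DISK)

// Off-heap alternative (Tachyon-backed in Spark 1.x), or an explicit dump:
// results.persist(StorageLevel.OFF_HEAP)
// results.saveAsObjectFile("tachyon://tachyon-master:19998/dumps/results")
```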

Re: New sbt plugin to deploy jobs to EC2

2014-09-05 Thread andy petrella
\o/ = will test it soon or sooner, gr8 idea btw aℕdy ℙetrella about.me/noootsab http://about.me/noootsab On Fri, Sep 5, 2014 at 12:37 PM, Felix Garcia Borrego fborr...@gilt.com wrote: As far as I know, in order to deploy and execute jobs in EC2 you need to

Re: creating a distributed index

2014-08-01 Thread andy petrella
Hey, There is some work that started on IndexedRDD (on master I think). Meanwhile, checking what has been done in GraphX regarding vertex index in partitions could be worthwhile I guess. Hth Andy On Aug 1, 2014 at 22:50, Philip Ogren philip.og...@oracle.com wrote: Suppose I want to take my large
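Until IndexedRDD lands, a rough way to get the flavour of that per-partition index (a hedged sketch, not the GraphX code itself; data made up, SparkContext `sc` assumed):
```
import org.apache.spark.rdd.RDD

// Build a hash index inside each partition so per-partition lookups are O(1).
val pairs: RDD[(String, Int)] = sc.parallelize(Seq("a" -> 1, "b" -> 2, "c" -> 3))

val indexed: RDD[Map[String, Int]] = pairs
  .mapPartitions(it => Iterator(it.toMap), preservesPartitioning = true)
  .cache()

// A naive distributed lookup over the per-partition indexes.
def lookup(key: String): Array[Int] = indexed.flatMap(_.get(key)).collect()
```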

Re: iScala or Scala-notebook

2014-07-29 Thread andy petrella
Some people started some work on that topic using the notebook (the original or the n8han one, cannot remember)... Some issues have been created already ^^ On Jul 29, 2014 at 19:59, Nick Pentreath nick.pentre...@gmail.com wrote: IScala itself seems to be a bit dead unfortunately. I did come

Re: Debugging Task not serializable

2014-07-28 Thread andy petrella
Also check the guides for the JVM option that prints messages for such problems. Sorry, sent from phone and don't know it by heart :/ On Jul 28, 2014 at 18:44, Akhil Das ak...@sigmoidanalytics.com wrote: A quick fix would be to implement java.io.Serializable in those classes which are causing
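The flag alluded to is most likely sun.io.serialization.extendedDebugInfo (an assumption on my part, it is not named in the message); it makes the JVM print the chain of fields leading to the non-serializable object. A sketch of wiring it in:
```
import org.apache.spark.SparkConf

// Assumption: the JVM option meant here is sun.io.serialization.extendedDebugInfo.
// Driver options are usually passed via spark-submit or spark-defaults.conf so
// the driver JVM actually starts with them.
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions", "-Dsun.io.serialization.extendedDebugInfo=true")
  .set("spark.executor.extraJavaOptions", "-Dsun.io.serialization.extendedDebugInfo=true")
```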

Re: relationship of RDD[Array[String]] to Array[Array[String]]

2014-07-21 Thread andy petrella
heya, Without a bit of gymnastics at the type level, nope. Actually RDD doesn't share any functions with the Scala lib (the simple reason I can see is that Spark's are lazy, while the default implementations in Scala aren't). However, it'd be possible by implementing an implicit converter
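A hedged sketch of such an implicit converter (names are made up): it simply collects, so it bridges RDD[Array[String]] to Array[Array[String]], but only makes sense for data small enough to fit on the driver.
```
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object RDDConversions {
  // Implicit view from RDD[A] to a local Array[A] via collect().
  implicit class Collectable[A: ClassTag](rdd: RDD[A]) {
    def toLocalArray: Array[A] = rdd.collect()
  }
}

import RDDConversions._
val rows: RDD[Array[String]] = sc.parallelize(Seq(Array("a", "b"), Array("c")))
val local: Array[Array[String]] = rows.toLocalArray
```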

Re: Generic Interface between RDD and DStream

2014-07-11 Thread andy petrella
A while ago, I wrote this:
```
package com.virdata.core.compute.common.api

import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.StreamingContext

sealed trait SparkEnvironment extends Serializable
```
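The archive cuts the snippet off; purely to illustrate the general idea (a guess at the shape, not the original com.virdata code), such an abstraction usually pairs an abstract container type with concrete batch and streaming instances:
```
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

sealed trait SparkEnvironment extends Serializable {
  type Wagon[A]            // the "container": RDD for batch, DStream for streaming
}
object Batch extends SparkEnvironment {
  type Wagon[A] = RDD[A]
}
object Streaming extends SparkEnvironment {
  type Wagon[A] = DStream[A]
}
```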

Re: Spark and RDF

2014-06-20 Thread andy petrella
is shifting to SparkSQL it would be slightly hard but much better effort would be to shift Gremlin to Spark (though a much beefier one :) ) Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi https://twitter.com/mayur_rustagi On Fri, Jun 20, 2014 at 3:39 PM, andy

Re: Use SparkListener to get overall progress of an action

2014-05-22 Thread andy petrella
SparkListener offers good stuff. But I also completed it with other metrics stuff of my own that uses Akka to aggregate metrics from anywhere I'd like to collect them (without any deps on Ganglia, yet on Codahale). However, this was useful to gather some custom metrics (from within the tasks
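A minimal SparkListener sketch (not the Akka/Codahale setup described above, just the listener part; register it on an in-scope SparkContext):
```
import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted, SparkListenerTaskEnd}

// Count finished tasks to expose a rough notion of overall progress.
class ProgressListener extends SparkListener {
  val tasksDone = new AtomicLong(0)
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit =
    tasksDone.incrementAndGet()
  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit =
    println(s"stage ${stageCompleted.stageInfo.stageId} done, ${tasksDone.get} tasks so far")
}

// sc.addSparkListener(new ProgressListener())
```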

Re: Use SparkListener to get overall progress of an action

2014-05-22 Thread andy petrella
help! Cheers! Pierre Borckmans, Software team, Real Impact Analytics | Brussels Office, www.realimpactanalytics.com | [hidden email] | FR +32 485 91 87 31 | Skype pierre.borckmans On 22 May 2014, at 16:58, andy petrella

Re: Spark 0.9.1 core dumps on Mesos 0.18.0

2014-04-17 Thread andy petrella
someone from the Spark team to step in with clearer and stronger advice... kr, Andy Petrella Belgium (Liège) Data Engineer in NextLab http://nextlab.be/ sprl (owner) Engaged Citizen Coder for WAJUG http://wajug.be/ (co-founder) Author of Learning Play! Framework 2

Re: Spark 0.9.1 core dumps on Mesos 0.18.0

2014-04-17 Thread andy petrella
No of course, but I was guessing some native libs imported (to communicate with Mesos) in the project that... could miserably crash the JVM. Anyway, so you tell us that using this oracle version, you don't have any issues when using spark on mesos 0.18.0, that's interesting 'cause AFAIR, my last

Re: Spark 0.9.1 core dumps on Mesos 0.18.0

2014-04-17 Thread andy petrella
-- *From:* andy petrella [andy.petre...@gmail.com] *Sent:* Thursday, April 17, 2014 3:21 PM *To:* user@spark.apache.org *Subject:* Re: Spark 0.9.1 core dumps on Mesos 0.18.0 No of course, but I was guessing some native libs imported (to communicate with Mesos) in the project that... could

Re: ETL for postgres to hadoop

2014-04-08 Thread andy petrella
for Oracle Spatial cartridge would be great as well :-P. my2c, Andy Petrella Belgium (Liège) Data Engineer in NextLab http://nextlab.be/ sprl (owner) Engaged Citizen Coder for WAJUG http://wajug.be/ (co-founder) Author of Learning Play! Framework 2 http

Re: what does SPARK_EXECUTOR_URI in spark-env.sh do ?

2014-04-03 Thread andy petrella
Indeed, that's how Mesos works actually. So the tarball just has to be somewhere accessible by the Mesos slaves. That's why it is often put in HDFS. On Apr 3, 2014 at 18:46, felix cnwe...@gmail.com wrote: So, if I set this parameter, there is no need to copy the spark tarball to every mesos
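Programmatically, the same setting is exposed as the spark.executor.uri property (the Mesos master and HDFS path below are hypothetical):
```
import org.apache.spark.{SparkConf, SparkContext}

// Point Mesos executors at a Spark distribution every slave can download.
val conf = new SparkConf()
  .setMaster("mesos://zk://zk1:2181/mesos")
  .setAppName("executor-uri-example")
  .set("spark.executor.uri", "hdfs://namenode:8020/frameworks/spark-0.9.1.tgz")
val sc = new SparkContext(conf)
```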

Re: ActorNotFound problem for mesos driver

2014-04-02 Thread andy petrella
Heya, Yep this is a problem in the Mesos scheduler implementation that has been fixed after 0.9.0 (https://spark-project.atlassian.net/browse/SPARK-1052 = MesosSchedulerBackend) So several options, like applying the patch, upgrading to 0.9.1 :-/ Cheers, Andy On Wed, Apr 2, 2014 at 5:30 PM,

Re: ActorNotFound problem for mesos driver

2014-04-02 Thread andy petrella
np ;-) On Wed, Apr 2, 2014 at 5:50 PM, Leon Zhang leonca...@gmail.com wrote: Aha, thank you for your kind reply. Upgrading to 0.9.1 is a good choice. :) On Wed, Apr 2, 2014 at 11:35 PM, andy petrella andy.petre...@gmail.com wrote: Heya, Yep this is a problem in the Mesos scheduler

Re: Need suggestions

2014-04-02 Thread andy petrella
TL;DR Your classes are missing on the workers; pass the jar containing the class main.scala.Utils to the SparkContext. Longer: I'm missing some information, like how the SparkContext is configured, but my best guess is that you didn't provide the jars (setJars on SparkConf or use the SC's constructor

Re: Need suggestions

2014-04-02 Thread andy petrella
Sorry I was not clear perhaps, anyway, could you try with the path in the *List* to be the absolute one; e.g. List(/home/yh/src/pj/spark-stuffs/target/scala-2.10/simple-project_2.10-1.0.jar) In order to provide a relative path, you need first to figure out your CWD, so you can do (to be really
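A hedged sketch putting the two hints together (the master URL is hypothetical; the jar path is the one from the thread):
```
import org.apache.spark.{SparkConf, SparkContext}

// Ship the application jar to the workers so classes like main.scala.Utils
// can be loaded there.
val conf = new SparkConf()
  .setMaster("spark://master:7077")
  .setAppName("need-suggestions")
  .setJars(Seq("/home/yh/src/pj/spark-stuffs/target/scala-2.10/simple-project_2.10-1.0.jar"))
val sc = new SparkContext(conf)
```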

Re: Announcing Spark SQL

2014-03-27 Thread andy petrella
nope (what I said :-P) On Thu, Mar 27, 2014 at 11:05 AM, Pascal Voitot Dev pascal.voitot@gmail.com wrote: On Thu, Mar 27, 2014 at 10:22 AM, andy petrella andy.petre...@gmail.com wrote: I just mean queries sent at runtime ^^, like for any RDBMS. In our project we have

Re: Announcing Spark SQL

2014-03-27 Thread andy petrella
Original message From: andy petrella Date:03/27/2014 6:08 AM (GMT-05:00) To: user@spark.apache.org Subject: Re: Announcing Spark SQL nope (what I said :-P) On Thu, Mar 27, 2014 at 11:05 AM, Pascal Voitot Dev pascal.voitot@gmail.com wrote: On Thu, Mar 27, 2014 at 10:22 AM

Re: [re-cont] map and flatMap

2014-03-15 Thread andy petrella
Dev pascal.voitot@gmail.com wrote: On Wed, Mar 12, 2014 at 3:06 PM, andy petrella andy.petre...@gmail.com wrote: Folks, I just want to point something out... I haven't had time yet to sort it out and to think enough to give a valuable strict explanation of -- even though, intuitively