Re: time for Apache Spark 3.0?

2018-06-15 Thread Andy
/ BigDL / ……)* 3.0 is a very important version for a good open source project. It would be better to shed the historical burden and *focus on new areas*. Spark has been widely used all over the world as a successful big data framework. And it can be better than that. *Andy* On Thu, Apr 5

Re: Question on Spark's graph libraries roadmap

2017-03-14 Thread Andy
GraphFrame is just a Graph Analytics/Query Engine, not a Graph Engine, which GraphX used to be. And I'm sorry to say that, in fact, it doesn’t fit most scenarios at all. Enzo, I don’t think there is any roadmap of Graph libraries for Spark for now. *Andy* On Tue, Mar 14, 2017 at 7:28 AM, Tim Hunter

Re: How to hint Spark to use HashAggregate() for UDAF

2017-01-09 Thread Andy Dang
aggregate, but I could be missing something here :). I could achieve hash-based aggregate by turning this query to RDD mode, but that is counterintuitive IMO. --- Regards, Andy On Mon, Jan 9, 2017 at 2:05 PM, Takeshi Yamamuro <linguin@gmail.com> wrote: > Hi, > > Spark a

How to hint Spark to use HashAggregate() for UDAF

2017-01-09 Thread Andy Dang
,value#4L] How can I make Spark use HashAggregate (like the count(*) expression) instead of SortAggregate with my UDAF? Is it intentional? Is there an issue tracking this? --- Regards, Andy

Re: Converting an InternalRow to a Row

2017-01-07 Thread Andy Dang
Ah, I missed that bit of documentation, my bad :). That totally explains the behavior! Thanks a lot! --- Regards, Andy On Sat, Jan 7, 2017 at 10:11 AM, Liang-Chi Hsieh <vii...@gmail.com> wrote: > > Hi Andy, > > Thanks for sharing the code snippet. > > I am not su

Re: Converting an InternalRow to a Row

2017-01-06 Thread Andy Dang
, Andy On Fri, Jan 6, 2017 at 3:48 AM, Liang-Chi Hsieh <vii...@gmail.com> wrote: > > Can you show how you use the encoder in your UDAF? > > > Andy Dang wrote > > One more question about the behavior of ExpressionEncoder > > > > . > > > > I have a UD

Re: Converting an InternalRow to a Row

2017-01-05 Thread Andy Dang
= RowEncoder.apply(schema).resolveAndBind(ScalaUtils.scalaSeq(attributes), SimpleAnalyzer$.MODULE$); --- Regards, Andy On Thu, Jan 5, 2017 at 2:53 AM, Liang-Chi Hsieh <vii...@gmail.com> wrote: > > You need to resolve and bind the encoder. > > ExpressionEncoder enconder = RowEn

Converting an InternalRow to a Row

2017-01-04 Thread Andy Dang
p = enconder.fromRow(internalRow); System.out.println("Round trip: " + roundTrip.size()); } The code fails at the line encoder.fromRow() with the exception: > Caused by: java.lang.UnsupportedOperationException: Cannot evaluate expression: getcolumnbyordinal(0, IntegerType) --- Regards, Andy

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Andy Dang
We remodel Spark dependencies and ours together and chuck them under the /jars path. There are other ways to do it but we want the classpath to be strictly as close to development as possible. --- Regards, Andy On Fri, Dec 23, 2016 at 6:00 PM, Chetan Khatri <chetan.opensou...@gmail.

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Andy Dang
-- Regards, Andy On Fri, Dec 23, 2016 at 6:44 AM, Chetan Khatri <chetan.opensou...@gmail.com> wrote: > Hello Spark Community, > > For Spark Job Creation I use SBT Assembly to build Uber("Super") Jar and > then submit to spark-submit. >

Negative number of active tasks

2016-12-23 Thread Andy Dang
special thing I'm doing is saving multiple datasets at the same time to HDFS from different threads. Thanks, Andy - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: cutting 1.6.2 rc and 2.0.0 rc this week?

2016-06-16 Thread andy petrella
ike >> they >> > can be retargeted are are just some doc updates. I'm going to be more >> > aggressive and pushing individual people about resolving those, in case >> this >> > drags on forever. >> > -- andy

[Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread andy petrella
5.0.pom> Any idea? ps: this happens for streaming too at least -- andy

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread andy petrella
zhih...@gmail.com> wrote: > >> Andy: >> 1.5.1 has been released. >> >> Maybe you can use this: >> >> https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.1/spark-streaming_2.10-1.5.1.pom >> >> I can access the above. >>

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread andy petrella
It's an option but not a solution, indeed. On Fri, Oct 2, 2015 at 20:08, Ted Yu <yuzhih...@gmail.com> wrote: > Andy: > 1.5.1 has been released. > > Maybe you can use this: > > https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.1/spark-streaming_

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread andy petrella
Sure thing, I'll do it today. I was just talking about the page. Thanks btw. On Fri, Oct 2, 2015 at 20:26, Ted Yu <yuzhih...@gmail.com> wrote: > Andy: > 1.5.1 has many critical bug fixes on top of 1.5.0 > > http://search-hadoop.com/m/q3RTtGrXP31BVt4l1 > > Please conside

Fwd: Parallel collection in driver programs

2015-09-22 Thread Andy Huang
Hi Devs, Hopefully one of you knows more about this? Thanks Andy -- Forwarded message -- From: Andy Huang <andy.hu...@servian.com.au> Date: Wed, Sep 23, 2015 at 12:39 PM Subject: Parallel collection in driver programs To: u...@spark.apache.org Hi All, Would like to know if

Re: [ANNOUNCE] Announcing Spark 1.5.0

2015-09-09 Thread andy petrella
>>>> -- Yu Ishikawa >>>> -- >>>> View this message in context: >>>> http://apache-spark-developers-list.1001551.n3.nabble.com/ANNOUNCE-Announcing-Spark-1-5-0-tp14013p14015.html >>>> Sent from the Apache Spark Developers List mailing list archive at >>>> Nabble.com. -- andy

Re: Nabble mailing list mirror errors: This post has NOT been accepted by the mailing list yet

2014-12-19 Thread Andy Konwinski
worked after all. Andy On Wed, Dec 17, 2014 at 1:09 PM, Josh Rosen rosenvi...@gmail.com wrote: Yeah, it looks like messages that are successfully posted via Nabble end up on the Apache mailing list, but messages posted directly to Apache aren't mirrored to Nabble anymore because it's based off

Re: Nabble mailing list mirror errors: This post has NOT been accepted by the mailing list yet

2014-12-18 Thread andy
I just changed the domain name in the mailing list archive settings to remove .incubator so maybe it'll work now. -- View this message in context:

Re: Nabble mailing list mirror errors: This post has NOT been accepted by the mailing list yet

2014-12-18 Thread andy
I just changed the domain name in the mailing list archive settings to remove .incubator so maybe it'll work now. Andy -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Nabble-mailing-list-mirror-errors-This-post-has-NOT-been-accepted-by-the-mailing

Re: Notes on writing complex spark applications

2014-11-23 Thread andy petrella
Cool! On Sun Nov 23 2014 at 5:58:03 PM Evan R. Sparks evan.spa...@gmail.com wrote: Hi all, Shivaram Venkataraman, Joseph Gonzalez, Tomer Kaftan, and I have been working on a short document about writing high performance Spark applications based on our experience developing MLlib, GraphX,

Re: Spark Streaming Metrics

2014-11-21 Thread andy petrella
the guy that I might poke him today with more materials. Btw, how're you? Tchuss man andy PS: did you try the recent events thingy? On Fri Nov 21 2014 at 11:17:17 AM Gerard Maas gerard.m...@gmail.com wrote: Looks like metrics are not a hot topic to discuss - yet so important to sleep well

Re: Implementing TinkerPop on top of GraphX

2014-11-06 Thread andy petrella
Great stuff! I've got some thoughts about that, and I was wondering if it would first be interesting to have something like for spark-core (let's say): 0/ Core API offering basic (or advanced → HeLP) primitives 1/ catalyst optimizer for a text-based system (SPARQL, Cypher, custom SQL3, whatnot) 2/

Re: best IDE for scala + spark development?

2014-10-27 Thread andy petrella
I second the S[B]T combo! I tried ATOM → lack of features and stability (atm) aℕdy ℙetrella about.me/noootsab On Mon, Oct 27, 2014 at 2:15 PM, Dean Wampler deanwamp...@gmail.com wrote: For what it's worth, I use Sublime Text + the

Re: [brainsotrming] Generalization of DStream, a ContinuousRDD ?

2014-08-01 Thread andy petrella
neither time nor timestamp field's value, but a kind-of internal index as range delimiter -- thus defining their own exotic continuum and break function. greetz, aℕdy ℙetrella about.me/noootsab On Thu, Jul 17, 2014 at 1:11 AM, andy

Re: [brainsotrming] Generalization of DStream, a ContinuousRDD ?

2014-08-01 Thread andy petrella
☺) Cheers Andy

Re: [brainsotrming] Generalization of DStream, a ContinuousRDD ?

2014-07-16 Thread andy petrella
tathagata.das1...@gmail.com wrote: Very interesting ideas Andy! Conceptually I think it makes sense. In fact, it is true that dealing with time series data, windowing over application time, windowing over number of events, are things that DStream does not natively support. The real challenge

Re: [brainsotrming] Generalization of DStream, a ContinuousRDD ?

2014-07-16 Thread andy petrella
you ensure ordering with delayed records. If you want to process in order of application time, and records are delayed, how do you deal with them? Any ideas? ;) TD On Wed, Jul 16, 2014 at 2:37 AM, andy petrella andy.petre...@gmail.com wrote: Heya TD, Thanks for the detailed answer

[brainsotrming] Generalization of DStream, a ContinuousRDD ?

2014-07-15 Thread andy petrella
Dear Sparkers, *[sorry for the lengthy email... => head to the gist https://gist.github.com/andypetrella/12228eb24eea6b3e1389 for a preview :-p]* I would like to share some thinking I had due to a use case I faced. Basically, as the subject announces, it's a generalization of the DStream

Fwd: 2014 Mesos community survey results

2014-06-24 Thread Andy Konwinski
I think it's cool that the Mesos team did a survey of usage and published the aggregate results. It would be cool to do a survey for the Spark project and publish the results on the Spark website like the Mesos team did. -- Forwarded message -- From: Dave Lester

Re: encounter jvm problem when integreation spark with mesos

2014-06-17 Thread andy petrella
Yep, but no real resolution nor advances on this topic, since in the end we've chosen to stick with a compatible version of Mesos (0.14.1 ftm). But I'm still convinced it has to do with a native libs clash :-s aℕdy ℙetrella about.me/noootsab

Re: [RESULT][VOTE] Release Apache Spark 1.0.0 (RC11)

2014-05-29 Thread Andy Konwinski
Saputra Sean McNamara* Xiangrui Meng* Andy Konwinski* Krishna Sankar Kevin Markey Patrick Wendell* Tathagata Das* 0: (1 vote) Ankur Dave* -1: (0 vote) Please hold off announcing Spark 1.0.0 until Apache Software Foundation makes the press release tomorrow

Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-05-28 Thread Andy Konwinski
+1 On May 28, 2014 7:05 PM, Xiangrui Meng men...@gmail.com wrote: +1 Tested apps with standalone client mode and yarn cluster and client modes. Xiangrui On Wed, May 28, 2014 at 1:07 PM, Sean McNamara sean.mcnam...@webtrends.com wrote: Pulled down, compiled, and tested examples on OS X

Re: Scala examples for Spark do not work as written in documentation

2014-05-20 Thread Andy Konwinski
I fixed the bug, but I kept the parameter i instead of _ since that (1) keeps it more parallel to the Python and Java versions, which also use functions with a named variable, and (2) doesn't require readers to know this particular use of the _ syntax in Scala. Thanks for catching this Glenn. Andy

Re: can RDD be shared across mutil spark applications?

2014-05-17 Thread Andy Konwinski
RDDs cannot currently be shared across multiple SparkContexts without using something like the Tachyon project (which is a separate project/codebase). Andy On May 16, 2014 2:14 PM, qingyang li liqingyang1...@gmail.com wrote:

Re: branch-1.0 cut

2014-04-09 Thread Andy Konwinski
Wow, great work. Very impressive sticking to the schedule! On Wed, Apr 9, 2014 at 2:31 AM, Patrick Wendell pwend...@gmail.com wrote: Hey All, In accordance with the scheduled window for the release I've cut a 1.0 branch. Thanks a ton to everyone for being so active in reviews during the

Updating all references to github.com/apache/incubator-spark on spark website

2014-04-09 Thread Andy Konwinski
Since http://github.com/apache/incubator-spark and any links underneath it now return 404, I propose we do a global search and replace to change all instances to remove incubator-, including those in docs/0.8.0 docs/0.8.1 and docs/0.9.0. I'm happy to do this. Any discussion before I do? Andy

[DISCUSS] Shepherding PRs

2014-03-27 Thread Andy Konwinski
. Andy -- Forwarded message -- From: Benjamin Mahler benjamin.mah...@gmail.com Date: Mar 24, 2014 11:47 PM Subject: Re: Shepherding on ExternalContainerizer To: dev d...@mesos.apache.org Cc: Hey Till, We want to foster a healthy review culture, and so, as you observed, we thought we

Re: Announcing the official Spark Job Server repo

2014-03-24 Thread andy petrella
Thx for answering! see inline for my thoughts (or misunderstanding ? ^^) Andy, doesn't Marathon handle fault tolerance amongst its apps? ie if you say that N instances of an app are running, and one shuts off, then it spins up another one no? Yes indeed, but my wonder is about how to know how

Re: Making RDDs Covariant

2014-03-22 Thread andy petrella
of Container of Wagons (Rdds Dstreams themselves) to compose a Job using a (to be defined) DSL. So without covariance I cannot for now define a generic noop Sink. My 0.02c Andy Sent from Tab, sorry for the typos... On March 22, 2014 at 17:00, Pascal Voitot Dev pascal.voitot@gmail.com wrote: On Sat

Re: Announcing the official Spark Job Server repo

2014-03-18 Thread andy petrella
the resources needed (à la Jenkins). Any idea is welcome. Back to the news, Evan + Ooyala team: Great Job again. andy On Tue, Mar 18, 2014 at 11:39 PM, Henry Saputra henry.sapu...@gmail.com wrote: W00t! Thanks for releasing this, Evan. - Henry On Tue, Mar 18, 2014 at 1:51 PM, Evan Chan e

Re: [re-cont] map and flatMap

2014-03-15 Thread andy petrella
Yep, regarding flatMap, an implicit parameter might work like in Scala's Future, for instance: https://github.com/scala/scala/blob/master/src/library/scala/concurrent/Future.scala#L246 Dunno, still waiting for some insights from the team ^^ andy On Wed, Mar 12, 2014 at 3:23 PM, Pascal Voitot

[re-cont] map and flatMap

2014-03-12 Thread andy petrella
(or whatever better name)? Or to have flatMap requiring a Monad instance of RDD? Sorry for the prose, just dropped my thoughts and feelings at once :-/ Cheers, andy PS: and my English maybe, although my name's Andy I'm a native Belgian ^^.