Re: YARN Shuffle service and its compatibility

2016-04-19 Thread Mark Grover
Great, thanks for confirming, Reynold. Appreciate it! On Tue, Apr 19, 2016 at 4:20 PM, Reynold Xin wrote: > I talked to Lianhui offline and he said it is not that big of a deal to > revert the patch. > > > On Tue, Apr 19, 2016 at 9:52 AM, Mark Grover

Re: YARN Shuffle service and its compatibility

2016-04-19 Thread Reynold Xin
I talked to Lianhui offline and he said it is not that big of a deal to revert the patch. On Tue, Apr 19, 2016 at 9:52 AM, Mark Grover wrote: > Thanks. > > I'm more than happy to wait for more people to chime in here but I do feel > that most of us are leaning towards Option B

Re: Possible deadlock in registering applications in the recovery mode

2016-04-19 Thread Niranda Perera
Hi Reynold, I have created a JIRA for this [1]. I have also created a PR for the same issue [2]. I would be very grateful if you could look into this, because it is a blocker in our Spark deployment, which uses a number of custom Spark extensions. Thanks. Best [1]

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
Clarification: in my previous email, I was not talking about spark-streaming-flume artifact or spark-streaming-kafka artifact. I was talking about examples for these projects, such as examples//src/main/python/streaming/flume_wordcount.py On Tue, Apr 19, 2016 at 11:10 AM, Marcelo Vanzin

Re: Organizing Spark ML example packages

2016-04-19 Thread Bryan Cutler
+1, adding some organization would make it easier for people to find a specific example. On Mon, Apr 18, 2016 at 11:52 PM, Yanbo Liang wrote: > This sounds good to me, and it will make the ML examples neater. > > 2016-04-14 5:28 GMT-07:00 Nick Pentreath

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
On Tue, Apr 19, 2016 at 11:07 AM, Ted Yu wrote: > The same question can be asked w.r.t. examples for other projects, such as > flume > and kafka. > The main difference being that flume and kafka integration are part of Spark itself. HBase integration is not. > On Tue,

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
The same question can be asked w.r.t. examples for other projects, such as flume and kafka. On Tue, Apr 19, 2016 at 11:01 AM, Marcin Tustin wrote: > Let's posit that the spark example is much better than what is available > in HBase. Why is that a reason to keep it within

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcin Tustin
Let's posit that the spark example is much better than what is available in HBase. Why is that a reason to keep it within Spark? On Tue, Apr 19, 2016 at 1:59 PM, Ted Yu wrote: > bq. HBase's current support, even if there are bugs or things that still > need to be done, is

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
bq. HBase's current support, even if there are bugs or things that still need to be done, is much better than the Spark example In my opinion, a simple example that works is better than a buggy package. I hope before long the hbase-spark module in HBase can arrive at a state which we can

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
You're completely missing my point. I'm saying that HBase's current support, even if there are bugs or things that still need to be done, is much better than the Spark example, which is basically a call to "SparkContext.hadoopRDD". Spark's example is not helpful in learning how to build an HBase

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
'bq.' is used in JIRA to quote what other people have said. On Tue, Apr 19, 2016 at 10:42 AM, Reynold Xin wrote: > Ted - what's the "bq" thing? Are you using some 3rd party (e.g. Atlassian) > syntax? They are not being rendered in email. > > > On Tue, Apr 19, 2016 at 10:41

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
You're entitled to your own opinions. While you're at it, here's some much better documentation, from the HBase project themselves, than what the Spark example provides: http://hbase.apache.org/book.html#spark On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu wrote: > bq. it's

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Reynold Xin
Ted - what's the "bq" thing? Are you using some 3rd party (e.g. Atlassian) syntax? They are not being rendered in email. On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu wrote: > bq. it's actually in use right now in spite of not being in any upstream > HBase release > > If it is

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Josh Rosen
+1; I think that it's preferable for code examples, especially third-party integration examples, to live outside of Spark. On Tue, Apr 19, 2016 at 10:29 AM Reynold Xin wrote: > Yea in general I feel examples that bring in a large amount of > dependencies should be outside

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
bq. it's actually in use right now in spite of not being in any upstream HBase release If it is not in upstream, then it is not relevant for discussion on Apache mailing list. On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin wrote: > Alright, if you prefer, I'll say "it's

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
Alright, if you prefer, I'll say "it's actually in use right now in spite of not being in any upstream HBase release", and it's more useful than a single example file in the Spark repo for those who really want to integrate with HBase. Spark's example is really very trivial (just uses one of

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
bq. create a separate tarball for them Probably another thread can be started for the above. I am fine with it. On Tue, Apr 19, 2016 at 10:34 AM, Marcelo Vanzin wrote: > On Tue, Apr 19, 2016 at 10:28 AM, Reynold Xin wrote: > > Yea in general I feel

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Marcelo Vanzin
On Tue, Apr 19, 2016 at 10:28 AM, Reynold Xin wrote: > Yea in general I feel examples that bring in a large amount of dependencies > should be outside Spark. Another option to avoid the dependency problem is to not ship examples in the distribution, and maybe create a

Re: RFC: Remove "HBaseTest" from examples?

2016-04-19 Thread Ted Yu
bq. I wouldn't call it "incomplete". I would call it incomplete. Please see HBASE-15333 'Enhance the filter to handle short, integer, long, float and double', which is a bug fix. Please exclude the presence of the related module in vendor distros from this discussion. Thanks On Tue, Apr 19, 2016 at

Re: YARN Shuffle service and its compatibility

2016-04-19 Thread Tom Graves
It would be nice if we could keep this compatible between 1.6 and 2.0, so I'm more for Option B at this point, since the change made seems minor and we can change the shuffle service to handle this internally, as Marcelo mentioned. Then let's try to keep it compatible, but if there is a forcing function let's

Question about storage memory in unified memory manager

2016-04-19 Thread Patrick Woody
Hey all, I had a question about the MemoryStore for the BlockManager with the unified memory manager vs. the legacy mode. In the unified format, I would expect the max size of the MemoryStore to be * * in the same way that when using the StaticMemoryManager it is * * . Instead it

Introduction to Spark workshop, May 9, New York

2016-04-19 Thread Rich Bowen
Hi, folks, I received the following request: --- The guy who was going to teach the Introduction to Spark workshop at Data Summit on May 9th has changed jobs and can no longer do the workshop. Know anybody in the New York area who could fill in? It's scheduled from 9 to 12 at the New

Re: [spark.ml] Why is private class ColumnPruner?

2016-04-19 Thread Jacek Laskowski
Hi Yanbo, https://issues.apache.org/jira/browse/SPARK-14730 Thanks! Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Tue, Apr 19, 2016 at 8:55 AM, Yanbo Liang

Re: more uniform exception handling?

2016-04-19 Thread Steve Loughran
On 18 Apr 2016, at 20:16, Reynold Xin wrote: Josh's pull request on rpc exception handling got me to think ... In my experience, there have been a few things related to exceptions that created a lot of

Re: YARN Shuffle service and its compatibility

2016-04-19 Thread Steve Loughran
> On 18 Apr 2016, at 23:05, Marcelo Vanzin wrote: > > On Mon, Apr 18, 2016 at 2:02 PM, Reynold Xin wrote: >> The bigger problem is that it is much easier to maintain backward >> compatibility rather than dictating forward compatibility. For example, as

Re: more uniform exception handling?

2016-04-19 Thread Sean Owen
We already have SparkException, indeed. The ID is an interesting idea; simple to implement and might help disambiguate. Does it solve a lot of problems of this form? If something is squelching Exception or SparkException the result will be the same. #2 is something we can sniff out with static
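
The exception ID idea mentioned in this message could look roughly like the sketch below. It is written in Python for brevity and is not Spark code: the class name `IdentifiedError`, the `error_id` field, and the `rewrap` helper are all hypothetical names chosen for illustration. The sketch only shows how an ID assigned at the point of failure might survive when the exception is wrapped at a higher layer.

```python
import uuid


class IdentifiedError(Exception):
    """Hypothetical exception that carries a short ID (not a Spark API)."""

    def __init__(self, message, error_id=None, cause=None):
        # Reuse a supplied ID, otherwise generate a new 8-character one.
        self.error_id = error_id or uuid.uuid4().hex[:8]
        self.message = message
        # Keep the original exception reachable for debugging.
        self.__cause__ = cause
        super().__init__(f"[{self.error_id}] {message}")


def rewrap(exc, message):
    """Wrap exc in an IdentifiedError, keeping its ID if it already has one."""
    error_id = getattr(exc, "error_id", None)
    return IdentifiedError(message, error_id=error_id, cause=exc)


if __name__ == "__main__":
    inner = IdentifiedError("fetch failed")
    outer = rewrap(inner, "task failed")
    # Both messages carry the same ID, so log lines can be correlated.
    print(inner)
    print(outer)
```

As the message points out, an ID of this kind only helps when the exception is actually logged or propagated; code that swallows the exception loses the ID along with everything else.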

Re: [spark.ml] Why is private class ColumnPruner?

2016-04-19 Thread Yanbo Liang
Hi Jacek, This is because ColumnPruner is currently only used for RFormula; we did not expose it as a feature transformer. Please feel free to create a JIRA and work on it. Thanks Yanbo 2016-03-25 8:50 GMT-07:00 Jacek Laskowski : > Hi, > > Came across `private class ColumnPruner`

Re: Organizing Spark ML example packages

2016-04-19 Thread Yanbo Liang
This sounds good to me, and it will make the ML examples neater. 2016-04-14 5:28 GMT-07:00 Nick Pentreath : > Hey Spark devs > > I noticed that we now have a large number of examples for ML & MLlib in > the examples project - 57 for ML and 67 for MLLIB to be precise.