Re: Spark-sql versus Impala versus Hive

2015-06-18 Thread Steve Nunez
Interesting. What were the Hive settings? Specifically, it would be useful to know if this was Hive on Tez. - Steve From: Sanjay Subramanian Reply-To: Sanjay Subramanian Date: Thursday, June 18, 2015 at 11:08 To: "user@spark.apache.org" Subject: Spark-sql versus Imp

Pairwise Processing of a List

2015-01-25 Thread Steve Nunez
Spark Experts, I've got a list of points, List[(Float, Float)], that represent (x,y) coordinate pairs and need to sum the distance. It's easy enough to compute the distance: case class Point(x: Float, y: Float) { def distance(other: Point): Float = sqrt(pow(x - other.x, 2) + pow(y - other
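One way to read the question above is as a sum over consecutive pairs in the list. The following is a minimal sketch under that assumption, not the approach the thread settled on. The `Point` class follows the message; the `.toFloat` conversion, the `PathLength` object and the `totalDistance` helper are illustrative additions that do not appear in the original.

```scala
import scala.math.{pow, sqrt}

// Mirrors the Point class from the message. The .toFloat is an addition:
// scala.math.sqrt returns a Double, so it must be narrowed to match Float.
case class Point(x: Float, y: Float) {
  def distance(other: Point): Float =
    sqrt(pow(x - other.x, 2) + pow(y - other.y, 2)).toFloat
}

object PathLength {
  // Hypothetical helper: sums the distance between each consecutive pair.
  def totalDistance(coords: List[(Float, Float)]): Float = {
    val points = coords.map { case (x, y) => Point(x, y) }
    // Zipping the list with itself shifted by one pairs element i with i + 1.
    // drop(1) is used instead of tail so an empty list yields no pairs.
    points.zip(points.drop(1)).map { case (a, b) => a.distance(b) }.sum
  }
}
```

For example, `PathLength.totalDistance(List((0f, 0f), (3f, 4f), (3f, 0f)))` adds a segment of length 5 and a segment of length 4. `sliding(2)` would work as well, but it emits a one-element window for a single-point list, which needs a separate guard.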

Re: Pairwise Processing of a List

2015-01-25 Thread Steve Nunez
...@mc10inc.com> Date: Sunday, January 25, 2015 at 17:17 To: Steve Nunez <snu...@hortonworks.com>, "user@spark.apache.org" <user@spark.apache.org> Subject: Re: Pairwise Processing of a List So you've got a point A and you w

Re: Cluster submit mode - only supported on Yarn?

2014-07-23 Thread Steve Nunez
I'm also in the early stages of setting up long-running Spark jobs. The easiest way I've found is to set up a cluster and submit the job via YARN. Then I can come back and check in on progress when I need to. Seems the trick is tuning the queue priority and YARN preemption to get the job to run in a reason

Emacs Setup Anyone?

2014-07-24 Thread Steve Nunez
Anyone out there have a good configuration for emacs? Scala-mode sort of works, but I'd love to see a fully-supported spark-mode with an inferior shell. Searching didn't turn up much of anything. Any emacs users out there? What setup are you using? Cheers, - SteveN -- CONFIDENTIALITY NOTICE

Re: How to specify the job to run on the specific nodes(machines) in the hadoop yarn cluster?

2014-07-30 Thread Steve Nunez
This is a common request. Unfortunately, AFAIK, you can't do it yet. Once labels (YARN-796) are out we should see this capability and be able to 'pin' jobs to labels. If anyone figures out a way to do this in the meantime, I'd love to hear about it

Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Steve Nunez
Provided you've got the HWX repo in your pom.xml, you can build with this line: mvn -Pyarn -Phive -Phadoop-2.4 -Dhadoop.version=2.4.0.2.1.1.0-385 -DskipTests clean package I haven't tried building a distro, but it should be similar. - SteveN On 8/4/14, 1:25, "Sean Owen" wrote: >For a

Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Steve Nunez
I don’t think there is an hwx profile, but there probably should be. - Steve From: Patrick Wendell Date: Monday, August 4, 2014 at 10:08 To: Ron's Yahoo! Cc: Ron's Yahoo! , Steve Nunez , , "d...@spark.apache.org" Subject: Re: Issues with HDP 2.4.0.2.1.3.0-563 Ah I

Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Steve Nunez
's the vendor's problem.) > >This isn't any argument about being purist but just that I am not sure >these are things that the project can meaningfully bother with. > >It makes sense to set vendor repos in the pom for convenience, and >makes sense to run smoke tests

MovieLensALS - Scala Pattern Magic

2014-08-04 Thread Steve Nunez
atings.map(_._2.product).distinct.count Cheers, - Steve Nunez
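The fragment `map(_._2.product)` in the preview above combines Scala's placeholder syntax with tuple accessors. A small sketch of how such an expression reads, assuming the elements are pairs whose second member has a `product` field; the `Rating` case class below is a local stand-in so the example runs without Spark, and the sample data and the `PlaceholderDemo` object are invented for illustration.

```scala
// Hypothetical stand-in for a rating record, so the sketch needs no Spark dependency.
case class Rating(user: Int, product: Int, rating: Double)

object PlaceholderDemo {
  // Invented sample data: (key, rating) pairs.
  val ratings: Seq[(Long, Rating)] = Seq(
    (1L, Rating(1, 10, 4.0)),
    (2L, Rating(2, 10, 3.5)),
    (3L, Rating(1, 20, 5.0))
  )

  // `_` stands for each pair, `._2` selects its second member (the Rating),
  // and `.product` reads that Rating's product id.
  val numProducts: Int = ratings.map(_._2.product).distinct.size

  // The same computation with an explicit pattern match on the pair.
  val numProductsExplicit: Int =
    ratings.map { case (_, r) => r.product }.distinct.size
}
```

On a plain `Seq` the number of distinct elements is read with `.size`; the `.count` in the message is the no-argument form available on Spark RDDs.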

Reference Accounts & Large Node Deployments

2014-08-27 Thread Steve Nunez
All, Does anyone have specific references to customers, use cases and large-scale deployments of Spark Streaming? By 'large scale' I mean both throughput and number of nodes. I'm attempting an objective comparison of Streaming and Storm and while this data is known for Storm, there appears to be

FW: Reference Accounts & Large Node Deployments

2014-08-28 Thread Steve Nunez
Anyone? No customers using streaming at scale? From: Steve Nunez Date: Wednesday, August 27, 2014 at 9:08 To: "user@spark.apache.org" Subject: Reference Accounts & Large Node Deployments > All, > > Does anyone have specific references to customers, use cases and la

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Steve Nunez
Great stuff. Wonderful to see such progress in so short a time. How about some links to code and instructions so that these benchmarks can be reproduced? Regards, - Steve From: Debasish Das Date: Friday, October 10, 2014 at 8:17 To: Matei Zaharia Cc: user , dev Subject: Re: Breaking the

Directory / File Reading Patterns

2015-01-17 Thread Steve Nunez
Hello Users, I've got a real-world use case that seems common enough that its pattern would be documented somewhere, but I can't find any references to a simple solution. The challenge is that data is getting dumped into a directory structure, and that directory structure itself contains featur