Re: Using existing distribution for join when subset of keys

2020-05-31 Thread Patrick Woody
You can use bucketBy to avoid shuffling in your scenario. This test suite has some examples: https://github.com/apache/spark/blob/45cf5e99503b00a6bd83ea94d6d92761db1a00ab/sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala#L343 Thanks, Terry
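
A minimal sketch of the bucketBy approach pointed at above, assuming a Spark 2.x session with a working catalog; the bucket count, table names, and columns are illustrative, not from the thread:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("bucketed-join-sketch").getOrCreate()

    // Illustrative stand-ins for the two sides of the join.
    val a = spark.range(0, 1000000).selectExpr("id as x", "id % 7 as y", "id % 3 as z")
    val b = spark.range(0, 10000).selectExpr("id as x", "id % 7 as y", "id % 5 as z")

    // Write both sides bucketed (and sorted) on the same join keys with the
    // same bucket count, so reading them back can join without an Exchange.
    a.write.bucketBy(64, "x", "y").sortBy("x", "y").saveAsTable("bucketed_a")
    b.write.bucketBy(64, "x", "y").sortBy("x", "y").saveAsTable("bucketed_b")

    // Check the physical plan: the join on the bucket columns should not
    // show a shuffle (Exchange) on either side.
    val joined = spark.table("bucketed_a").join(spark.table("bucketed_b"), Seq("x", "y"))
    joined.explain()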

Using existing distribution for join when subset of keys

2020-05-31 Thread Patrick Woody
Hey all, I have one large table, A, and two medium sized tables, B & C, that I'm trying to complete a join on efficiently. The result is multiplicative on A join B, so I'd like to avoid shuffling that result. For this example, let's just assume each table has three columns, x, y, z. The below is
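
A hedged sketch of the scenario as described (one large table A, two medium tables B and C, each with columns x, y, z), using synthetic data in place of the real tables. The idea is to repartition on the shared key x once, up front; whether the planner actually accepts the existing hash(x) distribution for the later joins, instead of re-shuffling the large intermediate, is exactly the open question here and is version-dependent:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("subset-key-join-sketch").getOrCreate()

    // Illustrative stand-ins: A is large, B and C are medium.
    val a = spark.range(0, 1000000).selectExpr("id % 1000 as x", "id % 37 as y", "id % 11 as z")
    val b = spark.range(0, 10000).selectExpr("id % 1000 as x", "id % 37 as y", "id as z")
    val c = spark.range(0, 10000).selectExpr("id % 1000 as x", "id as y", "id % 11 as z")

    // Repartition everything on the shared key x once, up front.
    val aByX = a.repartition(col("x"))
    val bByX = b.repartition(col("x"))
    val cByX = c.repartition(col("x"))

    // A join B is the multiplicative (large) intermediate result; the hope is
    // that the follow-up join on keys that include x can reuse the hash(x)
    // distribution rather than shuffling the intermediate again.
    val ab = aByX.join(bByX, Seq("x", "y"))
    val abc = ab.join(cByX, Seq("x", "z"))

    // Inspect where Exchange (shuffle) nodes remain in the physical plan.
    abc.explain()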

Spark 1.6.2 short circuit AND filter broken

2016-07-07 Thread Patrick Woody
Hey all, I hit a pretty nasty bug on 1.6.2 that I can't reproduce on 2.0. Here is the code/logical plan http://pastebin.com/ULnHd1b6. I have filterPushdown disabled, so when I call collect here it hits the Exception in my UDF before doing a null check on the input. I believe it is a symptom of
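
A rough reconstruction of the failing pattern (the actual code and logical plan are in the pastebin link above); the UDF, column name, and data are illustrative, and the null-safe variant is a general defensive workaround rather than the fix that landed in 2.0:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, udf}

    val spark = SparkSession.builder().appName("short-circuit-filter-sketch").getOrCreate()
    import spark.implicits._

    // A UDF that throws on null input, guarded by an isNotNull check that the
    // AND is expected to short-circuit before the UDF ever sees the null row.
    val strLen = udf((s: String) => s.length)
    val df = Seq(Some("abc"), None, Some("de")).toDF("value")

    // Pattern from the report: on 1.6.2 with filter pushdown disabled the UDF
    // was hit before the null check; on 2.0 the guard behaves as expected.
    df.filter(col("value").isNotNull && strLen(col("value")) > 1).collect()

    // Version-independent workaround: make the UDF itself null-safe.
    val safeStrLen = udf((s: String) => Option(s).map(_.length).getOrElse(-1))
    df.filter(safeStrLen(col("value")) > 1).collect()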

Get Spark version before starting context

2015-07-04 Thread Patrick Woody
Hey all, Is it possible to reliably get the version string of a Spark cluster prior to trying to connect via the SparkContext on the client side? Most of the errors I've seen on mismatched versions have been cryptic, so it would be helpful if I could throw an exception earlier. I know it is

Re: Get Spark version before starting context

2015-07-04 Thread Patrick Woody
To somewhat answer my own question - it looks like an empty request to the rest API will throw an error which returns the version in JSON as well. Still not ideal though. Would there be any objection to adding a simple version endpoint to the API? On Sat, Jul 4, 2015 at 4:00 PM, Patrick Woody
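
A sketch of the workaround described above, assuming a standalone master with the REST submission server on its default port 6066; the host name and endpoint path are assumptions, and the only claim taken from the thread is that the JSON error body carries the server's version:

    import java.net.{HttpURLConnection, URL}
    import scala.io.Source

    // Send an empty POST to the REST submission endpoint and read the JSON
    // error response, which includes the server's Spark version.
    val url = new URL("http://spark-master:6066/v1/submissions/create")
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setDoOutput(true)
    conn.getOutputStream.close()   // empty body

    // Error responses come back on the error stream as JSON.
    val stream = if (conn.getResponseCode >= 400) conn.getErrorStream else conn.getInputStream
    val body = Source.fromInputStream(stream).mkString
    println(body)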

Re: Dynamic allocator requests -1 executors

2015-06-13 Thread Patrick Woody
allocation in 1.4 that permitted requesting negative numbers of executors. Any chance you'd be able to try with the newer version and see if the problem persists? -Sandy On Fri, Jun 12, 2015 at 7:42 PM, Patrick Woody patrick.woo...@gmail.com wrote: Hey all, I've recently run

Dynamic allocator requests -1 executors

2015-06-12 Thread Patrick Woody
Hey all, I've recently run into an issue where spark dynamicAllocation has asked for -1 executors from YARN. Unfortunately, this raises an exception that kills the executor-allocation thread and the application can't request more resources. Has anyone seen this before? It is spurious and the
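
For context, a sketch of the kind of dynamic-allocation setup involved, with illustrative bounds; these are the standard settings that drive the allocator's executor requests, not the fix itself (per the follow-up, 1.4 contains fixes for the negative-request behaviour):

    import org.apache.spark.{SparkConf, SparkContext}

    // Dynamic allocation on YARN; values here are illustrative.
    val conf = new SparkConf()
      .setAppName("dynamic-allocation-sketch")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")   // required for dynamic allocation on YARN
      .set("spark.dynamicAllocation.minExecutors", "1")
      .set("spark.dynamicAllocation.maxExecutors", "50")

    val sc = new SparkContext(conf)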

Re: SparkSQL UDTs with Ordering

2015-03-24 Thread Patrick Woody
have to be on the internal form, not the user visible form. On Tue, Mar 24, 2015 at 12:25 PM, Patrick Woody patrick.woo...@gmail.com wrote: Hey all, Currently looking into UDTs and I was wondering if it is reasonable to add the ability to define an Ordering (or if this is possible, then how

SparkSQL UDTs with Ordering

2015-03-24 Thread Patrick Woody
Hey all, Currently looking into UDTs and I was wondering if it is reasonable to add the ability to define an Ordering (or if this is possible, then how)? Currently it will throw an error when non-Native types are used. Thanks! -Pat
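
For context, a hedged sketch of the Spark 1.x UDT API under discussion; the ExamplePoint/ExamplePointUDT names follow Spark's own test examples, and the exact internal representation expected by serialize/deserialize shifted across 1.x releases, so treat this as 1.3-era shape only. Per the reply above, any Ordering would have to compare the internal form produced by serialize, not the user-visible class:

    import org.apache.spark.sql.types._

    // User-visible class, annotated with its UDT.
    @SQLUserDefinedType(udt = classOf[ExamplePointUDT])
    class ExamplePoint(val x: Double, val y: Double) extends Serializable

    class ExamplePointUDT extends UserDefinedType[ExamplePoint] {
      // Internal (catalyst) representation: an array of two doubles. An
      // Ordering would have to be defined against this form.
      override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)

      override def serialize(obj: Any): Any = obj match {
        case p: ExamplePoint => Seq(p.x, p.y)
      }

      override def deserialize(datum: Any): ExamplePoint = datum match {
        case values: Seq[_] =>
          val xy = values.asInstanceOf[Seq[Double]]
          new ExamplePoint(xy(0), xy(1))
      }

      override def userClass: Class[ExamplePoint] = classOf[ExamplePoint]
    }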