Re: Sort order in bucketing in a custom datasource

2019-04-16 Thread Jacek Laskowski
Hi, I don't think so. I can't think of an interface (trait) that would give that information to the Catalyst optimizer. Regards, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming

Re: Spark 2.4.2

2019-04-16 Thread Michael Armbrust
Thanks Ryan. To me the "test" for putting things in a maintenance release is really a trade-off between benefit and risk (along with some caveats, like the user-facing surface should not grow). The benefits here are fairly large (now it is possible to plug in partition-aware data sources) and the risk

Re: Spark 2.4.2

2019-04-16 Thread Ryan Blue
Spark has a lot of strange behaviors already that we don't fix in patch releases. And bugs aren't usually fixed with a configuration flag to turn on the fix. That said, I don't have a problem with this commit making it into a patch release. This is a small change and looks safe enough to me. I

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-16 Thread Bobby Evans
I am +1; I'd better be, since I am proposing the SPIP. Thanks, Bobby On Tue, Apr 16, 2019 at 10:38 AM Tom Graves wrote: > Hi everyone, > > I'd like to call for a vote on SPARK-27396 - SPIP: Public APIs for > extended Columnar Processing Support. The proposal is to extend the > support to

Re: Spark 2.4.2

2019-04-16 Thread Michael Armbrust
I would argue that it's confusing enough to a user for options from DataFrameWriter to be silently dropped when instantiating the data source to consider this a bug. They asked for partitioning to occur, and we are doing nothing (not even telling them we can't). I was certainly surprised by this
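A sketch of the call shape being discussed, under the assumption of a custom DataSource V2 implementation (the format name "com.example.v2source" and the column name are hypothetical): the user requests partitioning through DataFrameWriter, and before the SPARK-27453 fix the V2 source was never told about those columns.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().appName("partition-example").getOrCreate()
val df = spark.range(100).withColumn("event_date", lit("2019-04-16"))

// The user asks for partitioning here; prior to the SPARK-27453 fix a
// DataSource V2 implementation was instantiated without ever seeing
// the partitionBy columns.
df.write
  .format("com.example.v2source")   // hypothetical custom data source
  .partitionBy("event_date")
  .save("/tmp/events")
```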

Re: Spark 2.4.2

2019-04-16 Thread Ryan Blue
Is this a bug fix? It looks like a new feature to me. On Tue, Apr 16, 2019 at 4:13 PM Michael Armbrust wrote: > Hello All, > > I know we just released Spark 2.4.1, but in light of fixing SPARK-27453 > I was wondering if it > might make sense

Spark 2.4.2

2019-04-16 Thread Michael Armbrust
Hello All, I know we just released Spark 2.4.1, but in light of fixing SPARK-27453 I was wondering if it might make sense to follow up quickly with 2.4.2. Without this fix it's very hard to build a data source that correctly handles partitioning

Re: Sort order in bucketing in a custom datasource

2019-04-16 Thread Russell Spitzer
Please join the DataSource V2 meetings; the next one is tomorrow, and we are discussing these very topics right now. DataSource V1 cannot provide this information, but any source that just generates RDDs can specify a partitioner. This is only useful, though, if you are only using RDDs, for
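A minimal sketch of the RDD-level option mentioned above (key names and partition count are made up): a pair RDD can carry an explicit Partitioner, which later RDD operations can exploit to avoid a shuffle. This information is not visible to the Catalyst optimizer once you move to DataFrames, which is the limitation being discussed.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("rdd-partitioner").getOrCreate()
val sc = spark.sparkContext

// A pair RDD with an explicit partitioner; co-partitioned RDDs can be
// joined or aggregated without an extra shuffle.
val byUser = sc
  .parallelize(Seq(("alice", 1), ("bob", 2), ("alice", 3)))
  .partitionBy(new HashPartitioner(8))

// reduceByKey reuses the existing partitioner, so no re-shuffle is needed.
val counts = byUser.reduceByKey(_ + _)
```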

Sort order in bucketing in a custom datasource

2019-04-16 Thread Long, Andrew
Hey Friends, Is it possible to specify the sort order or bucketing in a way that can be used by the optimizer in Spark? Cheers Andrew
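For reference, a minimal sketch of the bucketing and sort-order API Spark already exposes on DataFrameWriter (table and column names are made up). As the replies note, this applies to tables written via saveAsTable, not to arbitrary custom data sources, and there is no public interface for a custom source to declare its own sort order or bucketing to the Catalyst optimizer.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("bucketing-example").getOrCreate()
import spark.implicits._

val events = Seq((1L, "click"), (2L, "view"), (1L, "view")).toDF("user_id", "event")

// Bucketing and sort order are recorded in the table metadata; Spark can
// use them to avoid shuffles and sorts in later joins/aggregations on user_id.
events.write
  .bucketBy(16, "user_id")
  .sortBy("user_id")
  .saveAsTable("events_bucketed")
```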

[VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-16 Thread Tom Graves
Hi everyone, I'd like to call for a vote on SPARK-27396 - SPIP: Public APIs for extended Columnar Processing Support. The proposal is to extend the support to allow for more columnar processing. You can find the full proposal in the JIRA at: https://issues.apache.org/jira/browse/SPARK-27396.

Re: Is there value in publishing nightly snapshots?

2019-04-16 Thread Koert Kuipers
We have used it at times to detect breaking changes, since it allows us to run our internal unit tests against Spark snapshot binaries, but we can also build these snapshots in-house if you want to turn it off. On Tue, Apr 16, 2019 at 9:29 AM Sean Owen wrote: > I noticed recently ... > >
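A build.sbt sketch of the kind of setup described above (the -SNAPSHOT version string is an assumption): resolve the nightly snapshot artifacts from the Apache snapshots repository so internal unit tests can compile against the latest development build.

```scala
// build.sbt (sketch): pull nightly Spark -SNAPSHOT artifacts for testing.
// The version string below is an assumption; use whichever -SNAPSHOT is current.
resolvers += "Apache Snapshots" at "https://repository.apache.org/snapshots/"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.0-SNAPSHOT" % Test
```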

Is there value in publishing nightly snapshots?

2019-04-16 Thread Sean Owen
I noticed recently ... https://github.com/apache/spark-website/pull/194/files#diff-d95d573366135f01d4fbae2d64522500R466 ... that we stopped publishing nightly releases a long while ago. That's fine. What about turning off the job that builds -SNAPSHOTs of the artifacts each night? Does anyone

Re: pyspark.sql.functions ide friendly

2019-04-16 Thread 880f0464
Hi. That's a problem with Spark as such and in general can be addressed on an IDE-by-IDE basis - see for example https://stackoverflow.com/q/40163106 for some hints. ‐‐‐ Original Message ‐‐‐ On Tuesday, April 16, 2019 2:10 PM, educhana wrote: > Hi, >

pyspark.sql.functions ide friendly

2019-04-16 Thread educhana
Hi, Currently, using pyspark.sql.functions from an IDE like PyCharm causes linters to complain because the functions are declared at runtime. Would a PR fixing this be welcomed? Are there any problems or difficulties I'm unaware of?

Support for arrays parquet vectorized reader

2019-04-16 Thread Mick Davies
Hi, I'm working with a medical data model that uses arrays of simple types to represent things like the drug exposures and conditions that are associated with a patient. Using this model, patient data is co-located and is consequently processed by Spark more efficiently. The data is stored in
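A small sketch of the kind of model described (field names are made up): each patient row carries arrays of simple types so that a patient's data stays co-located, and the dataset is written to Parquet. In Spark 2.4 the vectorized Parquet reader only covers flat primitive columns, which is why support for arrays is the topic here.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("array-columns").getOrCreate()
import spark.implicits._

// Hypothetical patient records: arrays of simple types keep all of a
// patient's drug exposures and conditions co-located in a single row.
case class Patient(patientId: Long, drugExposures: Seq[String], conditions: Seq[String])

val patients = Seq(
  Patient(1L, Seq("aspirin", "metformin"), Seq("diabetes")),
  Patient(2L, Seq("ibuprofen"), Seq("hypertension", "arthritis"))
).toDS()

// Stored as Parquet; in Spark 2.4 scans of these array columns fall back
// to the non-vectorized reader path.
patients.write.mode("overwrite").parquet("/tmp/patients")
```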

Fwd: Uncaught Exception Handler in master

2019-04-16 Thread Alessandro Liparoti
Hi everyone, I have a Spark library where I would like to take some action when an uncaught exception happens (log it, increment an error metric, ...). I tried multiple times to use setUncaughtExceptionHandler on the current Thread, but this doesn't work. If I spawn another thread, this works fine.
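A minimal sketch of the kind of handler registration described (the logging and metric calls are placeholders): the report is that registering it on the current thread has no effect, while doing the same inside a newly spawned thread works.

```scala
// Sketch: register an uncaught exception handler on the current thread so
// the library can log the failure and bump an error metric.
Thread.currentThread().setUncaughtExceptionHandler(
  new Thread.UncaughtExceptionHandler {
    override def uncaughtException(t: Thread, e: Throwable): Unit = {
      System.err.println(s"Uncaught exception in ${t.getName}: $e")
      // errorCounter.increment()  // hypothetical metrics hook
    }
  }
)
```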