Re: CBO not working for Parquet Files

2018-09-06 Thread emlyn
I must be missing something, as it seems that partitioned Parquet files would be a common use case, and if this is a bug in Spark I would have expected it to have been picked up sooner. Has anybody managed to get CBO working with partitioned Parquet files? Is this a known issue? Thanks, Emlyn
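For reference, the cost-based optimizer has to be switched on explicitly and only works when table and column statistics are available; a minimal sketch, assuming Spark 2.2+ and a partitioned Parquet table registered in the metastore as "events" (the table and column names are hypothetical):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("cbo-example")
      .config("spark.sql.cbo.enabled", "true")   // turn on the cost-based optimizer
      .enableHiveSupport()                       // statistics are persisted in the metastore
      .getOrCreate()

    // Collect table-level and column-level statistics for the partitioned table.
    spark.sql("ANALYZE TABLE events COMPUTE STATISTICS")
    spark.sql("ANALYZE TABLE events COMPUTE STATISTICS FOR COLUMNS user_id, event_date")

    // Inspect what the optimizer sees; if no sizeInBytes/rowCount show up here,
    // the CBO has no statistics to work with.
    spark.sql("DESCRIBE EXTENDED events").show(truncate = false)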

Re: Concurrent Spark jobs

2016-03-31 Thread emlyn
In case anyone else has the same problem and finds this: in my case it was fixed by increasing spark.sql.broadcastTimeout (I used 9000).
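For anyone hitting the same timeout, a minimal sketch of where that setting goes, assuming the sqlContext available in spark-shell on Spark 1.x (the value is in seconds; 9000 is simply the figure quoted above):

    // Spark 1.x: raise the timeout for broadcast joins (default is 300 seconds).
    sqlContext.setConf("spark.sql.broadcastTimeout", "9000")

    // Spark 2.x+ equivalent:
    // spark.conf.set("spark.sql.broadcastTimeout", "9000")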

Re: Concurrent Spark jobs

2016-01-25 Thread emlyn
Jean wrote: > Have you considered using pools? > http://spark.apache.org/docs/latest/job-scheduling.html#fair-scheduler-pools > > I haven't tried that myself, but it looks like the pool setting is applied > per thread, so it should be possible to configure the fair scheduler so > that more than one
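A minimal sketch of the per-thread pool mechanism described in that link (the pool name and job body are hypothetical; pool weights would normally come from a fairscheduler.xml file):

    import org.apache.spark.{SparkConf, SparkContext}

    // Enable the fair scheduler when the context is created.
    val conf = new SparkConf()
      .setAppName("pools-example")
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    // The pool is a thread-local property, so each thread can submit its jobs
    // into its own pool and they are scheduled fairly against each other.
    val worker = new Thread(new Runnable {
      def run(): Unit = {
        sc.setLocalProperty("spark.scheduler.pool", "etl_pool")
        sc.parallelize(1 to 1000).count()           // hypothetical job
        sc.setLocalProperty("spark.scheduler.pool", null)  // clear the pool for this thread
      }
    })
    worker.start()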

Re: Concurrent Spark jobs

2016-01-21 Thread emlyn
Thanks for the responses (not sure why they aren't showing up on the list). Michael wrote: > The JDBC wrapper for Redshift should allow you to follow these > instructions. Let me know if you run into any more issues.

Spark 1.6 ignoreNulls in first/last aggregate functions

2016-01-21 Thread emlyn
As I understand it, Spark 1.6 changes the behaviour of the first and last aggregate functions to take nulls into account (where they were ignored in 1.5). From SQL you can use "IGNORE NULLS" to get the old behaviour back. How do I ignore nulls
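For reference, the DataFrame functions later gained an explicit ignoreNulls flag; a minimal sketch, assuming Spark 2.0+ (the DataFrame df and the column names are hypothetical):

    import org.apache.spark.sql.functions.{col, first, last}

    // Reproduce the pre-1.6 behaviour of skipping nulls in the aggregates.
    val agg = df.groupBy("user_id").agg(
      first(col("value"), ignoreNulls = true).as("first_value"),
      last(col("value"), ignoreNulls = true).as("last_value")
    )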

Re: Spark 1.6 ignoreNulls in first/last aggregate functions

2016-01-21 Thread emlyn
It turns out I can't use a user-defined aggregate function, as they are not supported in Window operations. Surely there must be some way to do a last_value with ignoreNulls enabled in Spark 1.6? Any ideas for workarounds?
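One possible workaround sketch, though it is a later-version answer rather than a 1.6 fix: on Spark 2.1+ the built-in last takes an ignoreNulls flag and can be used over a window, giving the last non-null value seen so far per row (the DataFrame and column names are hypothetical):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, last}

    // Carry the last non-null "value" forward within each partition, ordered by "ts".
    val w = Window.partitionBy("user_id")
      .orderBy("ts")
      .rowsBetween(Window.unboundedPreceding, Window.currentRow)

    val filled = df.withColumn("last_value", last(col("value"), ignoreNulls = true).over(w))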

Concurrent Spark jobs

2016-01-19 Thread emlyn
We have a Spark application that runs a number of ETL jobs, writing the outputs to Redshift (using databricks/spark-redshift). This is triggered by calling DataFrame.write.save on the different DataFrames one after another. I noticed that during the Redshift load while the output of one job is
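For context, independent actions submitted from separate threads become independent Spark jobs, so one common pattern is to kick off the saves concurrently rather than one after another; a minimal sketch under those assumptions (the DataFrames, table names, JDBC URL and S3 staging directory are all hypothetical, and whether the loads actually overlap depends on cluster resources):

    import org.apache.spark.sql.DataFrame
    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration

    // dfA and dfB are DataFrames already built by earlier ETL steps (hypothetical).
    val outputs: Seq[(DataFrame, String)] = Seq(dfA -> "schema.table_a", dfB -> "schema.table_b")
    val jdbcUrl   = "jdbc:redshift://example:5439/db?user=u&password=p"  // hypothetical
    val s3TempDir = "s3n://bucket/tmp/"                                  // hypothetical

    // Each save is an independent action; launching them from separate threads lets
    // the scheduler overlap the Spark stages of one job with the Redshift load of another.
    val jobs = outputs.map { case (df, table) =>
      Future {
        df.write
          .format("com.databricks.spark.redshift")
          .option("dbtable", table)
          .option("url", jdbcUrl)
          .option("tempdir", s3TempDir)
          .mode("append")
          .save()
      }
    }

    Await.result(Future.sequence(jobs), Duration.Inf)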

Merging compatible schemas on Spark 1.6.0

2016-01-13 Thread emlyn
I have a series of directories on S3 with parquet data, all with compatible (but not identical) schemas. We verify that the schemas stay compatible when they evolve using org.apache.avro.SchemaCompatibility.checkReaderWriterCompatibility. On Spark 1.5, I could read these into a DataFrame with
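For reference, reading several compatible Parquet directories into one DataFrame with schema merging; a minimal sketch assuming Spark 1.x spark-shell, where sqlContext is available (the S3 paths are hypothetical):

    // Union the compatible schemas of all directories into one DataFrame schema.
    val paths = Seq(
      "s3a://bucket/data/2016-01-01",   // hypothetical locations
      "s3a://bucket/data/2016-01-02"
    )

    val df = sqlContext.read
      .option("mergeSchema", "true")    // merge compatible Parquet schemas
      .parquet(paths: _*)

    df.printSchema()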

Re: Cannot start REPL shell since 1.4.0

2015-10-23 Thread Emlyn Corrin
wrote: > do you have JAVA_HOME set to a Java 7 JDK? > > 2015-10-23 7:12 GMT-04:00 emlyn <em...@swiftkey.com>: > >> xjlin0 wrote >> > I cannot enter the REPL shell in 1.4.0/1.4.1/1.5.0/1.5.1 (with pre-built with >> > or without Hadoop, or home compiled with an

Re: Cannot start REPL shell since 1.4.0

2015-10-23 Thread emlyn
xjlin0 wrote: > I cannot enter the REPL shell in 1.4.0/1.4.1/1.5.0/1.5.1 (with pre-built with > or without Hadoop, or home compiled with Ant or Maven). There was no error > message in v1.4.x; the system prompts nothing. On v1.5.x, once I enter > $SPARK_HOME/bin/pyspark or spark-shell, I got > > Error:

Re: Cannot start REPL shell since 1.4.0

2015-10-23 Thread emlyn
emlyn wrote: > > xjlin0 wrote >> I cannot enter the REPL shell in 1.4.0/1.4.1/1.5.0/1.5.1 (with pre-built with >> or without Hadoop, or home compiled with Ant or Maven). There was no >> error message in v1.4.x; the system prompts nothing. On v1.5.x, once I enter >> $SPARK_HOME