Join pushdown on two external tables from the same external source?

2017-06-13 Thread drewrobb
I'm trying to figure out how to join multiple tables from a single external source directly in Spark SQL. Say I do the following in Spark SQL: CREATE OR REPLACE TEMPORARY VIEW t1 USING jdbc OPTIONS ( dbtable 't1' ...) CREATE OR REPLACE TEMPORARY VIEW t2 USING jdbc OPTIONS ( dbtable 't2' ...) SELECT *
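
One possible workaround, sketched here under assumptions rather than taken from the thread: the JDBC source accepts a parenthesized subquery as `dbtable`, so the join can be written in the external database's own SQL and exposed as a single view. The helper name `pushed_down_join_view`, the column names, and the JDBC URL below are hypothetical.

```python
def pushed_down_join_view(view_name, join_sql, url):
    """Build a Spark SQL statement whose JDBC `dbtable` is a subquery.

    The external database runs `join_sql` itself, so the join happens
    at the source. All names passed in are illustrative assumptions.
    """
    # Keep the sketch simple: refuse input that would break the quoted option.
    if "'" in join_sql or "'" in url:
        raise ValueError("single quotes are not handled in this sketch")
    # Most databases require an alias on a derived table.
    subquery = f"({join_sql}) AS {view_name}_src"
    return (
        f"CREATE OR REPLACE TEMPORARY VIEW {view_name} "
        f"USING jdbc OPTIONS (url '{url}', dbtable '{subquery}')"
    )


if __name__ == "__main__":
    # Hypothetical columns `id` and `value`; hypothetical JDBC URL.
    print(pushed_down_join_view(
        "t1_t2",
        "SELECT t1.id, t2.value FROM t1 JOIN t2 ON t1.id = t2.id",
        "jdbc:postgresql://db.example.com/app",
    ))
```

The trade-off is that the join is fixed when the view is created, instead of being planned by Spark from two separate views.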

_SUCCESS file validation on read

2017-04-03 Thread drewrobb
When writing a dataframe, a _SUCCESS file is created to mark that the entire dataframe has been written. However, the existence of this _SUCCESS file does not seem to be validated by default on reads. In some cases this would allow partially written dataframes to be read back. Is this behavior
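
A minimal sketch of the check described above, done by the caller before reading. It assumes the output directory is on a local filesystem (a distributed filesystem would need its own client), and the helper names `has_success_marker` and `checked_path` are hypothetical.

```python
import os


def has_success_marker(output_dir):
    """Return True if the directory contains a _SUCCESS marker file."""
    return os.path.isfile(os.path.join(output_dir, "_SUCCESS"))


def checked_path(output_dir):
    """Return output_dir unchanged, or raise if the write looks incomplete."""
    if not has_success_marker(output_dir):
        raise FileNotFoundError(
            f"no _SUCCESS marker in {output_dir}; the write may be partial"
        )
    return output_dir
```

A read could then be guarded as `spark.read.parquet(checked_path(path))`, where `path` is whatever directory the dataframe was written to.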

Re: No way to set mesos cluster driver memory overhead?

2016-10-13 Thread drewrobb
It seems like this is a real issue, so I've opened an issue: https://issues.apache.org/jira/browse/SPARK-17928

No way to set mesos cluster driver memory overhead?

2016-10-13 Thread drewrobb
When using Spark on Mesos and deploying a job in cluster mode using the dispatcher, there appears to be no memory overhead configuration for the launched driver processes ("--driver-memory" is the same as Xmx, which is the same as the memory quota). This makes it almost a guarantee that a long running
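
To make the arithmetic concrete, here is a small sketch of what a quota with headroom would look like. It borrows the rule Spark documents for executor memory overhead (the larger of 384 MiB or 10% of the heap) purely as an assumption; it is not a setting the dispatcher applies to drivers, and `container_quota_mib` is a hypothetical helper.

```python
def container_quota_mib(heap_mib, overhead_fraction=0.10, min_overhead_mib=384):
    """Return heap plus non-heap headroom, in MiB.

    Defaults mirror the documented executor overhead rule and are an
    assumption here, not a driver-side configuration.
    """
    overhead = max(min_overhead_mib, int(heap_mib * overhead_fraction))
    return heap_mib + overhead


if __name__ == "__main__":
    for heap in (1024, 4096):
        quota = container_quota_mib(heap)
        # With no overhead setting, the quota would equal the heap exactly.
        print(f"heap={heap} MiB, quota with headroom={quota} MiB")
```

When the quota equals the heap size, any off-heap usage by the JVM (thread stacks, metaspace, direct buffers) pushes the process past its limit.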