Re: Querying Drill with Spark DataFrame

2017-07-22 Thread Luqman Ghani
BTW, do we have to register JdbcDialect for every Spark/SQL context, or once for a Spark server? On Sun, Jul 23, 2017 at 2:26 AM, Luqman Ghani wrote: > I have found the solution for this error. I have to register a JdbcDialect > for Drill as mentioned in the following post on

Re: Querying Drill with Spark DataFrame

2017-07-22 Thread Luqman Ghani
I have found the solution for this error. I have to register a JdbcDialect for Drill as mentioned in the following post on SO: https://stackoverflow.com/questions/35476076/integrating-spark-sql-and-apache-drill-through-jdbc Thanks On Sun, Jul 23, 2017 at 2:10 AM, Luqman Ghani
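For reference, a minimal sketch (not verbatim from that post) of the dialect registration it describes, which makes Spark quote Drill identifiers with backticks instead of double quotes; the URL prefix check is an assumption about the connection string:

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Drill rejects double-quoted identifiers; quote with backticks instead.
case object DrillDialect extends JdbcDialect {
  // Assumption: the connection URL starts with "jdbc:drill"
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:drill")
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

// The registry is a JVM-wide singleton, not per-SQLContext.
JdbcDialects.registerDialect(DrillDialect)
```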

Re: Querying Drill with Spark DataFrame

2017-07-22 Thread Luqman Ghani
I have done that, but Spark is wrapping my query in the same clause: SELECT "CustomerID", etc FROM (my query from table), so I get the same error. On Sun, Jul 23, 2017 at 2:02 AM, ayan guha wrote: > You can formulate a query in the dbtable clause of the jdbc reader. > > On Sun, 23 Jul 2017

Re: [Spark] Working with JavaPairRDD from Scala

2017-07-22 Thread Lukasz Tracewski
Hi - and my thanks to you and Gerard. Only the late hour of the night can explain how I could possibly have missed this. Cheers! Lukasz On 22/07/2017 10:48, yohann jardin wrote: Hello Lukasz, You can just: val pairRdd = javapairrdd.rdd Then pairRdd will be of type RDD[(K, V)], with K

Re: Querying Drill with Spark DataFrame

2017-07-22 Thread ayan guha
You can formulate a query in the dbtable clause of the jdbc reader. On Sun, 23 Jul 2017 at 6:43 am, Luqman Ghani wrote: > Hi, > > I'm working on integrating Apache Drill with Apache Spark via Drill's > JDBC driver. I'm trying a simple select * from table from Drill through >
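A minimal sketch of that suggestion, assuming a placeholder Drill URL and query (neither is from the thread): dbtable accepts a parenthesized, aliased subquery, which Spark then wraps in its own SELECT when it builds the final statement.

```scala
// Hypothetical: pass an aliased subquery as dbtable instead of a table name.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:drill:drillbit=localhost:31010")
  .option("dbtable", "(SELECT CustomerID FROM dfs.`/path/to/table`) t")
  .load()
```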

Querying Drill with Spark DataFrame

2017-07-22 Thread Luqman Ghani
Hi, I'm working on integrating Apache Drill with Apache Spark via Drill's JDBC driver. I'm trying a simple select * from a table in Drill through spark.sqlContext.load via the jdbc driver. I'm running the following code in the Spark Shell: > ./bin/spark-shell --driver-class-path
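A hedged reconstruction of the kind of read being described, assuming the Drill JDBC jar is already on --driver-class-path; the URL and table path are placeholders, and only the driver class name (org.apache.drill.jdbc.Driver) comes from Drill's documentation:

```scala
// Sketch: load a Drill table over JDBC; url and dbtable are placeholders.
val df = spark.sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:drill:drillbit=localhost:31010")
  .option("driver", "org.apache.drill.jdbc.Driver")
  .option("dbtable", "dfs.`/path/to/table`")
  .load()
df.show()
```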

Re: custom joins on dataframe

2017-07-22 Thread Sumedh Wale
The Dataset.join(right: Dataset[_], joinExprs: Column) API can take an arbitrary expression, so you can use a UDF for the join. The problem with all non-equality joins is that they use BroadcastNestedLoopJoin or equivalent, which is an (M x N) nested loop that will be unusable for medium/large
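A minimal sketch of that API with a hypothetical UDF predicate (the prefix match is illustrative only); because the condition is not an equality, Spark plans it as a BroadcastNestedLoopJoin:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.udf

// Hypothetical predicate: rows match if their keys share a 3-char prefix.
val prefixMatch = udf((a: String, b: String) => a.take(3) == b.take(3))

// Any Column expression, including a UDF call, can serve as joinExprs.
def customJoin(left: DataFrame, right: DataFrame): DataFrame =
  left.join(right, prefixMatch(left("acol"), right("acol")))
```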

Informing Spark about specific Partitioning scheme to avoid shuffles

2017-07-22 Thread saatvikshah1994
Hi everyone, My environment is PySpark with Spark 2.0.0. I'm using Spark to load data from a large number of files into a Spark dataframe with fields, say, field1 to field10. While loading my data I have ensured that records are partitioned by field1 and field2 (without using partitionBy). This
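The thread's environment is PySpark, but as a sketch in Scala (the DataFrame API is analogous, and the column names follow the question): repartitioning by the key columns establishes a hash partitioning that a subsequent aggregation on the same keys can reuse without another shuffle.

```scala
import spark.implicits._

// Hash-partition once by the keys; a groupBy on the same columns can then
// satisfy its required distribution without an extra exchange.
val partitioned = df.repartition($"field1", $"field2")
val counts = partitioned.groupBy($"field1", $"field2").count()
```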

custom joins on dataframe

2017-07-22 Thread Stephen Fletcher
Normally a family of joins (left, right outer, inner) is performed on two dataframes using columns for the comparison, i.e. left("acol") === right("acol"). The comparison operator of the "left" dataframe does something internally and produces a column that I assume is used by the join. What I want
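For contrast with what is being asked, a minimal sketch of the standard equi-join described here (dataframe and column names follow the question):

```scala
// === on two Columns builds the comparison expression the join consumes.
val joined = left.join(right, left("acol") === right("acol"), "inner")
```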

Re: Is there a way to run Spark SQL through REST?

2017-07-22 Thread Sumedh Wale
On Saturday 22 July 2017 01:31 PM, kant kodali wrote: Is there a way to run Spark SQL through REST? There is spark-jobserver (https://github.com/spark-jobserver/spark-jobserver). It does more than just a REST API (like a long-running SparkContext). regards -- Sumedh Wale SnappyData

Re: Is there a way to run Spark SQL through REST?

2017-07-22 Thread Jean Georges Perrin
There's Livy, but it's pretty resource-intensive. I know it's not helpful, but my company has developed its own and I am trying to open source it. Looks like there are quite a few companies that had the need and custom-built a solution. jg > On Jul 22, 2017, at 04:01, kant kodali

RE: [Spark] Working with JavaPairRDD from Scala

2017-07-22 Thread yohann jardin
Hello Lukasz, You can just: val pairRdd = javapairrdd.rdd Then pairRdd will be of type RDD[(K, V)], with K being com.vividsolutions.jts.geom.Polygon, and V being java.util.HashSet[com.vividsolutions.jts.geom.Polygon] If you really want to continue with Java objects: val
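A self-contained sketch of the conversion, with the key/value types simplified to String (the thread's actual K is com.vividsolutions.jts.geom.Polygon):

```scala
import org.apache.spark.api.java.JavaPairRDD
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("pairs").getOrCreate()
val sc = spark.sparkContext

// Build a JavaPairRDD the way Java code would hand it over.
val javaPairRdd: JavaPairRDD[String, java.util.HashSet[String]] =
  JavaPairRDD.fromRDD(sc.parallelize(Seq(("a", new java.util.HashSet[String]()))))

// .rdd unwraps the thin Java wrapper into the underlying Scala RDD[(K, V)].
val pairRdd: RDD[(String, java.util.HashSet[String])] = javaPairRdd.rdd
```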

unsubscribe

2017-07-22 Thread Vasilis Hadjipanos
Please unsubscribe me

Is there a way to run Spark SQL through REST?

2017-07-22 Thread kant kodali
Is there a way to run Spark SQL through REST?