New to spark 2.2.1 - Problem with finding tables between different metastore db

2018-02-06 Thread Subhajit Purkayastha
All, I am new to Spark 2.2.1. I have a single node cluster and also have enabled thriftserver for my Tableau application to connect to my persisted table. I feel that the spark cluster metastore is different from the thrift-server metastore. If this assumption is valid, what do I need to

Spark DataFrame Join _ performance issues

2016-09-19 Thread Subhajit Purkayastha
I am running my spark (1.5.2) instance in a virtualbox VM. I have 10gb memory allocated to it. I have a fact table extract, with 1 rows var glbalance_df_select = ("LEDGER_ID","CODE_COMBINATION_ID","CURRENCY_CODE", "PERIOD_TYPE","TEMPLATE_ID",

Spark Save mode "Overwrite" -Lock wait timeout exceeded; try restarting transaction Error

2016-09-11 Thread Subhajit Purkayastha
I am using Spark 1.5.2 with a MemSQL database as a persistent repository. I am trying to update rows (based on the primary key) if a key appears more than once (basically run the save load as an upsert operation): val UpSertConf = SaveToMemSQLConf(msc.memSQLConf,

RE: Spark 2.0 - Insert/Update to a DataFrame

2016-08-26 Thread Subhajit Purkayastha
I want Thanks for your help From: Mike Metzger Sent: Friday, August 26, 2016 2:12 PM To: Subhajit Purkayastha Cc: user @spark Subject: Re: Spark 2.0 - Insert/Update to a DataFrame Without se

RE: Spark 2.0 - Insert/Update to a DataFrame

2016-08-26 Thread Subhajit Purkayastha
:13 PM To: Subhajit Purkayastha Cc: user @spark Subject: Re: Spark 2.0 - Insert/Update to a DataFrame Without seeing the makeup of the DataFrames nor what your logic is for updating them, I'd suggest doing a join of the

Spark 2.0 - Insert/Update to a DataFrame

2016-08-26 Thread Subhajit Purkayastha
I am using Spark 2.0 and have 2 DataFrames, SalesOrder and Forecast. I need to update the Forecast DataFrame record(s) based on the SalesOrder DataFrame record. What is the best way to achieve this functionality
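
Since DataFrames are immutable, an "update" in Spark produces a new DataFrame rather than changing rows in place. The following is only a minimal sketch of the join-based approach suggested in the reply above, not the thread's actual solution. The column names (`ITEM_ID`, `FORECAST_QTY`, `ORDER_QTY`), the sample rows, and the rule "subtract the ordered quantity from the forecast" are all hypothetical, because the original message does not show the schemas or the update logic.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit}

object ForecastUpdateSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("forecast-update-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: the real schemas are not shown in the thread.
    val forecast   = Seq((1, 100), (2, 50)).toDF("ITEM_ID", "FORECAST_QTY")
    val salesOrder = Seq((1, 30)).toDF("ITEM_ID", "ORDER_QTY")

    // Left outer join keeps every forecast row, including those with no order.
    // coalesce turns the missing ORDER_QTY (null) into 0 for unmatched rows.
    val updatedForecast = forecast
      .join(salesOrder, Seq("ITEM_ID"), "left_outer")
      .withColumn("FORECAST_QTY",
        col("FORECAST_QTY") - coalesce(col("ORDER_QTY"), lit(0)))
      .drop("ORDER_QTY")

    updatedForecast.show()
    spark.stop()
  }
}
```

The result is a new DataFrame with the same columns as `forecast`; the original `forecast` is left unchanged.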

DataFrame Data Manipulation - Based on a timestamp column Not Working

2016-08-23 Thread Subhajit Purkayastha
Using Spark 2.0 and Scala 2.11.8, I have a DataFrame with a timestamp column: root |-- ORG_ID: integer (nullable = true) |-- HEADER_ID: integer (nullable = true) |-- ORDER_NUMBER: integer (nullable = true) |-- LINE_ID: integer (nullable = true) |-- LINE_NUMBER: integer (nullable = true)

Spark 2.0 - Join statement compile error

2016-08-22 Thread Subhajit Purkayastha
All, I have the following DataFrames and the temp table. I am trying to create a new DF; the following statement is not compiling: val df = sales_demand.join(product_master,(sales_demand.INVENTORY_ITEM_ID==product_master.INVENTORY_ITEM_ID),joinType="inner") What am I
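
The thread's answer is not shown here, so the following is only a sketch of one likely fix. In the Scala DataFrame API, a column is referenced as `df("NAME")` rather than as an attribute (`df.NAME`), and column equality is written with `===`, which builds a `Column` expression. The sample rows and the extra columns (`QUANTITY`, `ITEM_NAME`) are hypothetical; only the DataFrame names and `INVENTORY_ITEM_ID` come from the original message.

```scala
import org.apache.spark.sql.SparkSession

object JoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the real tables.
    val sales_demand   = Seq((101, 5), (102, 3)).toDF("INVENTORY_ITEM_ID", "QUANTITY")
    val product_master = Seq((101, "Widget"), (103, "Gadget")).toDF("INVENTORY_ITEM_ID", "ITEM_NAME")

    // Columns are looked up by name with apply(); === yields a Column,
    // which is what join expects as its join expression.
    val df = sales_demand.join(
      product_master,
      sales_demand("INVENTORY_ITEM_ID") === product_master("INVENTORY_ITEM_ID"),
      "inner")

    df.show()
    spark.stop()
  }
}
```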

Getting error, when I do

2016-08-01 Thread Subhajit Purkayastha
I am getting this error in the spark-shell when I do . Which jar file do I need to download to fix this error? Error scala> val df = msc.sql(query) df: org.apache.spark.sql.DataFrame = [id: int, name: string] scala> java.lang.NoClassDefFoundError:

Configure Spark to run with MemSQL DB Cluster

2016-07-26 Thread Subhajit Purkayastha
All, Is it possible to integrate spark 1.6.1 with MemSQL Cluster? Any pointers on how to start with the project will be appreciated. Thx, Subhajit

Spark 1.5 - How to join 3 RDDs in a SQL DF?

2015-10-11 Thread Subhajit Purkayastha
Can I join 3 different RDDs together in a Spark SQL DF? I can find examples for 2 RDDs but not 3. Thanks
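
A minimal sketch of one way this could be done, assuming each RDD holds tuples that share a key column: convert each RDD to a DataFrame with `toDF` and chain two `join` calls. The RDD contents and the column names (`ID`, `NAME`, `ITEM`, `AMOUNT`) are hypothetical, since the original message does not describe the data. The sketch uses `SQLContext`, which matches Spark 1.5; on Spark 2.x a `SparkSession` would normally take its place.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ThreeWayJoinSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("three-way-join-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Three hypothetical RDDs of tuples, each keyed by ID.
    val customers = sc.parallelize(Seq((1, "Ann"), (2, "Bob"))).toDF("ID", "NAME")
    val orders    = sc.parallelize(Seq((1, "book"), (2, "pen"))).toDF("ID", "ITEM")
    val payments  = sc.parallelize(Seq((1, 12.5), (2, 1.2))).toDF("ID", "AMOUNT")

    // A join returns a DataFrame, so a third table is joined by chaining.
    val joined = customers.join(orders, "ID").join(payments, "ID")

    joined.show()
    sc.stop()
  }
}
```

The same chaining extends to more than three tables, because every `join` takes and returns a DataFrame.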

Error - Calling a package (com.databricks:spark-csv_2.10:1.0.3) with spark-submit

2015-09-11 Thread Subhajit Purkayastha
I am on Spark 1.3.1. When I do the following with spark-shell, it works: spark-shell --packages com.databricks:spark-csv_2.10:1.0.3 Then I can create a DF using the spark-csv package: import sqlContext.implicits._ import org.apache.spark.sql._ // Return the dataset specified by