Spark 1.5 - How to join 3 RDDs in a SQL DF?

2015-10-11 Thread Subhajit Purkayastha
Can I join 3 different RDDs together in a Spark SQL DF? I can find examples for 2 RDDs but not 3. Thanks
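One way to approach the question above is to convert each RDD to a DataFrame and chain two joins. The following is only a sketch against the Spark 1.x `SQLContext` API; the data, the column names (`custId`, `itemId`, and so on), and the object name `ThreeWayJoin` are made up for illustration and do not come from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical example: three small RDDs joined through their DataFrame views.
object ThreeWayJoin {
  def run(): Array[(String, Int, String)] = {
    val conf = new SparkConf().setAppName("three-way-join").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // enables rdd.toDF(...)

    // Each RDD of tuples becomes a DataFrame with named columns.
    val customers = sc.parallelize(Seq((1, "Ann"), (2, "Bob"))).toDF("custId", "name")
    val orders = sc.parallelize(Seq((10, 1, 100), (11, 2, 101))).toDF("orderId", "custId", "itemId")
    val items = sc.parallelize(Seq((100, "widget"), (101, "gadget"))).toDF("itemId", "label")

    // Chain the joins: the result of the first join is joined to the third DataFrame.
    val joined = customers
      .join(orders, customers("custId") === orders("custId"))
      .join(items, orders("itemId") === items("itemId"))
      .select(customers("name"), orders("orderId"), items("label"))

    val rows = joined.collect().map(r => (r.getString(0), r.getInt(1), r.getString(2)))
    sc.stop()
    rows
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

In spark-shell on Spark 1.x, where `sc` and `sqlContext` already exist, only the `toDF` and `join` lines should be needed.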

Error - Calling a package (com.databricks:spark-csv_2.10:1.0.3) with spark-submit

2015-09-11 Thread Subhajit Purkayastha
I am on Spark 1.3.1. When I run the following with spark-shell, it works: spark-shell --packages com.databricks:spark-csv_2.10:1.0.3 Then I can create a DF using the spark-csv package: import sqlContext.implicits._ import org.apache.spark.sql._ // Return the dataset specified by

Configure Spark to run with MemSQL DB Cluster

2016-07-26 Thread Subhajit Purkayastha
All, is it possible to integrate Spark 1.6.1 with a MemSQL cluster? Any pointers on how to get started with the project would be appreciated. Thx, Subhajit

Getting error, when I do df.show()

2016-08-01 Thread Subhajit Purkayastha
I am getting this error in the spark-shell when I do df.show(). Which jar file do I need to download to fix this error? Error: scala> val df = msc.sql(query) df: org.apache.spark.sql.DataFrame = [id: int, name: string] scala> df.show() java.lang.NoClassDefFoundError:

Spark Save mode "Overwrite" -Lock wait timeout exceeded; try restarting transaction Error

2016-09-11 Thread Subhajit Purkayastha
I am using Spark 1.5.2 with a MemSQL database as a persistent repository. I am trying to update rows (based on the primary key) if a key appears more than once (basically running the save load as an upsert operation): val UpSertConf = SaveToMemSQLConf(msc.memSQLConf,

Spark DataFrame Join - performance issues

2016-09-19 Thread Subhajit Purkayastha
I am running my Spark (1.5.2) instance in a VirtualBox VM with 10 GB of memory allocated to it. I have a fact table extract, with 1 rows var glbalance_df_select = glbalance_df.select ("LEDGER_ID","CODE_COMBINATION_ID","CURRENCY_CODE", "PERIOD_TYPE","TEMPLATE_ID",

Spark 2.0 - Insert/Update to a DataFrame

2016-08-26 Thread Subhajit Purkayastha
I am using Spark 2.0 and have 2 DataFrames, SalesOrder and Forecast. I need to update the Forecast DataFrame record(s) based on the SalesOrder DF record. What is the best way to achieve this functionality

RE: Spark 2.0 - Insert/Update to a DataFrame

2016-08-26 Thread Subhajit Purkayastha
I want Thanks for your help From: Mike Metzger [mailto:m...@flexiblecreations.com] Sent: Friday, August 26, 2016 2:12 PM To: Subhajit Purkayastha <spurk...@p3si.net> Cc: user @spark <user@spark.apache.org> Subject: Re: Spark 2.0 - Insert/Update to a DataFrame Without se

RE: Spark 2.0 - Insert/Update to a DataFrame

2016-08-26 Thread Subhajit Purkayastha
:13 PM To: Subhajit Purkayastha <spurk...@p3si.net> Cc: user @spark <user@spark.apache.org> Subject: Re: Spark 2.0 - Insert/Update to a DataFrame Without seeing the makeup of the DataFrames or what your logic is for updating them, I'd suggest doing a join of the

DataFrame Data Manipulation - Based on a timestamp column Not Working

2016-08-23 Thread Subhajit Purkayastha
Using Spark 2.0 & Scala 2.11.8, I have a DataFrame with a timestamp column:
root
 |-- ORG_ID: integer (nullable = true)
 |-- HEADER_ID: integer (nullable = true)
 |-- ORDER_NUMBER: integer (nullable = true)
 |-- LINE_ID: integer (nullable = true)
 |-- LINE_NUMBER: integer (nullable = true)

Spark 2.0 - Join statement compile error

2016-08-22 Thread Subhajit Purkayastha
All, I have the following DataFrames and the temp table. I am trying to create a new DF, but the following statement is not compiling: val df = sales_demand.join(product_master,(sales_demand.INVENTORY_ITEM_ID==product_master.INVENTORY_ITEM_ID),joinType="inner") What am I
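A likely cause, assuming the statement above is meant as Scala, is that it uses Python-style column syntax. In the Scala API a DataFrame has no `INVENTORY_ITEM_ID` member, so columns are referenced as `df("INVENTORY_ITEM_ID")`, and column equality is written `===`, since `==` yields a `Boolean` rather than a `Column`. The sketch below uses made-up rows, and the object name `JoinSyntax` and the `QTY` and `NAME` columns are hypothetical; only `INVENTORY_ITEM_ID` and the DataFrame roles come from the post.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical stand-ins for the poster's sales_demand and product_master DataFrames.
object JoinSyntax {
  def run(): Long = {
    val spark = SparkSession.builder().appName("join-syntax").master("local[*]").getOrCreate()
    import spark.implicits._ // enables Seq(...).toDF(...)

    val salesDemand = Seq((1, 5), (2, 7)).toDF("INVENTORY_ITEM_ID", "QTY")
    val productMaster = Seq((1, "widget")).toDF("INVENTORY_ITEM_ID", "NAME")

    // Columns are looked up with df("name"); equality between columns uses ===.
    val df = salesDemand.join(
      productMaster,
      salesDemand("INVENTORY_ITEM_ID") === productMaster("INVENTORY_ITEM_ID"),
      "inner")

    val matched = df.count() // only item 1 exists on both sides
    spark.stop()
    matched
  }

  def main(args: Array[String]): Unit = println(run())
}
```

When both sides share the key name, `salesDemand.join(productMaster, Seq("INVENTORY_ITEM_ID"), "inner")` is an alternative that keeps a single copy of the key column in the result.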

New to spark 2.2.1 - Problem with finding tables between different metastore db

2018-02-06 Thread Subhajit Purkayastha
All, I am new to Spark 2.2.1. I have a single-node cluster and have also enabled the Thrift server so that my Tableau application can connect to my persisted table. I suspect that the Spark cluster metastore is different from the Thrift server metastore. If this assumption is valid, what do I need to