Re: Cant join same dataframe twice ?

2016-04-26 Thread Prasad Ravilla
Also, check the column names of df1 (after joining df2 and df3). Prasad. From: Ted Yu Date: Monday, April 25, 2016 at 8:35 PM To: Divya Gehlot Cc: "user @spark" Subject: Re: Cant join same dataframe twice ? Can you show us the structure of df2 and df3? Thanks On Mon, Apr 25, 2016 at 8:23

Re: Negative Number of Active Tasks in Spark UI

2016-01-05 Thread Prasad Ravilla
apache.org/jira/browse/SPARK-8560 > > > Do you use dynamic allocation? > > Cheers > >

DataFrame withColumnRenamed throwing NullPointerException

2016-01-05 Thread Prasad Ravilla
I am joining two data frames as shown in the code below. This is throwing a NullPointerException. I have a number of different joins throughout the program, and the SparkContext throws this NullPointerException seemingly at random on one of the joins. The two data frames are very large data frames (

Joining DataFrames - Causing Cartesian Product

2015-12-18 Thread Prasad Ravilla
Hi, I am running into a performance issue when joining data frames created from avro files using the spark-avro library. The data frames are created from 120K avro files and the total size is around 1.5 TB. The two data frames are very large, with billions of records. The join for these two

Re: Joining DataFrames - Causing Cartesian Product

2015-12-18 Thread Prasad Ravilla
R_ID") .withColumnRenamed("USER_CNTRY_ID","USER_DIM_COUNTRY_ID") .as("userdim") , userAndRetailDates("USER_ID") <=> $"userdim.USER_DIM_USER_ID" && userAndRetailDates("USER_CNTRY_ID") <=> $"us

Re: Large number of conf broadcasts

2015-12-17 Thread Prasad Ravilla
Hi Anders, I am running into the same issue as you. I am trying to read about 120 thousand avro files into a single data frame. Is your patch part of a pull request from the master branch on GitHub? Thanks, Prasad. From: Anders Arpteg Date: Thursday, October 22, 2015 at 10:37 AM To: Koert

Re: Large number of conf broadcasts

2015-12-17 Thread Prasad Ravilla
Thanks, Koert. Regards, Prasad. From: Koert Kuipers Date: Thursday, December 17, 2015 at 1:06 PM To: Prasad Ravilla Cc: Anders Arpteg, user Subject: Re: Large number of conf broadcasts https://github.com/databricks/spark-avro/pull/95