Also, check the column names of df1 (after joining df2 and df3).
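A minimal sketch of that check follows. It assumes Spark 2.x or later (SparkSession), and the column names and sample rows are hypothetical, since the real structure of df2 and df3 was not posted. It aliases both inputs, joins them onto df1, and lists any column name that appears more than once in the result.

```scala
import org.apache.spark.sql.SparkSession

object JoinColumnCheck {
  // Pure helper: returns the column names that occur more than once, sorted.
  def duplicateColumns(cols: Seq[String]): Seq[String] =
    cols.groupBy(identity).collect { case (name, occ) if occ.size > 1 => name }.toSeq.sorted

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession (Spark 2.x+); on 1.x use SQLContext instead.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-column-check")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data; the real schemas are not shown in the thread.
    val base = Seq((1, 10, 20)).toDF("id", "df2_key", "df3_key")
    val df2 = Seq((10, "a")).toDF("key", "value")
    val df3 = Seq((20, "b")).toDF("key", "value")

    // Alias each side so that same-named columns can still be told apart.
    val df1 = base
      .join(df2.as("d2"), $"df2_key" === $"d2.key")
      .join(df3.as("d3"), $"df3_key" === $"d3.key")

    // Inspect the joined schema and report repeated column names.
    df1.printSchema()
    println("Duplicate columns: " + duplicateColumns(df1.columns.toSeq).mkString(", "))

    spark.stop()
  }
}
```

If duplicates show up, renaming them with withColumnRenamed before the join, or referring to them through the aliases afterwards, keeps later column references unambiguous.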
Prasad.
From: Ted Yu
Date: Monday, April 25, 2016 at 8:35 PM
To: Divya Gehlot
Cc: "user @spark"
Subject: Re: Cant join same dataframe twice ?
Can you show us the structure of df2 and df3?
Thanks
On Mon, Apr 25, 2016 at 8:23
apache.org/jira/browse/SPARK-8560
>
>
>Do you use dynamic allocation ?
>
>Cheers
>
I am joining two data frames as shown in the code below. This throws a
NullPointerException.
I have a number of different joins throughout the program, and the SparkContext
throws this NullPointerException at random on one of the joins.
The two data frames are very large data frames (
Hi,
I am running into a performance issue when joining data frames created from Avro
files using the spark-avro library.
The data frames are created from 120K Avro files, and the total size is around
1.5 TB.
The two data frames are huge, with billions of records.
The join for these two
R_ID")
.withColumnRenamed("USER_CNTRY_ID","USER_DIM_COUNTRY_ID")
.as("userdim")
, userAndRetailDates("USER_ID") <=> $"userdim.USER_DIM_USER_ID"
&& userAndRetailDates("USER_CNTRY_ID") <=> $"us
Hi Anders,
I am running into the same issue as yours. I am trying to read about 120
thousand avro files into a single data frame.
Is your patch part of a pull request from the master branch on GitHub?
Thanks,
Prasad.
From: Anders Arpteg
Date: Thursday, October 22, 2015 at 10:37 AM
To: Koert
Thanks, Koert.
Regards,
Prasad.
From: Koert Kuipers
Date: Thursday, December 17, 2015 at 1:06 PM
To: Prasad Ravilla
Cc: Anders Arpteg, user
Subject: Re: Large number of conf broadcasts
https://github.com/databricks/spark-avro/pull/95