Chris Sanjiv Xavier created SPARK-17366: -------------------------------------------
Summary: Temp tables cached in spark - Joins performance
Key: SPARK-17366
URL: https://issues.apache.org/jira/browse/SPARK-17366
Project: Spark
Issue Type: Brainstorming
Components: SQL
Environment: Amazon S3
Reporter: Chris Sanjiv Xavier

Hi,

I have a use case in which we have Spark running on an Amazon EC2 instance and pulling data from an S3 bucket. We load the data into DataFrames and then cache the tables. We face serious performance issues when we join the two cached tables: the join runs very slowly.

Example of the issue:

Table A in memory: 1000 MB
Table B in memory: 1000 MB

The data is queried through the SQL interface of a Zeppelin notebook on Amazon:

    SELECT * FROM A INNER JOIN B ON A.column1 = B.column1 WHERE B.column2 = 'SPARK';

The query above returns results extremely slowly, even though this is a Spark cluster with 6 nodes holding close to 250 GB of memory in total.
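To make the shape of the reported query concrete, here is a minimal plain-Python sketch (not Spark code) of an inner equi-join on column1 with a WHERE filter on B, done as a hash join that applies the filter before building the hash table. The function name filtered_hash_join and the sample rows are hypothetical. Spark SQL's optimizer normally pushes a predicate like B.column2 = 'SPARK' below an inner join on its own; the sketch only illustrates what the query computes and why the filtered side matters.

```python
# Plain-Python sketch of what the reported query computes:
#   SELECT * FROM A INNER JOIN B ON A.column1 = B.column1 WHERE B.column2 = 'SPARK'
# Hypothetical helper and sample rows; this is NOT how Spark executes the join.
from collections import defaultdict


def filtered_hash_join(a_rows, b_rows, key, filter_col, filter_val):
    """Inner equi-join of a_rows and b_rows on `key`, keeping only B rows
    where b[filter_col] == filter_val. The filter runs before the hash
    table is built, so only B rows that pass the predicate are hashed."""
    # Build phase: hash only the B rows that survive the WHERE predicate.
    build = defaultdict(list)
    for b in b_rows:
        if b[filter_col] == filter_val:
            build[b[key]].append(b)
    # Probe phase: stream A rows and emit one (a, b) pair per match.
    # dict.get does not insert missing keys into the defaultdict.
    out = []
    for a in a_rows:
        for b in build.get(a[key], []):
            out.append((a, b))
    return out


if __name__ == "__main__":
    a_rows = [
        {"column1": 1, "x": "a1"},
        {"column1": 2, "x": "a2"},
        {"column1": 3, "x": "a3"},
    ]
    b_rows = [
        {"column1": 1, "column2": "SPARK"},
        {"column1": 2, "column2": "OTHER"},  # removed by the WHERE filter
        {"column1": 1, "column2": "SPARK"},
    ]
    for a, b in filtered_hash_join(a_rows, b_rows, "column1", "column2", "SPARK"):
        print(a, b)
```

Because only matching B rows are hashed, the build phase scales with the number of B rows that pass the filter rather than with all of B.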