[ https://issues.apache.org/jira/browse/SPARK-17366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-17366.
-------------------------------
    Resolution: Invalid

Yes, please start on the user@ mailing list.

> Temp tables cached in spark - Joins performance
> -----------------------------------------------
>
>                 Key: SPARK-17366
>                 URL: https://issues.apache.org/jira/browse/SPARK-17366
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: SQL
>         Environment: Amazon S3
>            Reporter: Chris Sanjiv Xavier
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Hi,
> I have a use case in which Spark runs on an EC2 instance from Amazon.
> We are pulling data from an S3 bucket. We pull the data into DataFrames
> and then cache the tables.
> We face a lot of performance issues when we try to join the two tables
> that have been cached. The join runs really slowly.
> Example of the issue:
> Table A in memory: 1000 MB
> Table B in memory: 1000 MB
> We pull the data using the SQL interface in a Zeppelin notebook on Amazon.
> Select * from table A inner join table B on A.column 1 = B.column 1 where
> B.column 2 = 'SPARK' ;
> The above query returns results extremely slowly.
> This is a Spark cluster with 6 nodes holding close to 250 GB of memory in
> total.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)