Spark SQL poor join performance

2016-06-27 Thread Samo Sarajevo
I'm using Spark SQL to build a fact table out of 5 dimensions. I'm facing a performance issue (the job takes several hours to complete), and even after exhaustive googling I see no solution. These are the settings I have tried tuning, with no success. sqlContext.sql("set

Spark SQL poor join performance

2016-06-27 Thread vegass
"op.c2 = di.c2\n" + "AND o.name = op.c30\n" + "AND di.c3 = op.c3\n" + "AND di.c4 = op.c4").toSchemaRDD(); resultFact.count(); r