I am trying to load a subset of rows from a big SQL table. Here is my code:

===
jdbcDF = sqlContext.read.format("jdbc").options(
    url="jdbc:postgresql://...",
    dbtable="mytable",
    partitionColumn="t",
    lowerBound=1451577600,
    upperBound=1454256000,
    numPartitions=1).load()
print(jdbcDF.count())
===
The code runs very slowly because Spark tries to load the whole table. I know there is a workaround that uses a subquery as the table expression:

dbtable="(SELECT * FROM mytable WHERE t >= 1451577600 AND t <= 1454256000) tmp"

However, it is still slow because the subquery creates a temporary table. I would like to know how I can specify WHERE filters so that I don't need to load the whole table.

From the Spark source code, I guess the filters in JDBCRelation are what I'm looking for. However, I don't know how to create such a filter and pass it down to the JDBC driver:

===
https://github.com/apache/spark/blob/40ed2af587cedadc6e5249031857a922b3b234ca/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
===

(I've put sketches of what I have in mind in the P.S. below.)

--
Thanks for the help,
Jyun-Fan Tsai
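P.S. To make the subquery workaround concrete, here is the full call I described above (PostgreSQL requires an alias, tmp, for the derived table):

===
# Subquery workaround: push the time range into the table expression.
# The derived table needs an alias ("tmp") in PostgreSQL.
query = ("(SELECT * FROM mytable "
         "WHERE t >= 1451577600 AND t <= 1454256000) tmp")

jdbcDF = sqlContext.read.format("jdbc").options(
    url="jdbc:postgresql://...",
    dbtable=query).load()
print(jdbcDF.count())
===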
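And here is what I imagine a filter-based version would look like. I am guessing from the PySpark docs that the predicates argument of sqlContext.read.jdbc becomes the WHERE clauses of the generated queries, and from JDBCRelation that a plain .filter() with simple comparisons is pushed down to the database, but I have not confirmed either, so please treat this as a sketch:

===
# Guess 1: pass explicit WHERE-clause fragments via `predicates`;
# each string in the list becomes the WHERE clause of one partition.
jdbcDF = sqlContext.read.jdbc(
    url="jdbc:postgresql://...",
    table="mytable",
    predicates=["t >= 1451577600 AND t <= 1454256000"])
print(jdbcDF.count())

# Guess 2: rely on filter pushdown -- if I read JDBCRelation correctly,
# simple comparison filters are compiled into the query sent over JDBC.
jdbcDF = sqlContext.read.format("jdbc").options(
    url="jdbc:postgresql://...",
    dbtable="mytable").load()
print(jdbcDF.filter("t >= 1451577600 AND t <= 1454256000").count())
===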