subject:"Dataset count on database or parquet"

Re: Dataset count on database or parquet

2017-02-09 Thread Suresh Thalamati

If you have to get the data into parquet format for other reasons, then I think count() on the parquet should be better. If it is just the count you need, sending dbTable = (select count(*) from ) to the database might be quicker; it will avoid unnecessary data transfer from the database to
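
A minimal sketch of the count pushdown suggested above, assuming the Spark JDBC source, whose "dbtable" option accepts a parenthesized subquery with an alias. The table name my_table, the alias t, the column name cnt, the example URL, and the helper names countSubquery and jdbcOptions are all hypothetical and not from the thread. The Spark call itself is left in comments because it needs Spark and a JDBC driver on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CountPushdown {
    // Wraps a COUNT(*) query as a derived table so the database computes the
    // count and only a single row travels back over JDBC.
    // The alias "t" is included because many databases require one for a derived table.
    public static String countSubquery(String table) {
        return "(select count(*) as cnt from " + table + ") t";
    }

    // Collects the JDBC reader options; "url" and "driver" mirror the options
    // used in the original question, "dbtable" carries the subquery.
    public static Map<String, String> jdbcOptions(String url, String driver, String table) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("url", url);
        opts.put("driver", driver);
        opts.put("dbtable", countSubquery(table));
        return opts;
    }

    public static void main(String[] args) {
        // Hypothetical connection details, for illustration only.
        Map<String, String> opts = jdbcOptions(
                "jdbc:postgresql://localhost/db", "org.postgresql.Driver", "my_table");
        System.out.println(opts.get("dbtable"));

        // With Spark available, the options could be used roughly like this:
        //   Dataset<Row> result = spark.read().format("jdbc").options(opts).load();
        //   // The JDBC type of count(*) varies by database, so go through Number.
        //   long count = ((Number) result.first().get(0)).longValue();
    }
}
```

By contrast, loading the whole table through JDBC and calling count() on the resulting Dataset may pull far more data out of the database than the single row this query returns.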

Dataset count on database or parquet

2017-02-08 Thread Rohit Verma

Hi,

Which of the following is the better approach when there are too many values in the database?

final Dataset dataset = spark.sqlContext().read()
    .format("jdbc")
    .option("url", params.getJdbcUrl())
    .option("driver", params.getDriver())