[ https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873093#comment-15873093 ]
hosein edited comment on SPARK-19655 at 2/18/17 10:37 AM:
----------------------------------------------------------

I have a Vertica database with 100 million rows, and I run this code in Spark:

    df = (spark.read.format("jdbc").option("url", vertica_jdbc_url).option("dbtable", 'test_table')
          .option("user", "spark_user").option("password", "password").load())
    result = df.filter(df['id'] > 100).count()
    print(result)

I monitor the queries in Vertica, and the Spark code generates this query in Vertica:

    SELECT 1 FROM test_table WHERE ("id" > 100)

This query returns about 100 million rows of "1" to Spark instead of a single count, and I think this is not suitable.


was (Author: hosein_ey):
I have a Vertica database with 100 million rows, and I run this code in Spark:

    df = (spark.read.format("jdbc").option("url", vertica_jdbc_url).option("dbtable", 'test_table')
          .option("user", "spark_user").option("password", "password").load())
    result = df.filter(df['id'] > 100).count()
    print(result)

I monitor the queries in Vertica, and the Spark code generates this query in Vertica:

    SELECT 1 FROM test_table WHERE ("int_id" > 100)

This query returns about 100 million rows of "1" to Spark instead of a single count, and I think this is not suitable.


> select count(*) , requests 1 for each row
> -----------------------------------------
>
>                 Key: SPARK-19655
>                 URL: https://issues.apache.org/jira/browse/SPARK-19655
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: hosein
>            Priority: Minor
>
> When I run a select count( * ) query over JDBC and monitor the queries on the
> database side, I see that Spark requests: select 1 from the destination table.
> This means one "1" is returned for each row, and it is not optimized.
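
A possible workaround, sketched here and not confirmed anywhere in this thread: the observed query suggests that Spark pushes the filter down to Vertica but computes the count itself. Since the JDBC `dbtable` option also accepts a parenthesised subquery with an alias, the count can be written into that subquery so the database returns a single row. The helper names `count_pushdown_table` and `remote_count`, the column alias `cnt`, and the subquery alias `pushed_count` are all hypothetical, chosen only for this illustration.

```python
def count_pushdown_table(table, predicate):
    """Build a parenthesised subquery for the JDBC `dbtable` option.

    Hypothetical helper: `table` and `predicate` are pasted into the SQL
    text unchanged, so they must come from trusted code, not user input.
    """
    return "(SELECT COUNT(*) AS cnt FROM {0} WHERE {1}) pushed_count".format(
        table, predicate)


def remote_count(spark, jdbc_url, table, predicate, user, password):
    """Let the database compute the count and fetch only the one-row result.

    `spark` is an existing SparkSession; the connection values mirror the
    ones used in the comment above.
    """
    df = (spark.read.format("jdbc")
          .option("url", jdbc_url)
          .option("dbtable", count_pushdown_table(table, predicate))
          .option("user", user)
          .option("password", password)
          .load())
    # The subquery yields exactly one row with one column: the count.
    return df.first()[0]
```

With the values from the comment, the call would look like `remote_count(spark, vertica_jdbc_url, 'test_table', '"id" > 100', 'spark_user', 'password')`. The trade-off is that the predicate is now a SQL string rather than a DataFrame expression, so Spark can no longer check it before the query reaches the database.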