[jira] [Reopened] (SPARK-20352) PySpark SparkSession initialization take longer every iteration in a single application

2017-04-17 Thread hosein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hosein reopened SPARK-20352:


> PySpark SparkSession initialization take longer every iteration in a single 
> application
> ---
>
> Key: SPARK-20352
> URL: https://issues.apache.org/jira/browse/SPARK-20352
> Project: Spark
>  Issue Type: Question
>  Components: PySpark
>Affects Versions: 2.1.0
> Environment: Ubuntu 12
> Spark 2.1
> JRE 8.0
> Python 2.7
>Reporter: hosein
>
> I run Spark on a standalone Ubuntu server with 128 GB of memory and a 32-core
> CPU, and I run spark-submit my_code.py without any additional configuration
> parameters.
> In a while loop I start a SparkSession, analyze data, and then stop the context;
> this process repeats every 10 seconds.
> {code}
> while True:
>     spark = SparkSession.builder.appName("sync_task") \
>         .config('spark.driver.maxResultSize', '5g').getOrCreate()
>     sc = spark.sparkContext
>     # some processing and analysis
>     spark.stop()
> {code}
> When the program starts, it works perfectly, but after it has been running for
> many hours, Spark initialization takes a long time: 10 or 20 seconds just to
> initialize Spark.
> So what is the problem?
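
One way to avoid paying session start-up cost on every pass is to build the session once, outside the loop, and stop it only when the application exits. Below is a minimal sketch of the two loop shapes; `FakeSession` is a hypothetical stand-in for `pyspark.sql.SparkSession` so the snippet runs without Spark installed:

```python
class FakeSession:
    """Hypothetical stub standing in for pyspark.sql.SparkSession."""
    created = 0  # how many sessions were constructed in total

    def __init__(self):
        FakeSession.created += 1

    def stop(self):
        pass

# Reported shape: stop() inside the loop forces the next getOrCreate()
# to build a fresh session (and pay start-up cost) on every iteration.
for _ in range(5):
    spark = FakeSession()  # stands in for SparkSession.builder...getOrCreate()
    spark.stop()
per_iteration = FakeSession.created  # 5 sessions for 5 passes

# Restructured shape: build once, reuse inside the loop, stop at the end.
FakeSession.created = 0
spark = FakeSession()
for _ in range(5):
    pass  # ... analyze data with the same session ...
spark.stop()
reused = FakeSession.created  # 1 session for 5 passes

print(per_iteration, reused)  # prints: 5 1
```

With real PySpark, the second shape keeps `getOrCreate()` returning the same live session, so the multi-second initialization is paid once rather than on every 10-second cycle.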



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20352) PySpark SparkSession initialization take longer every iteration in a single application

2017-04-17 Thread hosein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970906#comment-15970906
 ] 

hosein commented on SPARK-20352:


I monitored the execution time of every line in my code, and this line:

spark = SparkSession.builder.appName("sync_task").config('spark.driver.maxResultSize', '5g').getOrCreate()

takes too long (20 seconds or more) once my code has been running for hours.




[jira] [Updated] (SPARK-20352) PySpark SparkSession initialization take longer every iteration in a single application

2017-04-16 Thread hosein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hosein updated SPARK-20352:
---
Environment: 
Ubuntu 12
Spark 2.1
JRE 8.0
Python 2.7


  was:
linux ubunto 12
spark 2.1
JRE 8.0






[jira] [Updated] (SPARK-20352) PySpark SparkSession initialization take longer every iteration in a single application

2017-04-16 Thread hosein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hosein updated SPARK-20352:
---
Description: 
I run Spark on a standalone Ubuntu server with 128 GB of memory and a 32-core
CPU, and I run spark-submit my_code.py without any additional configuration
parameters.
In a while loop I start a SparkSession, analyze data, and then stop the context;
this process repeats every 10 seconds.

{code}
while True:
    spark = SparkSession.builder.appName("sync_task") \
        .config('spark.driver.maxResultSize', '5g').getOrCreate()
    sc = spark.sparkContext
    # some processing and analysis
    spark.stop()
{code}

When the program starts, it works perfectly, but after it has been running for
many hours, Spark initialization takes a long time: 10 or 20 seconds just to
initialize Spark.

So what is the problem?

  was:
I run Spark on a standalone Ubuntu server with 128G memory and 32-core CPU. Run 
spark-sumbit my_code.py without any additional configuration parameters.
In a while loop I start SparkSession, analyze data and then stop the context 
and this process repeats every 10 seconds.

#
while True:
spark =   
SparkSession.builder.appName("sync_task").config('spark.driver.maxResultSize'
 , '5g').getOrCreate()
sc = spark.sparkContext
#some process and analyze
spark.stop()
###

When program starts, it works perfectly.

but when it works for many hours. spark initialization take long time. it makes 
10 or 20 seconds for just initializing spark.

So what is the problem ?





[jira] [Updated] (SPARK-20352) PySpark SparkSession initialization take longer every iteration in a single application

2017-04-16 Thread hosein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hosein updated SPARK-20352:
---
Environment: 
linux ubunto 12
spark 2.1
JRE 8.0


  was:
linux ubuntu 12
pyspark





[jira] [Updated] (SPARK-20352) PySpark SparkSession initialization take longer every iteration in a single application

2017-04-16 Thread hosein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hosein updated SPARK-20352:
---
Environment: 
linux ubuntu 12
pyspark

  was:
linux ubunto 12
pyspark





[jira] [Created] (SPARK-20352) PySpark SparkSession initialization take longer every iteration in a single application

2017-04-16 Thread hosein (JIRA)
hosein created SPARK-20352:
--

 Summary: PySpark SparkSession initialization take longer every 
iteration in a single application
 Key: SPARK-20352
 URL: https://issues.apache.org/jira/browse/SPARK-20352
 Project: Spark
  Issue Type: Question
  Components: PySpark
Affects Versions: 2.1.0
 Environment: linux ubunto 12
pyspark
Reporter: hosein
 Fix For: 2.1.0


I run Spark on a standalone Ubuntu server with 128 GB of memory and a 32-core
CPU, and I run spark-submit my_code.py without any additional configuration
parameters.
In a while loop I start a SparkSession, analyze data, and then stop the context;
this process repeats every 10 seconds.

#
while True:
    spark = SparkSession.builder.appName("sync_task") \
        .config('spark.driver.maxResultSize', '5g').getOrCreate()
    sc = spark.sparkContext
    # some processing and analysis
    spark.stop()
###

When the program starts, it works perfectly, but after it has been running for
many hours, Spark initialization takes a long time: 10 or 20 seconds just to
initialize Spark.

So what is the problem?






[jira] [Commented] (SPARK-19655) select count(*) , requests 1 for each row

2017-02-18 Thread hosein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873140#comment-15873140
 ] 

hosein commented on SPARK-19655:


I think I should not use Spark for my use case...


> select count(*) , requests 1 for each row
> -
>
> Key: SPARK-19655
> URL: https://issues.apache.org/jira/browse/SPARK-19655
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: hosein
>Priority: Minor
>
> When I run the query select count( * ) via JDBC and monitor the queries on the
> database side, I see that Spark sends: select 1 for the destination table.
> That means one "1" per row, which is not optimized.
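
The cost difference is easy to see with a small stdlib `sqlite3` sketch (sqlite standing in here for the JDBC-connected database; `test_table` and `id` are made-up names): a row-per-match `SELECT 1` ships one row per matching record, while a `COUNT(*)` pushed down to the database ships a single row with the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (id INTEGER)")
conn.executemany("INSERT INTO test_table VALUES (?)", [(i,) for i in range(1000)])

# What the reporter observed Spark sending: one constant per matching row,
# counted on the client side.
rows = conn.execute('SELECT 1 FROM test_table WHERE "id" > 100').fetchall()

# The aggregate pushed down to the database: a single row comes back.
(db_count,) = conn.execute(
    'SELECT COUNT(*) FROM test_table WHERE "id" > 100').fetchone()

print(len(rows), db_count)  # prints: 899 899 -- 899 rows fetched vs. 1 row
```

With Spark's JDBC source, one commonly used way to get the pushed-down form is to pass a subquery as the `dbtable` option, e.g. `(SELECT COUNT(*) AS cnt FROM test_table) AS t`, so the aggregation runs in the database rather than row-by-row in Spark.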






[jira] [Commented] (SPARK-19655) select count(*) , requests 1 for each row

2017-02-18 Thread hosein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873131#comment-15873131
 ] 

hosein commented on SPARK-19655:


If I want to count 100 million rows, are 100 million "1"s returned over the
network just for a count?




[jira] [Commented] (SPARK-19655) select count(*) , requests 1 for each row

2017-02-18 Thread hosein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873121#comment-15873121
 ] 

hosein commented on SPARK-19655:


I was surprised too :)
If you have a Vertica database, you can test this part of the code and monitor
the queries in Vertica; in my experience, select 1 appeared.




[jira] [Commented] (SPARK-19655) select count(*) , requests 1 for each row

2017-02-18 Thread hosein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873118#comment-15873118
 ] 

hosein commented on SPARK-19655:


How can I get the count result from my Vertica table? Is there an optimized way
to do that?




[jira] [Comment Edited] (SPARK-19655) select count(*) , requests 1 for each row

2017-02-18 Thread hosein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873110#comment-15873110
 ] 

hosein edited comment on SPARK-19655 at 2/18/17 11:11 AM:
--

I connect to Vertica via JDBC and downloaded its driver from this link:
https://my.vertica.com/download/vertica/client-drivers/

I assumed that if I add the JDBC driver jar file to Spark and define the JDBC
URL in my code, Spark would work with this driver ...


was (Author: hosein_ey):
I connect  to Vertica by JDBC and downloaded it's driver from this link:
https://my.vertica.com/download/vertica/client-drivers/

I supposed if I take Spark JDBC  jar file and define JDBC url in it, Spark 
works with this driver ...




[jira] [Comment Edited] (SPARK-19655) select count(*) , requests 1 for each row

2017-02-18 Thread hosein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873110#comment-15873110
 ] 

hosein edited comment on SPARK-19655 at 2/18/17 11:09 AM:
--

I connect  to Vertica by JDBC and downloaded it's driver from this link:
https://my.vertica.com/download/vertica/client-drivers/

I supposed if I take Spark JDBC  jar file and define JDBC url in it, Spark 
works with this driver ...


was (Author: hosein_ey):
I connect  to Vertica by JDBC and downloaded it's driver from this link:
https://my.vertica.com/download/vertica/client-drivers/

I suppose if I take Spark JDBC  jar file and define JDBC url in it, Spark works 
with this driver ...




[jira] [Comment Edited] (SPARK-19655) select count(*) , requests 1 for each row

2017-02-18 Thread hosein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873110#comment-15873110
 ] 

hosein edited comment on SPARK-19655 at 2/18/17 11:07 AM:
--

I connect  to Vertica by JDBC and downloaded it's driver from this link:
https://my.vertica.com/download/vertica/client-drivers/

I suppose if I take Spark JDBC  jar file and define JDBC url in it, Spark works 
with this driver ...


was (Author: hosein_ey):
I connect  to Vertica by JDBC and downloaded it's driver from this link:
https://my.vertica.com/download/vertica/client-drivers/

I suppose if I take spark JDBC  jar file and define JDBC url in it, spark works 
with this driver ...




[jira] [Comment Edited] (SPARK-19655) select count(*) , requests 1 for each row

2017-02-18 Thread hosein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873110#comment-15873110
 ] 

hosein edited comment on SPARK-19655 at 2/18/17 11:06 AM:
--

I connect  to Vertica by JDBC and downloaded it's driver from this link:
https://my.vertica.com/download/vertica/client-drivers/

I suppose if I take spark JDBC  jar file and define JDBC url in it, spark works 
with this driver ...


was (Author: hosein_ey):
I connect  to Vertica by JDBC and downloaded it's driver from this link:
https://my.vertica.com/download/vertica/client-drivers/

I suppose if I take spark JDBC  jar file and define JDBC url in spark. spark 
works with this driver ...




[jira] [Comment Edited] (SPARK-19655) select count(*) , requests 1 for each row

2017-02-18 Thread hosein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873110#comment-15873110
 ] 

hosein edited comment on SPARK-19655 at 2/18/17 11:06 AM:
--

I connect  to Vertica by JDBC and downloaded it's driver from this link:
https://my.vertica.com/download/vertica/client-drivers/

I suppose if I take spark JDBC  jar file and define JDBC url in spark. spark 
works with this driver ...


was (Author: hosein_ey):
I connect  to Vertica by JDBC and downloaded it's driver from this link:
https://my.vertica.com/download/vertica/client-drivers/





[jira] [Commented] (SPARK-19655) select count(*) , requests 1 for each row

2017-02-18 Thread hosein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873110#comment-15873110
 ] 

hosein commented on SPARK-19655:


I connect  to Vertica by JDBC and downloaded it's driver from this link:
https://my.vertica.com/download/vertica/client-drivers/





[jira] [Comment Edited] (SPARK-19655) select count(*) , requests 1 for each row

2017-02-18 Thread hosein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873093#comment-15873093
 ] 

hosein edited comment on SPARK-19655 at 2/18/17 10:37 AM:
--

I have a Vertica database with 100 million rows, and I run this code in Spark:

df = spark.read.format("jdbc").option("url", vertica_jdbc_url) \
    .option("dbtable", 'test_table') \
    .option("user", "spark_user").option("password", "password").load()

result = df.filter(df['id'] > 100).count()

print result

I monitor the queries in Vertica, and the Spark code generates this query:

SELECT 1 FROM test_table WHERE ("id" > 100)

This query returns about 100 million "1"s, which I think is not suitable.









was (Author: hosein_ey):
I have a Vertica database with 100 million rows and I run this code in spark:

 df = spark.read.format("jdbc").option("url" , 
vertica_jdbc_url).option("dbtable", 'test_table')
   .option("user", "spark_user").option("password" , "password").load()

result = df.filter(df['id'] > 100).count()

print result

I monitor queries in Vertica and spark code generates this query in Vertica:

SELECT 1 FROM test_table WHERE ("int_id" > 100)

this query returns about 100 million "1" and I think this is not suitable











[jira] [Comment Edited] (SPARK-19655) select count(*) , requests 1 for each row

2017-02-18 Thread hosein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873093#comment-15873093
 ] 

hosein edited comment on SPARK-19655 at 2/18/17 10:36 AM:
--

I have a Vertica database with 100 million rows and I run this code in spark:

 df = spark.read.format("jdbc").option("url" , 
vertica_jdbc_url).option("dbtable", 'test_table')
   .option("user", "spark_user").option("password" , "password").load()

result = df.filter(df['id'] > 100).count()

print result

I monitor queries in Vertica and spark code generates this query in Vertica:

SELECT 1 FROM test_table WHERE ("int_id" > 100)

this query returns about 100 million "1" and I think this is not suitable









was (Author: hosein_ey):
I have a Vertica database with 100 million rows and I run this code in spark:

  
 df = spark.read.format("jdbc").option("url" , 
vertica_jdbc_url).option("dbtable", 'test_table')
   .option("user", "spark_user").option("password" , "password").load()

result = df.filter(df['id'] > 100).count()

print result
  

I monitor queries in Vertica and spark code generates this query in Vertica:

SELECT 1 FROM test_table WHERE ("int_id" > 100)

this query returns about 100 million "1" and I think this is not suitable











[jira] [Comment Edited] (SPARK-19655) select count(*) , requests 1 for each row

2017-02-18 Thread hosein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873093#comment-15873093
 ] 

hosein edited comment on SPARK-19655 at 2/18/17 10:36 AM:
--

I have a Vertica database with 100 million rows and I run this code in spark:

  
 df = spark.read.format("jdbc").option("url" , 
vertica_jdbc_url).option("dbtable", 'test_table')
   .option("user", "spark_user").option("password" , "password").load()

result = df.filter(df['id'] > 100).count()

print result
  

I monitor queries in Vertica and spark code generates this query in Vertica:

SELECT 1 FROM test_table WHERE ("int_id" > 100)

this query returns about 100 million "1" and I think this is not suitable









was (Author: hosein_ey):
I have a Vertica database with 100 million rows and I run this code in spark:
  
 df = spark.read.format("jdbc").option("url" , 
vertica_jdbc_url).option("dbtable", 'test_table')
   .option("user", "spark_user").option("password" , "password").load()
result = df.filter(df['id'] > 100).count()
print result


I monitor queries in Vertica and spark code generates this query in Vertica:

SELECT 1 FROM test_table WHERE ("int_id" > 100)

this query returns about 100 million "1" and I think this is not suitable











[jira] [Commented] (SPARK-19655) select count(*) , requests 1 for each row

2017-02-18 Thread hosein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873093#comment-15873093
 ] 

hosein commented on SPARK-19655:


I have a Vertica database with 100 million rows and I run this code in spark:
  
 df = spark.read.format("jdbc").option("url" , 
vertica_jdbc_url).option("dbtable", 'test_table')
   .option("user", "spark_user").option("password" , "password").load()
result = df.filter(df['id'] > 100).count()
print result


I monitor queries in Vertica and spark code generates this query in Vertica:

SELECT 1 FROM test_table WHERE ("int_id" > 100)

this query returns about 100 million "1" and I think this is not suitable











[jira] [Updated] (SPARK-19655) select count(*) , requests 1 for each row

2017-02-18 Thread hosein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hosein updated SPARK-19655:
---
Summary: select count(*) , requests 1 for each row  (was: select count(*) , 
requests 1 foreach row)




[jira] [Updated] (SPARK-19655) select count(*) , requests 1 for each row

2017-02-18 Thread hosein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hosein updated SPARK-19655:
---
Description: 
when I want query select count( * ) by JDBC and monitor queries in database 
side, I see spark requests: select 1 for destination table
it means 1 for each row and it is not optimized

  was:
when I want query select count(*) by JDBC and monitor queries in database side, 
I see spark requests: select 1 for destination table
it means 1 for each row and it is not optimized





[jira] [Created] (SPARK-19655) select count(*) , requests 1 foreach row

2017-02-18 Thread hosein (JIRA)
hosein created SPARK-19655:
--

 Summary: select count(*) , requests 1 foreach row
 Key: SPARK-19655
 URL: https://issues.apache.org/jira/browse/SPARK-19655
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0
Reporter: hosein
Priority: Minor


When I run the query select count(*) via JDBC and monitor the queries on the
database side, I see that Spark sends: select 1 for the destination table.
That means one "1" per row, which is not optimized.


