[jira] [Closed] (FLINK-8414) Gelly performance seriously decreases when using the suggested parallelism configuration

2018-01-16 Thread flora karniav (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

flora karniav closed FLINK-8414.

Resolution: Feedback Received

> Gelly performance seriously decreases when using the suggested parallelism 
> configuration
>
> Key: FLINK-8414
> URL: https://issues.apache.org/jira/browse/FLINK-8414
> Project: Flink
> Issue Type: Bug
> Components: Configuration, Documentation, Gelly
> Reporter: flora karniav
> Priority: Minor
>
> I am running Gelly examples with different datasets in a cluster of 5 
> machines (1 Jobmanager and 4 Taskmanagers) of 32 cores each.
> The number-of-slots parameter is set to 32 (as suggested) and the parallelism
> to 128 (32 cores * 4 Taskmanagers).
> I observe a large performance degradation with these suggested settings
> compared with setting parallelism.default to 16, for example, where the same
> job completes in ~60 seconds vs. ~140 seconds in the 128-parallelism case.
> Is there something wrong in my configuration? Should I decrease parallelism
> and, if so, will this inevitably decrease CPU utilization?
> Another matter that may be related to this is the number of partitions of the 
> data. Is this somehow related to parallelism? How many partitions are created 
> in the case of parallelism.default=128? 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8414) Gelly performance seriously decreases when using the suggested parallelism configuration

2018-01-13 Thread flora karniav (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16325157#comment-16325157
 ] 

flora karniav commented on FLINK-8414:
--

Thank you for the information,

I understand that lower parallelism levels are sufficient for these small
datasets. But why would performance decrease with larger parallelism values?
Because of this, I cannot measure performance across datasets of different
sizes (from MBs to GBs) with the same Flink setup and configuration.

In addition, even if I know the graph size a priori (using VertexMetrics), is
there a formula or some standard way to decide the parallelism level
accordingly? Or is brute force the only way?

Thank you 
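The brute-force approach asked about above could be sketched as a small sweep script. This is only an illustration: `-p` is the Flink CLI option for job parallelism, the Gelly arguments are copied from the command quoted in FLINK-8403 elsewhere in this thread, and the `INPUT` path and the list of parallelism values are hypothetical placeholders. The script defaults to a dry run, so nothing is submitted unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry run by default: print what would be submitted instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical input path; replace with a real dataset location.
INPUT="${INPUT:-hdfs:///path/to/graph.txt}"

run_once() {
  # $1 = parallelism passed to the Flink CLI via -p.
  ./bin/flink run -p "$1" examples/gelly/flink-gelly-examples_*.jar \
    --algorithm PageRank --directed false \
    --input_filename "$INPUT" --input CSV \
    --type integer --input_field_delimiter ' ' --output print
}

tried=""
for p in 16 32 64 128; do
  tried="${tried:+$tried }$p"
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run with -p $p"
  else
    start=$(date +%s)
    run_once "$p" > /dev/null
    end=$(date +%s)
    # Wall-clock time, including job submission overhead.
    echo "parallelism=$p seconds=$((end - start))"
  fi
done
```

The timings from such a sweep include client-side submission time, so they are only a rough guide; the job runtime reported by Flink itself is the better number to compare.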



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-8414) Gelly performance seriously decreases when using the suggested parallelism configuration

2018-01-12 Thread flora karniav (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324228#comment-16324228
 ] 

flora karniav commented on FLINK-8414:
--

Thank you for your reply, 

I am running the ConnectedComponents and PageRank algorithms from Gelly 
examples on two SNAP datasets:

1) https://snap.stanford.edu/data/egonets-Twitter.html - 81,306 vertices and 
2,420,766 edges.
2) https://snap.stanford.edu/data/com-Youtube.html - 1,134,890 vertices and 
2,987,624 edges.

I also want to point out that I looked into CPU utilization when changing the
parallelism level, and it grows as expected; however, performance still drops.

(I am sorry if I posted in an inappropriate section, but I thought the issue
was odd enough to be configuration- or bug-related.)
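For a rough sense of scale, the edge counts above can be divided by the two parallelism values discussed in this issue. This is a back-of-the-envelope sketch that assumes edges are spread evenly across subtasks, which real partitioning will not guarantee.

```shell
#!/bin/sh
# Edge counts quoted above for the two SNAP datasets.
twitter_edges=2420766
youtube_edges=2987624

# Integer division: approximate edges handled by each parallel subtask.
for parallelism in 16 128; do
  echo "parallelism=$parallelism:" \
       "twitter=$((twitter_edges / parallelism)) edges/subtask," \
       "youtube=$((youtube_edges / parallelism)) edges/subtask"
done
```

At parallelism 128, each subtask would see fewer than 25,000 edges of either graph, so fixed per-subtask costs such as scheduling and network exchange may plausibly outweigh the useful work. That is a possible explanation, not a measured one.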



[jira] [Updated] (FLINK-8414) Gelly performance seriously decreases when using the suggested parallelism configuration

2018-01-11 Thread flora karniav (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

flora karniav updated FLINK-8414:
-
Description: 
I am running Gelly examples with different datasets in a cluster of 5 machines 
(1 Jobmanager and 4 Taskmanagers) of 32 cores each.

The number-of-slots parameter is set to 32 (as suggested) and the parallelism
to 128 (32 cores * 4 Taskmanagers).

I observe a large performance degradation with these suggested settings
compared with setting parallelism.default to 16, for example, where the same
job completes in ~60 seconds vs. ~140 seconds in the 128-parallelism case.

Is there something wrong in my configuration? Should I decrease parallelism
and, if so, will this inevitably decrease CPU utilization?

Another matter that may be related to this is the number of partitions of the 
data. Is this somehow related to parallelism? How many partitions are created 
in the case of parallelism.default=128? 

  was:
I am running Gelly examples with different datasets in a cluster of 5 machines 
(1 Jobmanager and 4 Taskmanagers) of 32 cores each.

The number-of-slots parameter is set to 32 (as suggested) and the parallelism
to 128 (32 cores * 4 Taskmanagers).

I observe a large performance degradation with these suggested settings
compared with setting parallelism.default to 16, for example, where the same
job completes in 37 seconds vs. 140 seconds in the 128-parallelism case.

Is there something wrong in my configuration? Should I decrease parallelism
and, if so, will this inevitably decrease CPU utilization?

Another matter that may be related to this is the number of partitions of the 
data. Is this somehow related to parallelism? How many partitions are created 
in the case of parallelism.default=128? 




[jira] [Created] (FLINK-8414) Gelly performance seriously decreases when using the suggested parallelism configuration

2018-01-11 Thread flora karniav (JIRA)
flora karniav created FLINK-8414:


 Summary: Gelly performance seriously decreases when using the 
suggested parallelism configuration
 Key: FLINK-8414
 URL: https://issues.apache.org/jira/browse/FLINK-8414
 Project: Flink
  Issue Type: Bug
  Components: Configuration, Documentation, Gelly
Reporter: flora karniav
Priority: Minor


I am running Gelly examples with different datasets in a cluster of 5 machines 
(1 Jobmanager and 4 Taskmanagers) of 32 cores each.

The number-of-slots parameter is set to 32 (as suggested) and the parallelism
to 128 (32 cores * 4 Taskmanagers).

I observe a large performance degradation with these suggested settings
compared with setting parallelism.default to 16, for example, where the same
job completes in 37 seconds vs. 140 seconds in the 128-parallelism case.

Is there something wrong in my configuration? Should I decrease parallelism
and, if so, will this inevitably decrease CPU utilization?

Another matter that may be related to this is the number of partitions of the 
data. Is this somehow related to parallelism? How many partitions are created 
in the case of parallelism.default=128? 
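The slot arithmetic in the description can be sketched as follows. The variable names are illustrative, not Flink settings. On the partition question, the sketch assumes the usual Flink behaviour that an operator with parallelism N runs as N parallel subtasks, each handling one partition of its input, so `parallelism.default=128` would typically mean 128 partitions per operator unless a job overrides it.

```shell
#!/bin/sh
# Cluster shape taken from the report: 4 Taskmanagers, 32 slots each.
taskmanagers=4
slots_per_taskmanager=32
total_slots=$((taskmanagers * slots_per_taskmanager))

# Parallelism values compared in the report.
for parallelism in 16 128; do
  if [ "$parallelism" -le "$total_slots" ]; then
    echo "parallelism=$parallelism fits in $total_slots slots"
  else
    echo "parallelism=$parallelism exceeds $total_slots slots"
  fi
done
```

Both values fit in the available slots, so the slowdown at 128 is not a matter of missing capacity.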





[jira] [Closed] (FLINK-8403) Flink Gelly examples hanging without returning result

2018-01-11 Thread flora karniav (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

flora karniav closed FLINK-8403.

   Resolution: Not A Bug
Fix Version/s: 1.3.2

> Flink Gelly examples hanging without returning result
>
> Key: FLINK-8403
> URL: https://issues.apache.org/jira/browse/FLINK-8403
> Project: Flink
> Issue Type: Bug
> Components: Gelly
> Affects Versions: 1.3.2
> Environment: CentOS Linux release 7.3.1611
> Reporter: flora karniav
> Labels: examples, gelly, performance
> Fix For: 1.3.2
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> Hello, I am currently running and measuring the Flink Gelly examples
> (ConnectedComponents and PageRank algorithms) with different SNAP datasets.
> When running with the Twitter dataset, for example
> (https://snap.stanford.edu/data/egonets-Twitter.html), which has 81,306
> vertices, everything executes and finishes OK and I get the reported job
> runtime. On the other hand, executions with datasets that have more
> vertices, e.g. https://snap.stanford.edu/data/com-Youtube.html with
> 1,134,890 vertices, hang with no result or reported time, while at the same
> time I get "Job execution switched to status FINISHED."
> I thought that this could be a memory issue, so I increased the RAM
> assigned to my taskmanagers (and jobmanager) to 125GB, but still no luck.
> The exact command I am running is:
> ./bin/flink run examples/gelly/flink-gelly-examples_*.jar --algorithm 
> PageRank --directed false  --input_filename hdfs://sith0:9000/user/xx.txt 
> --input CSV --type integer --input_field_delimiter $' ' --output print





[jira] [Commented] (FLINK-8403) Flink Gelly examples hanging without returning result

2018-01-11 Thread flora karniav (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321886#comment-16321886
 ] 

flora karniav commented on FLINK-8403:
--

Yes, that was it! I think it would be a good idea for this to be pointed out in 
the documentation. Thank you so much, I am closing the issue.



[jira] [Created] (FLINK-8403) Flink Gelly examples hanging without returning result

2018-01-10 Thread flora karniav (JIRA)
flora karniav created FLINK-8403:


 Summary: Flink Gelly examples hanging without returning result
 Key: FLINK-8403
 URL: https://issues.apache.org/jira/browse/FLINK-8403
 Project: Flink
  Issue Type: Bug
  Components: Gelly
Affects Versions: 1.3.2
 Environment: CentOS Linux release 7.3.1611
Reporter: flora karniav


Hello, I am currently running and measuring the Flink Gelly examples
(ConnectedComponents and PageRank algorithms) with different SNAP datasets.
When running with the Twitter dataset, for example
(https://snap.stanford.edu/data/egonets-Twitter.html), which has 81,306
vertices, everything executes and finishes OK and I get the reported job
runtime. On the other hand, executions with datasets that have more vertices,
e.g. https://snap.stanford.edu/data/com-Youtube.html with 1,134,890 vertices,
hang with no result or reported time, while at the same time I get "Job
execution switched to status FINISHED."

I thought that this could be a memory issue, so I increased the RAM assigned
to my taskmanagers (and jobmanager) to 125GB, but still no luck.

The exact command I am running is:

./bin/flink run examples/gelly/flink-gelly-examples_*.jar --algorithm PageRank \
  --directed false --input_filename hdfs://sith0:9000/user/xx.txt --input CSV \
  --type integer --input_field_delimiter $' ' --output print



