[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906113#comment-14906113 ] ASF GitHub Bot commented on FLINK-1520: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1149 > Read edges and vertices from CSV files > -- > > Key: FLINK-1520 > URL: https://issues.apache.org/jira/browse/FLINK-1520 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri >Priority: Minor > Labels: easyfix, newbie > > Add methods to create Vertex and Edge Datasets directly from CSV file inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900354#comment-14900354 ] ASF GitHub Bot commented on FLINK-1520: --- Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/1149#discussion_r39947900 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java --- @@ -0,0 +1,486 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph; +import com.google.common.base.Preconditions; + +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.io.CsvReader; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.core.fs.Path; +import org.apache.flink.types.NullValue; +import org.apache.flink.api.java.ExecutionEnvironment; + +/** + * A class to build a Graph using path(s) provided to CSV file(s) with optional vertex and edge data. 
+ * The class also configures the CSV readers used to read edge and vertex data such as the field types, + * the delimiters (row and field), the fields that should be included or skipped, and other flags, + * such as whether to skip the initial line as the header. + * The configuration is done using the functions provided in the {@link org.apache.flink.api.java.io.CsvReader} class. + */ + +public class GraphCsvReader { + + @SuppressWarnings("unused") + private final Path vertexPath, edgePath; + private final ExecutionEnvironment executionContext; + protected CsvReader edgeReader; + protected CsvReader vertexReader; + protected MapFunction mapper; + protected Class vertexKey; + protected Class vertexValue; + protected Class edgeValue; + +// + public GraphCsvReader(Path vertexPath, Path edgePath, ExecutionEnvironment context) { + this.vertexPath = vertexPath; + this.edgePath = edgePath; + this.vertexReader = new CsvReader(vertexPath, context); + this.edgeReader = new CsvReader(edgePath, context); + this.mapper = null; + this.executionContext = context; + } + + public GraphCsvReader(Path edgePath, ExecutionEnvironment context) { + this.vertexPath = null; + this.edgePath = edgePath; + this.edgeReader = new CsvReader(edgePath, context); + this.vertexReader = null; + this.mapper = null; + this.executionContext = context; + } + + publicGraphCsvReader(Path edgePath, final MapFunction mapper, ExecutionEnvironment context) { + this.vertexPath = null; + this.edgePath = edgePath; + this.edgeReader = new CsvReader(edgePath, context); + this.vertexReader = null; + this.mapper = mapper; + this.executionContext = context; + } + + public GraphCsvReader (String edgePath, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")), context); + + } + + public GraphCsvReader(String vertexPath, String edgePath, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(vertexPath, "The file path may not be 
null.")), + new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")), context); + } + + + public GraphCsvReader(String edgePath, final MapFunction mapper, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(edgePath, "The file path may not be
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900361#comment-14900361 ] ASF GitHub Bot commented on FLINK-1520: --- Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/1149#discussion_r39948297
--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/IncrementalSSSP.java ---
@@ -110,24 +105,20 @@ public static void main(String [] args) throws Exception {
     // Emit results
     if(fileOutput) {
         resultedVertices.writeAsCsv(outputPath, "\n", ",");
-
-        // since file sinks are lazy, we trigger the execution explicitly
-        env.execute("Incremental SSSP Example");
     } else {
         resultedVertices.print();
     }
+    env.execute("Incremental SSSP Example");
--- End diff --
I'm not sure whether I am missing something... Why do you add `env.execute()` after `print()`? It's no longer needed. Have a look here: https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/graph/PageRankBasic.java
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900370#comment-14900370 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1149#discussion_r39948730
--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/IncrementalSSSP.java ---
@@ -110,24 +105,20 @@ public static void main(String [] args) throws Exception {
     // Emit results
     if(fileOutput) {
         resultedVertices.writeAsCsv(outputPath, "\n", ",");
-
-        // since file sinks are lazy, we trigger the execution explicitly
-        env.execute("Incremental SSSP Example");
     } else {
         resultedVertices.print();
     }
+    env.execute("Incremental SSSP Example");
--- End diff --
That's a result of the auto-merge, I guess. Thanks for spotting it!
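The point raised in this diff comment, that a lazy file sink needs an explicit `execute()` while `print()` does not, can be illustrated with a toy model. This is a minimal sketch and not Flink code: the class `LazyEnvSketch` and all of its methods are hypothetical, and the exception thrown when nothing is pending is an assumption of the model rather than a statement about Flink's exact behaviour.

```java
// Toy model of lazy versus eager sinks. Hypothetical; not Flink's ExecutionEnvironment.
final class LazyEnvSketch {
    private int pendingSinks = 0; // sinks registered but not yet run
    private int jobsRun = 0;      // number of jobs actually executed

    // A file sink is lazy in this model: it is only registered.
    void writeAsCsv(String path) {
        pendingSinks++;
    }

    // print() is eager in this model: it registers a sink and runs the job at once.
    void print() {
        pendingSinks++;
        execute("print");
    }

    // Runs all pending sinks; assumed to fail when there is nothing to run.
    void execute(String jobName) {
        if (pendingSinks == 0) {
            throw new IllegalStateException("No new sinks defined since the last execution");
        }
        pendingSinks = 0;
        jobsRun++;
    }

    int jobsRun() {
        return jobsRun;
    }

    public static void main(String[] args) {
        LazyEnvSketch fileOutput = new LazyEnvSketch();
        fileOutput.writeAsCsv("out.csv");
        fileOutput.execute("Incremental SSSP Example"); // required for the lazy sink

        LazyEnvSketch printed = new LazyEnvSketch();
        printed.print(); // already ran; a further execute() would throw in this model
        System.out.println(fileOutput.jobsRun() + " " + printed.jobsRun());
    }
}
```

Under these assumptions, the `execute()` call belongs inside the file-output branch only, which matches the version of the example before the auto-merge.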
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900366#comment-14900366 ] ASF GitHub Bot commented on FLINK-1520: --- Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/1149#issuecomment-141905104 Hi @vasia, As you said, I already reviewed this :P. I left a couple of comments inline. Please reverify the forwarded fields annotations. If you put them there for one mapper, add them for the others too. Apart from that, it's good to merge.
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900373#comment-14900373 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1149#issuecomment-141905875 Thanks @andralungu! I'll address your comments and merge later.
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900368#comment-14900368 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1149#discussion_r39948692 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java --- @@ -0,0 +1,486 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph; +import com.google.common.base.Preconditions; + +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.io.CsvReader; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.core.fs.Path; +import org.apache.flink.types.NullValue; +import org.apache.flink.api.java.ExecutionEnvironment; + +/** + * A class to build a Graph using path(s) provided to CSV file(s) with optional vertex and edge data. 
+ * The class also configures the CSV readers used to read edge and vertex data such as the field types, + * the delimiters (row and field), the fields that should be included or skipped, and other flags, + * such as whether to skip the initial line as the header. + * The configuration is done using the functions provided in the {@link org.apache.flink.api.java.io.CsvReader} class. + */ + +public class GraphCsvReader { + + @SuppressWarnings("unused") + private final Path vertexPath, edgePath; + private final ExecutionEnvironment executionContext; + protected CsvReader edgeReader; + protected CsvReader vertexReader; + protected MapFunction mapper; + protected Class vertexKey; + protected Class vertexValue; + protected Class edgeValue; + +// + public GraphCsvReader(Path vertexPath, Path edgePath, ExecutionEnvironment context) { + this.vertexPath = vertexPath; + this.edgePath = edgePath; + this.vertexReader = new CsvReader(vertexPath, context); + this.edgeReader = new CsvReader(edgePath, context); + this.mapper = null; + this.executionContext = context; + } + + public GraphCsvReader(Path edgePath, ExecutionEnvironment context) { + this.vertexPath = null; + this.edgePath = edgePath; + this.edgeReader = new CsvReader(edgePath, context); + this.vertexReader = null; + this.mapper = null; + this.executionContext = context; + } + + publicGraphCsvReader(Path edgePath, final MapFunction mapper, ExecutionEnvironment context) { + this.vertexPath = null; + this.edgePath = edgePath; + this.edgeReader = new CsvReader(edgePath, context); + this.vertexReader = null; + this.mapper = mapper; + this.executionContext = context; + } + + public GraphCsvReader (String edgePath, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")), context); + + } + + public GraphCsvReader(String vertexPath, String edgePath, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(vertexPath, "The file path may not be 
null.")), + new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")), context); + } + + + public GraphCsvReader(String edgePath, final MapFunction mapper, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")),
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877297#comment-14877297 ] ASF GitHub Bot commented on FLINK-1520: --- GitHub user vasia opened a pull request: https://github.com/apache/flink/pull/1149
[FLINK-1520] [gelly] Create a Graph from CSV files
This builds on @shghatge's work in #847. I addressed the remaining issues, rebased, and edited the docs. @andralungu, you've already reviewed this, but if you could give it one more look, that'd be great :) Thanks!
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/vasia/flink csvInput
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/flink/pull/1149.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #1149
commit 46f52ae64664be39f73af2505e5ded5e9736a867
Author: Shivani
Date: 2015-06-17T13:37:36Z
    [FLINK-1520] [gelly] Read edges and vertices from CSV files
commit ab114f39e9f1f21802ca63c8bb186f1015b8f460
Author: Shivani
Date: 2015-07-06T13:41:59Z
    [FLINK-1520][gelly]Changed the methods for specifying types. Created a new file for tests. Made appropriate changes in gelly_guide.md
commit 8a0b66489407de9aec84c3b715aded7225772ee4
Author: vasia
Date: 2015-07-14T18:46:33Z
    [FLINK-1520] [gelly] types and formatting changes to the graph csv reader
commit 8007acbf06649694429be189bab70aa451cee679
Author: vasia
Date: 2015-07-27T13:43:59Z
    [FLINK-1520] [gelly] added named types methods for reading a Graph from CSV input, with and without vertex/edge values. Changes the examples and the tests accordingly.
commit 9d02c2baba817948ff8710d2a2ae2dda752bff48
Author: vasia
Date: 2015-09-19T19:18:53Z
    [FLINK-1520] [gelly] corrections in Javadocs; updated documentation
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14725140#comment-14725140 ] ASF GitHub Bot commented on FLINK-1520: --- Github user shghatge closed the pull request at: https://github.com/apache/flink/pull/847 > Read edges and vertices from CSV files > -- > > Key: FLINK-1520 > URL: https://issues.apache.org/jira/browse/FLINK-1520 > Project: Flink > Issue Type: New Feature > Components: Gelly >Reporter: Vasia Kalavri >Assignee: Shivani Ghatge >Priority: Minor > Labels: easyfix, newbie > > Add methods to create Vertex and Edge Datasets directly from CSV file inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14725142#comment-14725142 ] ASF GitHub Bot commented on FLINK-1520: --- Github user shghatge commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-13651 @vasia It is fine with me.
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708092#comment-14708092 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-133727559 Thanks for the comments @andralungu! @shghatge, can you please close this PR? I will make the docs update and open a new one, which will include your work and my changes if that's OK with you. Thank you!
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682279#comment-14682279 ] ASF GitHub Bot commented on FLINK-1520: --- Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-130013958 Hi @vasia, Not sure whether this comment was issued for me... Nevertheless, I left some suggestions inline. All in all, it covers the problems discussed in the 73! comments here. You forgot to properly document the `edgeTypes(K, EV)`, etc. methods.
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682282#comment-14682282 ] ASF GitHub Bot commented on FLINK-1520: --- Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-130014300 Saw this: "I will also update the documentation afterwards"... Sorry!
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635396#comment-14635396 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-123402206 I see your point @shghatge. However, I think naming just one method differently will be confusing. If we're going to have custom method names, let's go with @andralungu's suggestion above and make sure we document these properly. I would prefer somewhat shorter method names, though. How about the following?
1). `keyType(K)`
2). `vertexTypes(K, VV)`
3). `edgeTypes(K, EV)`
4). `types(K, VV, EV)`
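To make the four proposed names concrete, here is a small sketch of how each call could map to a graph type. It is an illustration only, not the Gelly API: the class `CsvTypesSketch`, the helper `describe`, and the marker class `NoValue` (standing in for Flink's `NullValue`) are all hypothetical, and the methods return a description string instead of a graph.

```java
// Hypothetical sketch of the four proposed methods; not the actual Gelly API.
final class CsvTypesSketch {
    // Stand-in for Flink's NullValue, used where a value column is absent.
    static final class NoValue {}

    // Describes the type triple that a call would configure.
    static String describe(Class<?> key, Class<?> vertexValue, Class<?> edgeValue) {
        return "Graph<" + key.getSimpleName() + ", " + vertexValue.getSimpleName()
                + ", " + edgeValue.getSimpleName() + ">";
    }

    // 1) keys only: no vertex value, no edge value
    static String keyType(Class<?> key) {
        return describe(key, NoValue.class, NoValue.class);
    }

    // 2) key and vertex value: no edge value
    static String vertexTypes(Class<?> key, Class<?> vertexValue) {
        return describe(key, vertexValue, NoValue.class);
    }

    // 3) key and edge value: no vertex value
    static String edgeTypes(Class<?> key, Class<?> edgeValue) {
        return describe(key, NoValue.class, edgeValue);
    }

    // 4) key, vertex value and edge value are all present
    static String types(Class<?> key, Class<?> vertexValue, Class<?> edgeValue) {
        return describe(key, vertexValue, edgeValue);
    }

    public static void main(String[] args) {
        System.out.println(keyType(Long.class));
        System.out.println(vertexTypes(Long.class, String.class));
        System.out.println(edgeTypes(Long.class, Double.class));
        System.out.println(types(Long.class, String.class, Double.class));
    }
}
```

Because each case has its own name, a caller never has to pass a placeholder class for a column that is not in the file.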
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634142#comment-14634142 ] ASF GitHub Bot commented on FLINK-1520: --- Github user shghatge commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-123057981 The only problem with assuming NullValue when a value is missing is that we can't return NullValue in place of VV. I mean that in `Graph<K, VV, EV>`, VV or EV can't be NullValue. Otherwise, that is what I was originally going for. Since none of the other methods that create a DataSet/Graph provide a way to give the edge value as NullValue and just expect the user to map it (at least that is what I saw), maybe we could just remove the functionality. I had only added it because many examples seemed to use it, so I thought it would be nice to have. In any case, we can keep just one `typesNullEdge` method too: if users don't want that, they can use the normal overloaded `types` (3 arguments for no NullValue, 2 arguments for a null vertex value, and 1 argument for null vertex and edge values), with a single method named `typesNullEdge` to indicate that only the edges have NullValue.
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626259#comment-14626259 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-121215586 yes, I mean `NullValue.class` :) I'd like to know @shghatge's opinion, too!
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626254#comment-14626254 ] ASF GitHub Bot commented on FLINK-1520: --- Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-121215203 Hmmm :-? but can you pass NullValue to `types`... it expects Something.class. Can it be overwritten without type erasure getting in the way? Anyway... I will let @shghatge take over from here :)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624390#comment-14624390 ] ASF GitHub Bot commented on FLINK-1520: --- Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-120856171 Hi, I just had a closer look at this PR and it made me seriously question the utility of a `Graph.fromCSV` method. Why? First of all, because it's more limited than the regular `env.fromCsv()` in the sense that it does not allow POJOs, and it would be a bit tedious to support that. There would be a need for methods with 2 to n fields, according to the number of attributes present in the POJO. Second, because, and I am speaking strictly as a user here, I would rather write:
    private static DataSet<Edge<Long, Double>> getEdgesDataSet(ExecutionEnvironment env) {
        if(fileOutput) {
            return env.readCsvFile(edgeInputPath)
                    .ignoreComments("#")
                    .fieldDelimiter("\t")
                    .lineDelimiter("\n")
                    .types(Long.class, Long.class, Double.class)
                    .map(new Tuple3ToEdgeMap<Long, Double>());
        } else {
            return CommunityDetectionData.getDefaultEdgeDataSet(env);
        }
    }
than...
    private static Graph<Long, Long, Double> getGraph(ExecutionEnvironment env) {
        Graph<Long, Long, Double> graph;
        if(!fileOutput) {
            DataSet<Edge<Long, Double>> edges = CommunityDetectionData.getDefaultEdgeDataSet(env);
            graph = Graph.fromDataSet(edges, new MapFunction<Long, Long>() {
                public Long map(Long label) {
                    return label;
                }
            }, env);
        } else {
            graph = Graph.fromCsvReader(edgeInputPath, new MapFunction<Long, Long>() {
                public Long map(Long label) {
                    return label;
                }
            }, env).ignoreCommentsEdges("#")
                    .fieldDelimiterEdges("\t")
                    .lineDelimiterEdges("\n")
                    .typesEdges(Long.class, Double.class)
                    .typesVertices(Long.class, Long.class);
        }
        return graph;
    }
Maybe it's just a preference thing... but I believe it's at least worth a discussion.
On the other hand, the utility of such a method should have been questioned from its early Jira days, so I guess that's my mistake. I would like to hear your thoughts on this. Thanks!
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625209#comment-14625209 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-121033796 I hadn't realized that they would both need to be called in my previous comment, my bad. Any idea for decent method names? `typesNoEdgeValue` and `typesNoVertexValue` seem really ugly to me :S
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625201#comment-14625201 ] ASF GitHub Bot commented on FLINK-1520: --- Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-121032514 Hi @vasia, I also saw the types issue, but I had a feeling that this is the way it was decided in the previous comment. I would rather have different names for 2 and 3 than force a call to `typesVertices` if it's not needed.
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625177#comment-14625177 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-121025189 Hi @andralungu, do you mean support for POJOs as vertex / edge values? I guess that's a limitation we can't easily overcome, I agree. Still, a nicely designed `fromCsv()` method would simplify the common case. As for the examples, I don't like what they currently look like in this PR either. However, that's not a problem of `fromCsv()`. The if-block can be easily simplified by changing `getDefaultEdgeDataSet` to `getDefaultGraph`. The else-block looks longer because of the mapper, which, in the current examples, is in the main method. What I think is quite problematic is the `types()` methods. Ideally, we would have the following:
1. `types(K)`: no vertex value, no edge value
2. `types(K, VV)`: no edge value
3. `types(K, EV)`: no vertex value
4. `types(K, VV, EV)`: both vertex and edge values are present
However, because of type erasure, we can't have both 2 and 3. The current implementation (having separate `typesEdges` and `typesVertices`) means that both should always be called, even if not necessary. Another way would be to give 2 and 3 different names... So far I haven't been able to come up with a nice solution. Ideas?
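The clash between options 2 and 3 can be reproduced in a few lines. This is a minimal sketch, assuming static generic helpers in a hypothetical class `ErasureClashSketch` rather than the real `GraphCsvReader`. Both two-argument variants take `(Class, Class)` once the type parameters are erased, so the compiler cannot tell them apart if they share a name; giving them distinct names, as done here, is one way around it.

```java
// Hypothetical stand-in; not the Gelly GraphCsvReader API.
final class ErasureClashSketch {
    // Option 2: key and vertex value types, no edge value.
    static <K, VV> String vertexTypes(Class<K> key, Class<VV> vertexValue) {
        return "key=" + key.getSimpleName() + ", vertexValue=" + vertexValue.getSimpleName();
    }

    // Option 3: key and edge value types, no vertex value.
    // If this method were also named vertexTypes (or both were named types),
    // javac would reject it: the two parameter lists are indistinguishable,
    // since the type-parameter names K, VV and EV play no part in overload resolution.
    static <K, EV> String edgeTypes(Class<K> key, Class<EV> edgeValue) {
        return "key=" + key.getSimpleName() + ", edgeValue=" + edgeValue.getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(vertexTypes(Long.class, String.class));
        System.out.println(edgeTypes(Long.class, Double.class));
    }
}
```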
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625250#comment-14625250 ] ASF GitHub Bot commented on FLINK-1520: --- Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-121039562
Yes, but then you would have the following methods: `types`, `typesNoEdgeValue`, `typesNoVertexValue` and again `types`. So, even if it's not 100% needed, I'd try to keep it consistent. We could also make the names more graph-oriented (the name `types` was generic). The following is just an example:
1. `keyType(K)`
2. `keyAndVertexTypes(K, VV)`
3. `keyAndEdgeTypes(K, EV)`
4. `keyVertexAndEdgeTypes(K, VV, EV)`
With nice documentation, I think I'd understand what these are for :)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625246#comment-14625246 ] ASF GitHub Bot commented on FLINK-1520: --- Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r34504919 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java --- @@ -0,0 +1,462 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package org.apache.flink.graph; +import com.google.common.base.Preconditions; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.io.CsvReader; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.core.fs.Path; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.types.NullValue; +/** + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data + * The class also configures the CSV readers used to read edges(vertices) data such as the field types, + * the delimiters (row and field), the fields that should be included or skipped, and other flags + * such as whether to skip the initial line as the header. + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class. + */ +@SuppressWarnings({"unused" , "unchecked"}) +public class GraphCsvReader<K,VV,EV> { + + private final Path vertexPath,edgePath; + private final ExecutionEnvironment executionContext; + protected CsvReader EdgeReader; + protected CsvReader VertexReader; + protected MapFunction<K, VV> mapper; + protected Class<K> vertexKey; + protected Class<VV> vertexValue; + protected Class<EV> edgeValue; + +// + public GraphCsvReader(Path vertexPath,Path edgePath, ExecutionEnvironment context) { + this.vertexPath = vertexPath; + this.edgePath = edgePath; + this.VertexReader = new CsvReader(vertexPath,context); + this.EdgeReader = new CsvReader(edgePath,context); + this.mapper=null; + this.executionContext=context; + } + + public GraphCsvReader(Path edgePath, ExecutionEnvironment context) { + this.vertexPath = null; + this.edgePath = edgePath; + this.EdgeReader = new CsvReader(edgePath,context); + this.VertexReader = null; + this.mapper = null; + this.executionContext=context; + } + + public GraphCsvReader(Path edgePath,final MapFunction<K, VV> mapper,
ExecutionEnvironment context) { + this.vertexPath = null; + this.edgePath = edgePath; + this.EdgeReader = new CsvReader(edgePath,context); + this.VertexReader = null; + this.mapper = mapper; + this.executionContext=context; + } + + public GraphCsvReader (String edgePath,ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")), context); + + } + + public GraphCsvReader(String vertexPath, String edgePath, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(vertexPath, "The file path may not be null.")), + new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")), context); + } + + + public GraphCsvReader (String edgePath, final MapFunction<K, VV> mapper, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")),mapper, context); + } +
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625244#comment-14625244 ] ASF GitHub Bot commented on FLINK-1520: --- Github user andralungu commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r34504887 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java --- @@ -0,0 +1,462 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package org.apache.flink.graph; +import com.google.common.base.Preconditions; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.io.CsvReader; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.core.fs.Path; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.types.NullValue; +/** + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data + * The class also configures the CSV readers used to read edges(vertices) data such as the field types, + * the delimiters (row and field), the fields that should be included or skipped, and other flags + * such as whether to skip the initial line as the header. + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class. + */ +@SuppressWarnings({"unused" , "unchecked"}) +public class GraphCsvReader<K,VV,EV> { + + private final Path vertexPath,edgePath; + private final ExecutionEnvironment executionContext; + protected CsvReader EdgeReader; + protected CsvReader VertexReader; + protected MapFunction<K, VV> mapper; + protected Class<K> vertexKey; + protected Class<VV> vertexValue; + protected Class<EV> edgeValue; + +// + public GraphCsvReader(Path vertexPath,Path edgePath, ExecutionEnvironment context) { + this.vertexPath = vertexPath; + this.edgePath = edgePath; + this.VertexReader = new CsvReader(vertexPath,context); + this.EdgeReader = new CsvReader(edgePath,context); + this.mapper=null; + this.executionContext=context; + } + + public GraphCsvReader(Path edgePath, ExecutionEnvironment context) { + this.vertexPath = null; + this.edgePath = edgePath; + this.EdgeReader = new CsvReader(edgePath,context); + this.VertexReader = null; + this.mapper = null; + this.executionContext=context; + } + + public GraphCsvReader(Path edgePath,final MapFunction<K, VV> mapper,
ExecutionEnvironment context) { + this.vertexPath = null; + this.edgePath = edgePath; + this.EdgeReader = new CsvReader(edgePath,context); + this.VertexReader = null; + this.mapper = mapper; + this.executionContext=context; + } + + public GraphCsvReader (String edgePath,ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")), context); + + } + + public GraphCsvReader(String vertexPath, String edgePath, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(vertexPath, "The file path may not be null.")), + new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")), context); + } + + + public GraphCsvReader (String edgePath, final MapFunction<K, VV> mapper, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")),mapper, context); + } +
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625425#comment-14625425 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-121072347 Still overkill, I think... Could another way be to have only `types(K, VV, EV)` with all 3 arguments and expect `NullValue` if a value is missing?
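A rough sketch of the single-method idea from the comment above, written so that it compiles without Flink. Everything here is assumed for illustration: `SingleTypesSketch` is a hypothetical class, and the nested `NullValue` is only a stand-in for `org.apache.flink.types.NullValue`. The point is that one three-argument method can cover all four cases if a marker class signals a missing value.

```java
// Hypothetical sketch of the single-method idea; not the API that was merged.
class SingleTypesSketch {

    // Stand-in for org.apache.flink.types.NullValue so the sketch compiles without Flink.
    static final class NullValue { }

    private boolean hasVertexValues;
    private boolean hasEdgeValues;

    // One method covers all four cases: NullValue.class marks a missing value.
    <K, VV, EV> SingleTypesSketch types(Class<K> key, Class<VV> vertexValue, Class<EV> edgeValue) {
        this.hasVertexValues = !NullValue.class.equals(vertexValue);
        this.hasEdgeValues = !NullValue.class.equals(edgeValue);
        return this;
    }

    boolean hasVertexValues() { return hasVertexValues; }

    boolean hasEdgeValues() { return hasEdgeValues; }
}
```

A call such as `new SingleTypesSketch().types(Long.class, SingleTypesSketch.NullValue.class, Double.class)` would then describe a graph with edge values but no vertex values. The cost of this design is that callers must always pass three arguments, even in the simplest case.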
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619182#comment-14619182 ] ASF GitHub Bot commented on FLINK-1520: --- Github user shghatge commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-119698681 Updated PR
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612516#comment-14612516 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r33822653 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java --- @@ -0,0 +1,471 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package org.apache.flink.graph; +import com.google.common.base.Preconditions; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.io.CsvReader; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.core.fs.Path; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.types.NullValue; +/** + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data + * The class also configures the CSV readers used to read edges(vertices) data such as the field types, + * the delimiters (row and field), the fields that should be included or skipped, and other flags + * such as whether to skip the initial line as the header. + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class. + */ +public class GraphCsvReader<K,VV,EV> { + + private final Path vertexPath,edgePath; + private final ExecutionEnvironment executionContext; + protected CsvReader EdgeReader; + protected CsvReader VertexReader; + protected MapFunction<K, VV> mapper; + +// + + public GraphCsvReader(Path vertexPath,Path edgePath, ExecutionEnvironment context) { + this.vertexPath = vertexPath; + this.edgePath = edgePath; + this.VertexReader = new CsvReader(vertexPath,context); + this.EdgeReader = new CsvReader(edgePath,context); + this.mapper=null; + this.executionContext=context; + } + + public GraphCsvReader(Path edgePath, ExecutionEnvironment context) { + this.vertexPath = null; + this.edgePath = edgePath; + this.EdgeReader = new CsvReader(edgePath,context); + this.VertexReader = null; + this.mapper = null; + this.executionContext=context; + } + + public GraphCsvReader(Path edgePath,final MapFunction<K, VV> mapper, ExecutionEnvironment context) { + this.vertexPath = null; + this.edgePath = edgePath; + this.EdgeReader = new CsvReader(edgePath,context);
+ this.VertexReader = null; + this.mapper = mapper; + this.executionContext=context; + } + + public GraphCsvReader (String edgePath,ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")), context); + + } + + public GraphCsvReader(String vertexPath, String edgePath, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(vertexPath, "The file path may not be null.")),new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")), context); + } + + + public GraphCsvReader (String edgePath, final MapFunction<K, VV> mapper, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")),mapper, context); + } + + public CsvReader getEdgeReader() { + return this.EdgeReader; + } + + public CsvReader getVertexReader() { + return this.VertexReader;
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612532#comment-14612532 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r33823666 --- Diff: flink-staging/flink-gelly/src/test/java/org/apache/flink/graph/test/operations/GraphCreationITCase.java --- @@ -54,16 +75,13 @@ public void testCreateWithoutVertexValues() throws Exception { final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); Graph<Long, NullValue, Long> graph = Graph.fromDataSet(TestGraphUtils.getLongLongEdgeData(env), env); -DataSet<Vertex<Long,NullValue>> data = graph.getVertices(); -List<Vertex<Long,NullValue>> result= data.collect(); - + graph.getVertices().writeAsCsv(resultPath); --- End diff -- hmm it seems you're reverting the changes of #863?
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612535#comment-14612535 ] ASF GitHub Bot commented on FLINK-1520: --- Github user shghatge commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r33823720 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java --- @@ -0,0 +1,471 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package org.apache.flink.graph; +import com.google.common.base.Preconditions; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.io.CsvReader; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.core.fs.Path; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.types.NullValue; +/** + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data + * The class also configures the CSV readers used to read edges(vertices) data such as the field types, + * the delimiters (row and field), the fields that should be included or skipped, and other flags + * such as whether to skip the initial line as the header. + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class. + */ +public class GraphCsvReader<K,VV,EV> { + + private final Path vertexPath,edgePath; + private final ExecutionEnvironment executionContext; + protected CsvReader EdgeReader; + protected CsvReader VertexReader; + protected MapFunction<K, VV> mapper; + +// + + public GraphCsvReader(Path vertexPath,Path edgePath, ExecutionEnvironment context) { + this.vertexPath = vertexPath; + this.edgePath = edgePath; + this.VertexReader = new CsvReader(vertexPath,context); + this.EdgeReader = new CsvReader(edgePath,context); + this.mapper=null; + this.executionContext=context; + } + + public GraphCsvReader(Path edgePath, ExecutionEnvironment context) { + this.vertexPath = null; + this.edgePath = edgePath; + this.EdgeReader = new CsvReader(edgePath,context); + this.VertexReader = null; + this.mapper = null; + this.executionContext=context; + } + + public GraphCsvReader(Path edgePath,final MapFunction<K, VV> mapper, ExecutionEnvironment context) { + this.vertexPath = null; + this.edgePath = edgePath; + this.EdgeReader = new CsvReader(edgePath,context);
+ this.VertexReader = null; + this.mapper = mapper; + this.executionContext=context; + } + + public GraphCsvReader (String edgePath,ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")), context); + + } + + public GraphCsvReader(String vertexPath, String edgePath, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(vertexPath, "The file path may not be null.")),new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")), context); + } + + + public GraphCsvReader (String edgePath, final MapFunction<K, VV> mapper, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")),mapper, context); + } + + public CsvReader getEdgeReader() { + return this.EdgeReader; + } + + public CsvReader getVertexReader() { + return this.VertexReader;
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612538#comment-14612538 ] ASF GitHub Bot commented on FLINK-1520: --- Github user shghatge commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r33823858 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java --- @@ -0,0 +1,471 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package org.apache.flink.graph; +import com.google.common.base.Preconditions; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.io.CsvReader; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.core.fs.Path; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.types.NullValue; +/** + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data + * The class also configures the CSV readers used to read edges(vertices) data such as the field types, + * the delimiters (row and field), the fields that should be included or skipped, and other flags + * such as whether to skip the initial line as the header. + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class. + */ +public class GraphCsvReader<K,VV,EV> { + + private final Path vertexPath,edgePath; + private final ExecutionEnvironment executionContext; + protected CsvReader EdgeReader; + protected CsvReader VertexReader; + protected MapFunction<K, VV> mapper; + +// + + public GraphCsvReader(Path vertexPath,Path edgePath, ExecutionEnvironment context) { + this.vertexPath = vertexPath; + this.edgePath = edgePath; + this.VertexReader = new CsvReader(vertexPath,context); + this.EdgeReader = new CsvReader(edgePath,context); + this.mapper=null; + this.executionContext=context; + } + + public GraphCsvReader(Path edgePath, ExecutionEnvironment context) { + this.vertexPath = null; + this.edgePath = edgePath; + this.EdgeReader = new CsvReader(edgePath,context); + this.VertexReader = null; + this.mapper = null; + this.executionContext=context; + } + + public GraphCsvReader(Path edgePath,final MapFunction<K, VV> mapper, ExecutionEnvironment context) { + this.vertexPath = null; + this.edgePath = edgePath; + this.EdgeReader = new CsvReader(edgePath,context);
+ this.VertexReader = null; + this.mapper = mapper; + this.executionContext=context; + } + + public GraphCsvReader (String edgePath,ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")), context); + + } + + public GraphCsvReader(String vertexPath, String edgePath, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(vertexPath, "The file path may not be null.")),new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")), context); + } + + + public GraphCsvReader (String edgePath, final MapFunction<K, VV> mapper, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")),mapper, context); + } + + public CsvReader getEdgeReader() { + return this.EdgeReader; + } + + public CsvReader getVertexReader() { + return this.VertexReader;
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612537#comment-14612537 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r33823849 --- Diff: flink-staging/flink-gelly/src/test/java/org/apache/flink/graph/test/operations/GraphCreationWithMapperITCase.java --- @@ -52,16 +72,13 @@ public void testWithDoubleValueMapper() throws Exception { Graph<Long, Double, Long> graph = Graph.fromDataSet(TestGraphUtils.getLongLongEdgeData(env), new AssignDoubleValueMapper(), env); -DataSet<Vertex<Long,Double>> data = graph.getVertices(); -List<Vertex<Long,Double>> result= data.collect(); - + graph.getVertices().writeAsCsv(resultPath); --- End diff -- Same here.. We changed the tests to use `collect()` instead of files in #863. Please don't change it back ;)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612559#comment-14612559 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-118173012
Hi @shghatge! Thank you for the update :) I left some comments inline. There are still some formatting issues in the code. Please go through your changes carefully and try to be consistent. Also, there are still several warnings regarding types, unused annotations, and unused variables. Can you please try to remove them? Your IDE should have a setting that gives you the list of warnings.
Regarding the tests, it would be better to create new test files for your methods, since you need to test with files and the other tests currently use `collect()`.
Finally, I find the `types()` methods a bit confusing. Could we maybe have separate types methods for the vertices and edges, e.g. `typesEdges(keyType, valueType)`, `typesEdges(keyType)`, `typesVertices(keyType, valueType)` and `typesVertices(keyType)`?
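The separate-methods proposal at the end of the comment above could look roughly like the following. This is a sketch under assumptions: `SeparateTypesSketch`, its fields, and `describe()` are hypothetical and only illustrate the shape of the API; the four method names are the ones suggested in the comment. Because the overloads differ by name or by number of arguments, none of them collide after type erasure.

```java
// Hypothetical sketch; method names follow the suggestion above, everything else is assumed.
class SeparateTypesSketch {

    private Class<?> edgeKey;
    private Class<?> edgeValue;     // null means "edges carry no value"
    private Class<?> vertexKey;
    private Class<?> vertexValue;   // null means "vertices carry no value"

    <K, EV> SeparateTypesSketch typesEdges(Class<K> keyType, Class<EV> valueType) {
        this.edgeKey = keyType;
        this.edgeValue = valueType;
        return this;
    }

    <K> SeparateTypesSketch typesEdges(Class<K> keyType) {
        this.edgeKey = keyType;
        this.edgeValue = null;
        return this;
    }

    <K, VV> SeparateTypesSketch typesVertices(Class<K> keyType, Class<VV> valueType) {
        this.vertexKey = keyType;
        this.vertexValue = valueType;
        return this;
    }

    <K> SeparateTypesSketch typesVertices(Class<K> keyType) {
        this.vertexKey = keyType;
        this.vertexValue = null;
        return this;
    }

    // Summarizes the configured types; "none" marks a type that was not set.
    String describe() {
        return "edges(" + name(edgeKey) + ", " + name(edgeValue) + "), "
                + "vertices(" + name(vertexKey) + ", " + name(vertexValue) + ")";
    }

    private static String name(Class<?> c) {
        return c == null ? "none" : c.getSimpleName();
    }
}
```

For example, `new SeparateTypesSketch().typesEdges(Long.class, Double.class).typesVertices(Long.class)` configures valued edges and value-less vertices. The drawback is that the caller has to make two calls and repeat the key type in each.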
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612513#comment-14612513 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r33822502 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java --- @@ -0,0 +1,471 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package org.apache.flink.graph; +import com.google.common.base.Preconditions; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.io.CsvReader; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.core.fs.Path; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.types.NullValue; +/** + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data + * The class also configures the CSV readers used to read edges(vertices) data such as the field types, + * the delimiters (row and field), the fields that should be included or skipped, and other flags + * such as whether to skip the initial line as the header. + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class. + */ +public class GraphCsvReader<K,VV,EV> { + + private final Path vertexPath,edgePath; + private final ExecutionEnvironment executionContext; + protected CsvReader EdgeReader; + protected CsvReader VertexReader; + protected MapFunction<K, VV> mapper; + +// + + public GraphCsvReader(Path vertexPath,Path edgePath, ExecutionEnvironment context) { + this.vertexPath = vertexPath; + this.edgePath = edgePath; + this.VertexReader = new CsvReader(vertexPath,context); + this.EdgeReader = new CsvReader(edgePath,context); + this.mapper=null; + this.executionContext=context; + } + + public GraphCsvReader(Path edgePath, ExecutionEnvironment context) { + this.vertexPath = null; + this.edgePath = edgePath; + this.EdgeReader = new CsvReader(edgePath,context); + this.VertexReader = null; + this.mapper = null; + this.executionContext=context; + } + + public GraphCsvReader(Path edgePath,final MapFunction<K, VV> mapper, ExecutionEnvironment context) { + this.vertexPath = null; + this.edgePath = edgePath; + this.EdgeReader = new CsvReader(edgePath,context);
+ this.VertexReader = null; + this.mapper = mapper; + this.executionContext=context; + } + + public GraphCsvReader (String edgePath,ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")), context); + + } + + public GraphCsvReader(String vertexPath, String edgePath, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(vertexPath, "The file path may not be null.")),new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")), context); + } + + + public GraphCsvReader (String edgePath, final MapFunction<K, VV> mapper, ExecutionEnvironment context) { + this(new Path(Preconditions.checkNotNull(edgePath, "The file path may not be null.")),mapper, context); + } + + public CsvReader getEdgeReader() { + return this.EdgeReader; + } + + public CsvReader getVertexReader() { + return this.VertexReader;
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612545#comment-14612545 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user shghatge commented on a diff in the pull request:

https://github.com/apache/flink/pull/847#discussion_r33824099

--- Diff: flink-staging/flink-gelly/src/test/java/org/apache/flink/graph/test/operations/GraphCreationITCase.java ---
@@ -54,16 +75,13 @@ public void testCreateWithoutVertexValues() throws Exception {
     final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
     Graph<Long, NullValue, Long> graph = Graph.fromDataSet(TestGraphUtils.getLongLongEdgeData(env), env);

-    DataSet<Vertex<Long,NullValue>> data = graph.getVertices();
-    List<Vertex<Long,NullValue>> result= data.collect();
-
+    graph.getVertices().writeAsCsv(resultPath);
--- End diff --

Oh... I made these changes before that pull request got merged. I will change it now.

Read edges and vertices from CSV files
--
Key: FLINK-1520
URL: https://issues.apache.org/jira/browse/FLINK-1520
Project: Flink
Issue Type: New Feature
Components: Gelly
Reporter: Vasia Kalavri
Assignee: Shivani Ghatge
Priority: Minor
Labels: easyfix, newbie

Add methods to create Vertex and Edge Datasets directly from CSV file inputs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610786#comment-14610786 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/847#issuecomment-117787551

Nice and rebased. +1
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610529#comment-14610529 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user shghatge commented on the pull request:

https://github.com/apache/flink/pull/847#issuecomment-117728162

Updated PR
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602859#comment-14602859 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user andralungu commented on a diff in the pull request:

https://github.com/apache/flink/pull/847#discussion_r33354597

--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph;
+import com.google.common.base.Preconditions;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.io.CsvReader;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.types.NullValue;
+/**
+ * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data
+ * The class also configures the CSV readers used to read edges(vertices) data such as the field types,
+ * the delimiters (row and field), the fields that should be included or skipped, and other flags
+ * such as whether to skip the initial line as the header.
+ * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class.
+ */
+public class GraphCsvReader<K,VV,EV>{
+
+    private final Path path1,path2;
+    private final ExecutionEnvironment executionContext;
+    protected CsvReader EdgeReader;
+    protected CsvReader VertexReader;
+    protected MapFunction<K, VV> mapper;
+
+//
+
+    public GraphCsvReader(Path path1,Path path2, ExecutionEnvironment context)
+    {
+        this.path1 = path1;
+        this.path2 = path2;
+        this.VertexReader = new CsvReader(path1,context);
+        this.EdgeReader = new CsvReader(path2,context);
+        this.mapper=null;
+        this.executionContext=context;
+    }
+
+    public GraphCsvReader(Path path2, ExecutionEnvironment context)
+    {
--- End diff --

again the bracket issue :)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602861#comment-14602861 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user andralungu commented on a diff in the pull request:

https://github.com/apache/flink/pull/847#discussion_r33354654

--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph;
+import com.google.common.base.Preconditions;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.io.CsvReader;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.types.NullValue;
+/**
+ * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data
+ * The class also configures the CSV readers used to read edges(vertices) data such as the field types,
+ * the delimiters (row and field), the fields that should be included or skipped, and other flags
+ * such as whether to skip the initial line as the header.
+ * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class.
+ */
+public class GraphCsvReader<K,VV,EV>{
+
+    private final Path path1,path2;
+    private final ExecutionEnvironment executionContext;
+    protected CsvReader EdgeReader;
+    protected CsvReader VertexReader;
+    protected MapFunction<K, VV> mapper;
+
+//
+
+    public GraphCsvReader(Path path1,Path path2, ExecutionEnvironment context)
+    {
+        this.path1 = path1;
+        this.path2 = path2;
+        this.VertexReader = new CsvReader(path1,context);
+        this.EdgeReader = new CsvReader(path2,context);
+        this.mapper=null;
+        this.executionContext=context;
+    }
+
+    public GraphCsvReader(Path path2, ExecutionEnvironment context)
+    {
+        this.path1=null;
+        this.path2 = path2;
+        this.EdgeReader = new CsvReader(path2,context);
+        this.VertexReader = null;
+        this.mapper = null;
+        this.executionContext=context;
+    }
+
+    public GraphCsvReader(Path path2,final MapFunction<K, VV> mapper, ExecutionEnvironment context)
+    {
+        this.path1=null;
--- End diff --

here it's `this.path1 = null;` for consistency with the rest.
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602863#comment-14602863 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user andralungu commented on a diff in the pull request:

https://github.com/apache/flink/pull/847#discussion_r33354695

--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph;
+import com.google.common.base.Preconditions;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.io.CsvReader;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.types.NullValue;
+/**
+ * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data
+ * The class also configures the CSV readers used to read edges(vertices) data such as the field types,
+ * the delimiters (row and field), the fields that should be included or skipped, and other flags
+ * such as whether to skip the initial line as the header.
+ * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class.
+ */
+public class GraphCsvReader<K,VV,EV>{
+
+    private final Path path1,path2;
+    private final ExecutionEnvironment executionContext;
+    protected CsvReader EdgeReader;
+    protected CsvReader VertexReader;
+    protected MapFunction<K, VV> mapper;
+
+//
+
+    public GraphCsvReader(Path path1,Path path2, ExecutionEnvironment context)
+    {
+        this.path1 = path1;
+        this.path2 = path2;
+        this.VertexReader = new CsvReader(path1,context);
+        this.EdgeReader = new CsvReader(path2,context);
+        this.mapper=null;
+        this.executionContext=context;
+    }
+
+    public GraphCsvReader(Path path2, ExecutionEnvironment context)
+    {
+        this.path1=null;
+        this.path2 = path2;
+        this.EdgeReader = new CsvReader(path2,context);
+        this.VertexReader = null;
+        this.mapper = null;
+        this.executionContext=context;
+    }
+
+    public GraphCsvReader(Path path2,final MapFunction<K, VV> mapper, ExecutionEnvironment context)
+    {
+        this.path1=null;
+        this.path2 = path2;
+        this.EdgeReader = new CsvReader(path2,context);
+        this.VertexReader = null;
+        this.mapper = mapper;
+        this.executionContext=context;
+    }
+
+    public GraphCsvReader (String path2,ExecutionEnvironment context)
+    {
+        this(new Path(Preconditions.checkNotNull(path2, "The file path may not be null.")), context);
+
+    }
+
+    public GraphCsvReader(String path1, String path2, ExecutionEnvironment context)
+    {
+        this(new Path(Preconditions.checkNotNull(path1, "The file path may not be null.")),new Path(Preconditions.checkNotNull(path2, "The file path may not be null.")), context);
+    }
+
+
+    public GraphCsvReader (String path2, final MapFunction<K, VV> mapper, ExecutionEnvironment context)
+    {
+        this(new Path(Preconditions.checkNotNull(path2, "The file path may not be null.")),mapper, context);
+    }
+
+    public CsvReader getEdgeReader()
+    {
+        return this.EdgeReader;
+    }
+
+    public CsvReader getVertexReader()
+    {
+        return this.VertexReader;
+    }
+
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602879#comment-14602879 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user andralungu commented on the pull request:

https://github.com/apache/flink/pull/847#issuecomment-115689621

Hi @shghatge,

I left my set of comments inline. They are mostly related to coding style issues. I guess you should revisit the previous comments here.

Also, don't forget to rebase. It seems like there are some merge conflicts that need to be fixed :)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602850#comment-14602850 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user andralungu commented on a diff in the pull request:

https://github.com/apache/flink/pull/847#discussion_r33353966

--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
@@ -282,6 +282,54 @@ public void flatMap(Edge<K, EV> edge, Collector<Tuple1<K>> out) {
     }

     /**
+     * Creates a graph from CSV files.
+     *
+     * Vertices with value are created from a CSV file with 2 fields
+     * Edges with value are created from a CSV file with 3 fields
+     * from Tuple3.
--- End diff --

there is a trailing "from Tuple3" here...
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602858#comment-14602858 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user andralungu commented on a diff in the pull request:

https://github.com/apache/flink/pull/847#discussion_r33354571

--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph;
+import com.google.common.base.Preconditions;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.io.CsvReader;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.types.NullValue;
+/**
+ * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data
+ * The class also configures the CSV readers used to read edges(vertices) data such as the field types,
+ * the delimiters (row and field), the fields that should be included or skipped, and other flags
+ * such as whether to skip the initial line as the header.
+ * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class.
+ */
+public class GraphCsvReader<K,VV,EV>{
+
+    private final Path path1,path2;
+    private final ExecutionEnvironment executionContext;
+    protected CsvReader EdgeReader;
+    protected CsvReader VertexReader;
+    protected MapFunction<K, VV> mapper;
+
+//
+
+    public GraphCsvReader(Path path1,Path path2, ExecutionEnvironment context)
+    {
+        this.path1 = path1;
--- End diff --

again, the path1, path2 issue
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602872#comment-14602872 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user andralungu commented on a diff in the pull request:

https://github.com/apache/flink/pull/847#discussion_r33355251

--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/GraphMetrics.java ---
@@ -150,20 +149,15 @@ private static boolean parseParameters(String[] args) {
     }

     @SuppressWarnings("serial")
-    private static DataSet<Edge<Long, NullValue>> getEdgesDataSet(ExecutionEnvironment env) {
-        if (fileOutput) {
-            return env.readCsvFile(edgesInputPath)
-                    .lineDelimiter("\n").fieldDelimiter("\t")
-                    .types(Long.class, Long.class).map(
-                            new MapFunction<Tuple2<Long, Long>, Edge<Long, NullValue>>() {
-
-                                public Edge<Long, NullValue> map(Tuple2<Long, Long> value) {
-                                    return new Edge<Long, NullValue>(value.f0, value.f1,
-                                            NullValue.getInstance());
-                                }
-                            });
-        } else {
-            return ExampleUtils.getRandomEdges(env, NUM_VERTICES);
+    private static Graph<Long, NullValue, NullValue> getGraph(ExecutionEnvironment env) {
+        if(fileOutput) {
+            return Graph.fromCsvReader(edgesInputPath, env).lineDelimiterEdges("\n").fieldDelimiterEdges("\t")
+                    .types(Long.class);
+
+        }
+        else
+        {
+            return Graph.fromDataSet(ExampleUtils.getRandomEdges(env, NUM_VERTICES), env);
--- End diff --

Yup... so I like how this looks better than how the previous rewritings were made...
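For readers following the GraphMetrics diff above, here is a rough, library-free sketch of what reading a tab-delimited edge file with no vertex file amounts to: each line yields a (source, target) pair, and the vertex IDs are taken from the edge endpoints. This is only an illustration in plain Java. The class `EdgeListSketch` and its methods are hypothetical names, not part of Flink or Gelly, and the sketch assumes well-formed lines with exactly two numeric fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for illustration only; not Flink or Gelly code.
class EdgeListSketch {

    /** Parses lines of the form "source<TAB>target" into long pairs, skipping blank lines. */
    static List<long[]> parseEdges(String csv) {
        List<long[]> edges = new ArrayList<>();
        for (String line : csv.split("\n")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t");
            edges.add(new long[] {
                Long.parseLong(fields[0].trim()),
                Long.parseLong(fields[1].trim())
            });
        }
        return edges;
    }

    /** Collects the vertex IDs from the edge endpoints, since no vertex file is given. */
    static Set<Long> vertexIds(List<long[]> edges) {
        Set<Long> ids = new TreeSet<>();
        for (long[] edge : edges) {
            ids.add(edge[0]);
            ids.add(edge[1]);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<long[]> edges = parseEdges("1\t2\n2\t3\n");
        System.out.println(edges.size() + " edges, vertices " + vertexIds(edges));
    }
}
```

In the PR itself this work is delegated to Flink's CsvReader, so the delimiters are configuration calls rather than hand-written splitting.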
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602854#comment-14602854 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user andralungu commented on a diff in the pull request:

https://github.com/apache/flink/pull/847#discussion_r33354309

--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph;
+import com.google.common.base.Preconditions;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.io.CsvReader;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.types.NullValue;
+/**
+ * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data
+ * The class also configures the CSV readers used to read edges(vertices) data such as the field types,
+ * the delimiters (row and field), the fields that should be included or skipped, and other flags
+ * such as whether to skip the initial line as the header.
+ * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class.
+ */
+public class GraphCsvReader<K,VV,EV>{
+
+    private final Path path1,path2;
+    private final ExecutionEnvironment executionContext;
+    protected CsvReader EdgeReader;
+    protected CsvReader VertexReader;
+    protected MapFunction<K, VV> mapper;
+
+//
+
+    public GraphCsvReader(Path path1,Path path2, ExecutionEnvironment context)
--- End diff --

let's not call these path1 and path2. I suggest we use better names like edgePath, vertexPath... This is valid for the methods underneath too...
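The naming suggestion above (edgePath and vertexPath instead of path1 and path2) can be sketched independently of Flink. The snippet below is a minimal illustration under stated assumptions: `GraphCsvPaths` is a hypothetical class that only holds the two paths as strings, uses `Objects.requireNonNull` from the JDK in place of Guava's `Preconditions.checkNotNull`, and delegates from the edge-only constructor to the two-argument one so the field assignments are written once.

```java
import java.util.Objects;

// Hypothetical stand-in for illustration; the GraphCsvReader in the PR wraps Flink's CsvReader.
class GraphCsvPaths {

    private final String vertexPath; // null when only an edge file is given
    private final String edgePath;

    /** Both files given: the vertex file is optional, the edge file is required. */
    GraphCsvPaths(String vertexPath, String edgePath) {
        this.vertexPath = vertexPath;
        this.edgePath = Objects.requireNonNull(edgePath, "The file path may not be null.");
    }

    /** Edge file only: delegates so the assignments live in one place. */
    GraphCsvPaths(String edgePath) {
        this(null, edgePath);
    }

    String getVertexPath() {
        return vertexPath;
    }

    String getEdgePath() {
        return edgePath;
    }

    boolean hasVertexFile() {
        return vertexPath != null;
    }

    public static void main(String[] args) {
        GraphCsvPaths edgesOnly = new GraphCsvPaths("edges.csv");
        System.out.println(edgesOnly.getEdgePath() + " hasVertexFile=" + edgesOnly.hasVertexFile());
    }
}
```

With descriptive names the argument order of the two-path constructor is readable at the call site, which is the point the reviewer raises.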
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602852#comment-14602852 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user andralungu commented on a diff in the pull request:

https://github.com/apache/flink/pull/847#discussion_r33354071

--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
@@ -282,6 +282,54 @@ public void flatMap(Edge<K, EV> edge, Collector<Tuple1<K>> out) {
     }

     /**
+     * Creates a graph from CSV files.
+     *
+     * Vertices with value are created from a CSV file with 2 fields
+     * Edges with value are created from a CSV file with 3 fields
+     * from Tuple3.
+     *
+     * @param verticesPath path to a CSV file with the Vertices data.
+     * @param edgesPath path to a CSV file with the Edges data
+     * @param context the flink execution environment.
+     * @return An instance of {@link org.apache.flink.graph.GraphCsvReader} , which on calling types() method to specify types of the
--- End diff --

"on which calling", not "which on"... or "which on calling the types method specifies" (not "to specify")
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602857#comment-14602857 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user andralungu commented on a diff in the pull request:

https://github.com/apache/flink/pull/847#discussion_r33354535

--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java ---
@@ -0,0 +1,502 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph;
+import com.google.common.base.Preconditions;
+import org.apache.flink.api.common.functions.MapFunction;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.io.CsvReader;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.api.java.tuple.Tuple3;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.types.NullValue;
+/**
+ * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data
+ * The class also configures the CSV readers used to read edges(vertices) data such as the field types,
+ * the delimiters (row and field), the fields that should be included or skipped, and other flags
+ * such as whether to skip the initial line as the header.
+ * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class.
+ */
+public class GraphCsvReader<K,VV,EV>{
+
+    private final Path path1,path2;
+    private final ExecutionEnvironment executionContext;
+    protected CsvReader EdgeReader;
+    protected CsvReader VertexReader;
+    protected MapFunction<K, VV> mapper;
+
+//
+
+    public GraphCsvReader(Path path1,Path path2, ExecutionEnvironment context)
+    {
--- End diff --

Also, @vasia was talking about some general coding style rules in one of the previous comments... The way we add the opening block brackets must be consistent.

So here, after `public GraphCsvReader(...) {` //open the bracket on the same line.

Please look in the rest of the document for similar issues...
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602873#comment-14602873 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user andralungu commented on a diff in the pull request:

https://github.com/apache/flink/pull/847#discussion_r33355373

--- Diff: flink-staging/flink-gelly/src/test/java/org/apache/flink/graph/test/operations/GraphCreationWithMapperITCase.java ---
@@ -156,4 +181,17 @@ public DummyCustomType map(Long vertexId) {
             return new DummyCustomType(vertexId.intValue()-1, false);
         }
     }
+
+    private FileInputSplit createTempFile(String content) throws IOException {
+        File tempFile = File.createTempFile("test_contents", "tmp");
+        tempFile.deleteOnExit();
--- End diff --

`deleteOnExit()`... nice!
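The test helper quoted above is cut off after `deleteOnExit()`, so its remaining lines are not visible in this thread. As a rough, JDK-only sketch of the same idea, the snippet below writes CSV content to a temporary file that the JVM removes on exit. It is an assumption-laden illustration: `TempCsvSketch` and `readBack` are hypothetical names, and the helper returns a plain `File` rather than the `FileInputSplit` the PR's version returns.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper for illustration; not the code from the pull request.
class TempCsvSketch {

    /** Writes the given content to a temp file that is deleted when the JVM exits. */
    static File createTempFile(String content) {
        try {
            File tempFile = File.createTempFile("test_contents", "tmp");
            tempFile.deleteOnExit();
            Files.write(tempFile.toPath(), content.getBytes(StandardCharsets.UTF_8));
            return tempFile;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the whole file back as a UTF-8 string. */
    static String readBack(File file) {
        try {
            return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File edges = createTempFile("1,2,12\n1,3,13\n");
        System.out.print(readBack(edges));
    }
}
```

Note that `deleteOnExit()` only runs on normal JVM termination, so long-running test suites that create many files may still prefer explicit cleanup.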
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602853#comment-14602853 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user andralungu commented on a diff in the pull request:

https://github.com/apache/flink/pull/847#discussion_r33354136

--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
@@ -282,6 +282,54 @@ public void flatMap(Edge<K, EV> edge, Collector<Tuple1<K>> out) {
     }

     /**
+     * Creates a graph from CSV files.
+     *
+     * Vertices with value are created from a CSV file with 2 fields
+     * Edges with value are created from a CSV file with 3 fields
+     * from Tuple3.
+     *
+     * @param verticesPath path to a CSV file with the Vertices data.
+     * @param edgesPath path to a CSV file with the Edges data
+     * @param context the flink execution environment.
+     * @return An instance of {@link org.apache.flink.graph.GraphCsvReader} , which on calling types() method to specify types of the
+     *Vertex ID, Vertex Value and Edge value returns a Graph
+     */
+    public static GraphCsvReader fromCsvReader(String verticesPath, String edgesPath, ExecutionEnvironment context) {
+        return new GraphCsvReader(verticesPath, edgesPath, context);
+    }
+    /** Creates a graph from a CSV file for Edges., Vertices are
--- End diff --

... Edges. \n (right now it's ".,") Vertices
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602871#comment-14602871 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user andralungu commented on a diff in the pull request:

    https://github.com/apache/flink/pull/847#discussion_r33355180

    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/ConnectedComponents.java ---
    @@ -119,23 +113,32 @@ private static boolean parseParameters(String [] args) {
             return true;
         }

    -    @SuppressWarnings("serial")
    -    private static DataSet<Edge<Long, NullValue>> getEdgesDataSet(ExecutionEnvironment env) {
    -
    -        if(fileOutput) {
    -            return env.readCsvFile(edgeInputPath)
    -                    .ignoreComments("#")
    -                    .fieldDelimiter("\t")
    -                    .lineDelimiter("\n")
    -                    .types(Long.class, Long.class)
    -                    .map(new MapFunction<Tuple2<Long, Long>, Edge<Long, NullValue>>() {
    -                        @Override
    -                        public Edge<Long, NullValue> map(Tuple2<Long, Long> value) throws Exception {
    -                            return new Edge<Long, NullValue>(value.f0, value.f1, NullValue.getInstance());
    +    private static Graph<Long, Long, NullValue> getGraph(ExecutionEnvironment env)
    +    {
    +        Graph<Long, Long, NullValue> graph;
    +        if(!fileOutput)
    +        {
    +            DataSet<Edge<Long, NullValue>> edges = ConnectedComponentsDefaultData.getDefaultEdgeDataSet(env);
    --- End diff --

    Let's also keep this consistent. In Single Source Shortest Paths you read fromDataSet(getDefault..., env). maybe we could do that for all the examples
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599924#comment-14599924 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user shghatge commented on the pull request:

    https://github.com/apache/flink/pull/847#issuecomment-114975168

    Updated PR
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593342#comment-14593342 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user andralungu commented on the pull request:

    https://github.com/apache/flink/pull/847#issuecomment-113483077

    Mea culpa. No, the mapper test is fine. For the examples comment, I meant to go through the classes in the example folder and modify the way the graph is currently read. Right now, we fetch the edges via `env.fromCsv` and then use `Graph.fromDataSet` to create the graph. We should do it directly via `Graph.fromCsv`. The example in the docs is fine, because it explains how `fromDataSet` works. That is still available.
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593241#comment-14593241 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user shghatge commented on the pull request:

    https://github.com/apache/flink/pull/847#issuecomment-113439469

    Hello @Andra,
    There is one test for `fromCsv` with a mapper in GraphCreationWithMapperITCase.java. Should I add more tests for that?
    Also, for the examples comment, do you mean that I should update the Gelly guide by removing the CSV file examples that use `env.readCsvFile()`?
    I will add the other tests. :)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591827#comment-14591827 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user shghatge commented on the pull request:

    https://github.com/apache/flink/pull/847#issuecomment-113165614

    Updated PR
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592072#comment-14592072 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user andralungu commented on the pull request:

    https://github.com/apache/flink/pull/847#issuecomment-113213190

    This looks very nice! Someone deserves a virtual :ice_cream: !

    There are some tests missing:
    - test `fromCSV` with a Mapper
    - you just test `types`, `ignoreFirstLines` and `ignoreComments`; let's at least add tests for the `lineDelimiter*` and the `fieldDelimiter*` methods.

    I'm sure they work, but tests are written to guarantee that the functionality will also be there (at the same quality) in the future (i.e. some exotic code addition will not break it) :)

    I saw an outdated Vasia comment on an unused import; always hit `mvn verify` before pushing - it would have caught that :D
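The review above asks for tests of the `lineDelimiter*` and `fieldDelimiter*` options. As a rough illustration of the behaviour such tests would pin down, the plain-Java sketch below splits raw text into records using a configurable line delimiter and field delimiter, optionally skipping the first line and comment lines. It does not use Flink's `CsvReader`; `DelimitedParser` and its parameters are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a delimiter-configurable reader; not Flink's CsvReader.
final class DelimitedParser {

    private DelimitedParser() {}

    static List<List<String>> parse(String text, String lineDelimiter, String fieldDelimiter,
                                    boolean ignoreFirstLine, String commentPrefix) {
        List<List<String>> rows = new ArrayList<>();
        // Pattern.quote treats the delimiters literally, so "|" or "." are safe to pass in.
        String[] lines = text.split(Pattern.quote(lineDelimiter));
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (i == 0 && ignoreFirstLine) {
                continue; // header line
            }
            if (line.isEmpty()) {
                continue; // blank record
            }
            if (commentPrefix != null && line.startsWith(commentPrefix)) {
                continue; // comment line
            }
            // The -1 limit keeps trailing empty fields instead of silently dropping them.
            rows.add(Arrays.asList(line.split(Pattern.quote(fieldDelimiter), -1)));
        }
        return rows;
    }
}
```

A test for non-default delimiters would feed input such as `"src;dst;val%1;2;ot%#skip%3;2;tt%"` with `%` as the line delimiter and `;` as the field delimiter, then check that exactly the two data rows come back.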
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592090#comment-14592090 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user andralungu commented on the pull request:

    https://github.com/apache/flink/pull/847#issuecomment-113217602

    Ah! And I just remembered! Maybe it makes sense to update the examples to use `fromCSV` when creating the Graph instead of `getEdgesDataSet`.
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591644#comment-14591644 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user shghatge commented on the pull request:

    https://github.com/apache/flink/pull/847#issuecomment-113107660

    Hello @vasia
    I will follow the guidelines and add the tests that are suggested by you when making a commit.
    For the separate configuration methods issue, I was thinking more along the lines that if we want to configure the readers separately, then we could use the get methods for the CsvReaders and then configure them. But I will add the separate method now.
    Thanks for the detailed guidance. :)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590556#comment-14590556 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/847#discussion_r32672805

    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
    @@ -282,6 +282,58 @@ public void flatMap(Edge<K, EV> edge, Collector<Tuple1<K>> out) {
         }

         /**
    +     * Creates a graph from CSV files.
    +     *
    +     * Vertices with value are created from a CSV file with 2 fields
    +     * Edges with value are created from a CSV file with 3 fields
    +     * from Tuple3.
    +     *
    +     * @param path1 path to a CSV file with the Vertices data.
    +     * @param path2 path to a CSV file with the Edges data
    +     * @param context the flink execution environment.
    +     * @return An instance of {@link org.apache.flink.graph.GraphCsvReader} , which on calling types() method to specify types of the
    +     *             Vertex ID, Vertex Value and Edge value returns a Graph
    +     */
    +
    +    public static GraphCsvReader fromCsvReader(String path1, String path2, ExecutionEnvironment context){
    +        return (new GraphCsvReader(path1,path2,context));
    --- End diff --

    parentheses not needed
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590559#comment-14590559 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/847#discussion_r32672995

    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
    @@ -282,6 +282,58 @@ public void flatMap(Edge<K, EV> edge, Collector<Tuple1<K>> out) {
         }

         /**
    +     * Creates a graph from CSV files.
    +     *
    +     * Vertices with value are created from a CSV file with 2 fields
    +     * Edges with value are created from a CSV file with 3 fields
    +     * from Tuple3.
    +     *
    +     * @param path1 path to a CSV file with the Vertices data.
    +     * @param path2 path to a CSV file with the Edges data
    +     * @param context the flink execution environment.
    +     * @return An instance of {@link org.apache.flink.graph.GraphCsvReader} , which on calling types() method to specify types of the
    +     *             Vertex ID, Vertex Value and Edge value returns a Graph
    +     */
    +
    +    public static GraphCsvReader fromCsvReader(String path1, String path2, ExecutionEnvironment context){
    +        return (new GraphCsvReader(path1,path2,context));
    +    }
    +    /** Creates a graph from a CSV file for Edges., Vertices are
    +     * induced from the edges.
    +     *
    +     * Edges with value are created from a CSV file with 3 fields. Vertices are created
    +     * automatically and their values are set to NullValue.
    +     *
    +     * @param path a path to a CSV file with the Edges data
    +     * @param context the flink execution environment.
    +     * @return An instance of {@link org.apache.flink.graph.GraphCsvReader} , which on calling types() method to specify types of the
    +     *             Vertex ID, Vertex Value and Edge value returns a Graph
    +     */
    +
    +    public static GraphCsvReader fromCsvReader(String path, ExecutionEnvironment context){
    +        return (new GraphCsvReader(path,context));
    +    }
    +
    +    /**
    +    *Creates a graph from a CSV file for Edges., Vertices are
    +    * induced from the edges and vertex values are calculated by a mapper
    +    * function. Edges with value are created from a CSV file with 3 fields.
    +    * Vertices are created automatically and their values are set by applying the provided map
    +    * function to the vertex ids.
    +    *
    +    * @param path a path to a CSV file with the Edges data
    +    * @param mapper the mapper function.
    +    * @param context the flink execution environment.
    +    * @return An instance of {@link org.apache.flink.graph.GraphCsvReader} , which on calling types() method to specify types of the
    +    * Vertex ID, Vertex Value and Edge value returns a Graph
    +    */
    +
    +    public static GraphCsvReader fromCsvReader(String path, final MapFunction mapper,ExecutionEnvironment context)
    +    {
    +        return (new GraphCsvReader(path,mapper,context));
    --- End diff --

    same applies here :)
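The Javadoc quoted in the diff above describes vertices being induced from the edges, with each vertex value computed by applying a map function to the vertex ID. A minimal plain-Java sketch of that idea follows. It uses `java.util.function.Function` in place of Flink's `MapFunction` and a hypothetical `InducedVertices` class, so it illustrates the described behaviour rather than Gelly's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration; Gelly itself works on distributed DataSets, not in-memory maps.
final class InducedVertices {

    private InducedVertices() {}

    /**
     * Derives the vertex set from an edge list and computes each vertex value
     * by applying the mapper to the vertex ID. Each edge is a long[]{source, target}.
     */
    static <VV> Map<Long, VV> fromEdges(List<long[]> edges, Function<Long, VV> mapper) {
        Map<Long, VV> vertices = new LinkedHashMap<>();
        for (long[] edge : edges) {
            // Both endpoints of every edge become vertices; duplicates are computed once.
            vertices.computeIfAbsent(edge[0], mapper);
            vertices.computeIfAbsent(edge[1], mapper);
        }
        return vertices;
    }
}
```

For the edges (1,2), (1,3) and (2,3) with a mapper that doubles the ID, the result holds three vertices with values 2, 4 and 6.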
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590558#comment-14590558 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/847#discussion_r32672972

    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
    @@ -282,6 +282,58 @@ public void flatMap(Edge<K, EV> edge, Collector<Tuple1<K>> out) {
         }

         /**
    +     * Creates a graph from CSV files.
    +     *
    +     * Vertices with value are created from a CSV file with 2 fields
    +     * Edges with value are created from a CSV file with 3 fields
    +     * from Tuple3.
    +     *
    +     * @param path1 path to a CSV file with the Vertices data.
    +     * @param path2 path to a CSV file with the Edges data
    +     * @param context the flink execution environment.
    +     * @return An instance of {@link org.apache.flink.graph.GraphCsvReader} , which on calling types() method to specify types of the
    +     *             Vertex ID, Vertex Value and Edge value returns a Graph
    +     */
    +
    +    public static GraphCsvReader fromCsvReader(String path1, String path2, ExecutionEnvironment context){
    +        return (new GraphCsvReader(path1,path2,context));
    +    }
    +    /** Creates a graph from a CSV file for Edges., Vertices are
    +     * induced from the edges.
    +     *
    +     * Edges with value are created from a CSV file with 3 fields. Vertices are created
    +     * automatically and their values are set to NullValue.
    +     *
    +     * @param path a path to a CSV file with the Edges data
    +     * @param context the flink execution environment.
    +     * @return An instance of {@link org.apache.flink.graph.GraphCsvReader} , which on calling types() method to specify types of the
    +     *             Vertex ID, Vertex Value and Edge value returns a Graph
    +     */
    +
    +    public static GraphCsvReader fromCsvReader(String path, ExecutionEnvironment context){
    +        return (new GraphCsvReader(path,context));
    --- End diff --

    Parentheses here too.
    Also, it's nice to have a space after commas when separating arguments and a space before the curly bracket that defines the start of the method.
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590580#comment-14590580 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/847#discussion_r32673548

    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java ---
    @@ -0,0 +1,388 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements. See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership. The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.graph;
    +import com.google.common.base.Preconditions;
    +import org.apache.flink.api.common.functions.MapFunction;
    +import org.apache.flink.api.java.DataSet;
    +import org.apache.flink.api.java.io.CsvReader;
    +import org.apache.flink.api.java.tuple.Tuple2;
    +import org.apache.flink.api.java.tuple.Tuple3;
    +import org.apache.flink.core.fs.Path;
    +import org.apache.flink.api.java.ExecutionEnvironment;
    +import org.apache.flink.graph.Graph;
    +import org.apache.flink.types.NullValue;
    +import org.apache.flink.core.fs.Path;
    +
    +
    +/**
    + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data
    + * The class also configures the CSV readers used to read edges(vertices) data such as the field types,
    + * the delimiters (row and field), the fields that should be included or skipped, and other flags
    + * such as whether to skip the initial line as the header.
    + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class.
    + */
    +
    +
    +
    +public class GraphCsvReader{
    +
    +    private final Path path1,path2;
    +    private final ExecutionEnvironment executionContext;
    +
    +    private Path edgePath;
    +    private Path vertexPath;
    +    protected CsvReader EdgeReader;
    +    protected CsvReader VertexReader;
    +    protected MapFunction mapper;
    +
    +//
    +
    +    public GraphCsvReader(Path path1,Path path2, ExecutionEnvironment context)
    +    {
    +        this.path1 = path1;
    +        this.path2 = path2;
    +        this.VertexReader = new CsvReader(path1,context);
    +        this.EdgeReader = new CsvReader(path2,context);
    +        this.mapper=null;
    +        this.executionContext=context;
    +    }
    +
    +    public GraphCsvReader(Path path2, ExecutionEnvironment context)
    +    {
    +        this.path1=null;
    +        this.path2 = path2;
    +        this.EdgeReader = new CsvReader(path2,context);
    +        this.VertexReader = null;
    +        this.mapper = null;
    +        this.executionContext=context;
    +    }
    +
    +    public GraphCsvReader(Path path2,final MapFunction mapper, ExecutionEnvironment context)
    --- End diff --

    add type arguments to MapFunction
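The review comment above asks for type arguments on `MapFunction`, which the diff declares as a raw type. As a small sketch of what that buys, the example below uses a hypothetical `Mapper<T, O>` interface standing in for Flink's `MapFunction<T, O>` and a hypothetical `TypedReaderSketch` class. With the field parameterized, the compiler checks the mapper's input and output types, and callers need no casts.

```java
// Hypothetical single-method interface standing in for Flink's MapFunction<T, O>.
interface Mapper<T, O> {
    O map(T value) throws Exception;
}

// Hypothetical reader that keeps the mapper's type arguments instead of using a raw type.
final class TypedReaderSketch<K, VV> {

    // Parameterized field: the compiler knows map(K) returns VV.
    private final Mapper<K, VV> mapper;

    TypedReaderSketch(Mapper<K, VV> mapper) {
        this.mapper = mapper;
    }

    /** Computes a vertex value from its ID without casts or unchecked warnings. */
    VV vertexValue(K vertexId) throws Exception {
        return mapper.map(vertexId);
    }
}
```

With a raw `Mapper` field, `vertexValue` would have to return `Object` or cast the result, and a mapper with the wrong input type would only fail at runtime.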
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590690#comment-14590690 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/847#discussion_r32679424

    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java ---
    @@ -0,0 +1,388 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements. See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership. The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.graph;
    +import com.google.common.base.Preconditions;
    +import org.apache.flink.api.common.functions.MapFunction;
    +import org.apache.flink.api.java.DataSet;
    +import org.apache.flink.api.java.io.CsvReader;
    +import org.apache.flink.api.java.tuple.Tuple2;
    +import org.apache.flink.api.java.tuple.Tuple3;
    +import org.apache.flink.core.fs.Path;
    +import org.apache.flink.api.java.ExecutionEnvironment;
    +import org.apache.flink.graph.Graph;
    +import org.apache.flink.types.NullValue;
    +import org.apache.flink.core.fs.Path;
    +
    +
    +/**
    + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data
    + * The class also configures the CSV readers used to read edges(vertices) data such as the field types,
    + * the delimiters (row and field), the fields that should be included or skipped, and other flags
    + * such as whether to skip the initial line as the header.
    + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class.
    + */
    +
    +
    +
    +public class GraphCsvReader{
    +
    +    private final Path path1,path2;
    +    private final ExecutionEnvironment executionContext;
    +
    +    private Path edgePath;
    +    private Path vertexPath;
    +    protected CsvReader EdgeReader;
    +    protected CsvReader VertexReader;
    +    protected MapFunction mapper;
    +
    +//
    +
    +    public GraphCsvReader(Path path1,Path path2, ExecutionEnvironment context)
    +    {
    +        this.path1 = path1;
    +        this.path2 = path2;
    +        this.VertexReader = new CsvReader(path1,context);
    +        this.EdgeReader = new CsvReader(path2,context);
    +        this.mapper=null;
    +        this.executionContext=context;
    +    }
    +
    +    public GraphCsvReader(Path path2, ExecutionEnvironment context)
    +    {
    +        this.path1=null;
    +        this.path2 = path2;
    +        this.EdgeReader = new CsvReader(path2,context);
    +        this.VertexReader = null;
    +        this.mapper = null;
    +        this.executionContext=context;
    +    }
    +
    +    public GraphCsvReader(Path path2,final MapFunction mapper, ExecutionEnvironment context)
    +    {
    +        this.path1=null;
    +        this.path2 = path2;
    +        this.EdgeReader = new CsvReader(path2,context);
    +        this.VertexReader = null;
    +        this.mapper = mapper;
    +        this.executionContext=context;
    +    }
    +
    +    public GraphCsvReader (String path2,ExecutionEnvironment context)
    +    {
    +        this(new Path(Preconditions.checkNotNull(path2, "The file path may not be null.")), context);
    +
    +    }
    +
    +    public GraphCsvReader(String path1, String path2, ExecutionEnvironment context)
    +    {
    +        this(new Path(Preconditions.checkNotNull(path1, "The file path may not be null.")),new Path(Preconditions.checkNotNull(path2, "The file path may not be null.")), context);
    +    }
    +
    +
    +    public GraphCsvReader (String path2, final MapFunction mapper, ExecutionEnvironment context)
    +    {
    +
    +        this(new Path(Preconditions.checkNotNull(path2, "The file path may not be null.")),mapper, context);
    +
    +
    +    }
    +
    +    public CsvReader getEdgeReader()
    +    {
    +
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590591#comment-14590591 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user shghatge commented on a diff in the pull request:

    https://github.com/apache/flink/pull/847#discussion_r32674326

    --- Diff: docs/libs/gelly_guide.md ---
    @@ -102,6 +102,15 @@ DataSet<Tuple3<String, String, Double>> edgeTuples = env.readCsvFile("path/to/ed
     Graph<String, Long, Double> graph = Graph.fromTupleDataSet(vertexTuples, edgeTuples, env);
     {% endhighlight %}

    +* from a CSV file with three fields and an optional CSV file with 2 fields. In this case, Gelly will convert each row from the first CSV file to an `Edge`, where the first field will be the source ID, the second field will be the target ID and the third field will be the edge value. Equivalently, each row from the second CSV file will be converted to a `Vertex`, where the first field will be the vertex ID and the second field will be the vertex value. A types() method is called on the GraphCsvReader object returned by fromCsvReader() to inform the CsvReader of the types of the fields :
    --- End diff --

    oh! Will fix this.
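The guide text in the diff above maps CSV rows onto graph elements: a three-field row becomes an edge (source ID, target ID, edge value) and a two-field row becomes a vertex (vertex ID, vertex value). The sketch below spells out that mapping in plain Java. `EdgeRow` and `VertexRow` are hypothetical classes, comma-separated input is an assumption, and the field types follow the guide's `Graph<String, Long, Double>` example. Gelly's own `Edge`, `Vertex` and `GraphCsvReader` are not used here.

```java
// Hypothetical value classes for illustration; not Gelly's Edge and Vertex.
final class EdgeRow {
    final String source;
    final String target;
    final double value;

    EdgeRow(String source, String target, double value) {
        this.source = source;
        this.target = target;
        this.value = value;
    }

    /** Field 1 is the source ID, field 2 the target ID, field 3 the edge value. */
    static EdgeRow parse(String csvLine) {
        String[] f = csvLine.split(",", -1);
        if (f.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + csvLine);
        }
        return new EdgeRow(f[0], f[1], Double.parseDouble(f[2]));
    }
}

final class VertexRow {
    final String id;
    final long value;

    VertexRow(String id, long value) {
        this.id = id;
        this.value = value;
    }

    /** Field 1 is the vertex ID, field 2 the vertex value. */
    static VertexRow parse(String csvLine) {
        String[] f = csvLine.split(",", -1);
        if (f.length != 2) {
            throw new IllegalArgumentException("expected 2 fields: " + csvLine);
        }
        return new VertexRow(f[0], Long.parseLong(f[1]));
    }
}
```

Here the types are fixed in the class definitions; in the API under review, that is the information the `types()` call supplies to the underlying readers.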
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590599#comment-14590599 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user shghatge commented on a diff in the pull request:

    https://github.com/apache/flink/pull/847#discussion_r32674875

    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java ---
    @@ -0,0 +1,388 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements. See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership. The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.graph;
    +import com.google.common.base.Preconditions;
    +import org.apache.flink.api.common.functions.MapFunction;
    +import org.apache.flink.api.java.DataSet;
    +import org.apache.flink.api.java.io.CsvReader;
    +import org.apache.flink.api.java.tuple.Tuple2;
    +import org.apache.flink.api.java.tuple.Tuple3;
    +import org.apache.flink.core.fs.Path;
    +import org.apache.flink.api.java.ExecutionEnvironment;
    +import org.apache.flink.graph.Graph;
    +import org.apache.flink.types.NullValue;
    +import org.apache.flink.core.fs.Path;
    +
    +
    +/**
    + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data
    + * The class also configures the CSV readers used to read edges(vertices) data such as the field types,
    + * the delimiters (row and field), the fields that should be included or skipped, and other flags
    + * such as whether to skip the initial line as the header.
    + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class.
    + */
    +
    +
    +
    +public class GraphCsvReader{
    +
    +    private final Path path1,path2;
    +    private final ExecutionEnvironment executionContext;
    +
    +    private Path edgePath;
    +    private Path vertexPath;
    +    protected CsvReader EdgeReader;
    +    protected CsvReader VertexReader;
    +    protected MapFunction mapper;
    +
    +//
    +
    +    public GraphCsvReader(Path path1,Path path2, ExecutionEnvironment context)
    +    {
    +        this.path1 = path1;
    +        this.path2 = path2;
    +        this.VertexReader = new CsvReader(path1,context);
    +        this.EdgeReader = new CsvReader(path2,context);
    +        this.mapper=null;
    +        this.executionContext=context;
    +    }
    +
    +    public GraphCsvReader(Path path2, ExecutionEnvironment context)
    +    {
    +        this.path1=null;
    +        this.path2 = path2;
    +        this.EdgeReader = new CsvReader(path2,context);
    +        this.VertexReader = null;
    +        this.mapper = null;
    +        this.executionContext=context;
    +    }
    +
    +    public GraphCsvReader(Path path2,final MapFunction mapper, ExecutionEnvironment context)
    +    {
    +        this.path1=null;
    +        this.path2 = path2;
    +        this.EdgeReader = new CsvReader(path2,context);
    +        this.VertexReader = null;
    +        this.mapper = mapper;
    +        this.executionContext=context;
    +    }
    +
    +    public GraphCsvReader (String path2,ExecutionEnvironment context)
    +    {
    +        this(new Path(Preconditions.checkNotNull(path2, "The file path may not be null.")), context);
    +
    +    }
    +
    +    public GraphCsvReader(String path1, String path2, ExecutionEnvironment context)
    +    {
    +        this(new Path(Preconditions.checkNotNull(path1, "The file path may not be null.")),new Path(Preconditions.checkNotNull(path2, "The file path may not be null.")), context);
    +    }
    +
    +
    +    public GraphCsvReader (String path2, final MapFunction mapper, ExecutionEnvironment context)
    +    {
    +
    +        this(new Path(Preconditions.checkNotNull(path2, "The file path may not be null.")),mapper, context);
    +
    +
    +    }
    +
    +    public CsvReader getEdgeReader()
    +    {
    +
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590607#comment-14590607 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/847#discussion_r32675011

    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java ---
    @@ -0,0 +1,388 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements. See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership. The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.graph;
    +import com.google.common.base.Preconditions;
    +import org.apache.flink.api.common.functions.MapFunction;
    +import org.apache.flink.api.java.DataSet;
    +import org.apache.flink.api.java.io.CsvReader;
    +import org.apache.flink.api.java.tuple.Tuple2;
    +import org.apache.flink.api.java.tuple.Tuple3;
    +import org.apache.flink.core.fs.Path;
    +import org.apache.flink.api.java.ExecutionEnvironment;
    +import org.apache.flink.graph.Graph;
    +import org.apache.flink.types.NullValue;
    +import org.apache.flink.core.fs.Path;
    +
    +
    +/**
    + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data
    + * The class also configures the CSV readers used to read edges(vertices) data such as the field types,
    + * the delimiters (row and field), the fields that should be included or skipped, and other flags
    + * such as whether to skip the initial line as the header.
    + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class.
    + */
    +
    +
    +
    +public class GraphCsvReader{
    +
    +    private final Path path1,path2;
    +    private final ExecutionEnvironment executionContext;
    +
    +    private Path edgePath;
    +    private Path vertexPath;
    +    protected CsvReader EdgeReader;
    +    protected CsvReader VertexReader;
    +    protected MapFunction mapper;
    +
    +//
    +
    +    public GraphCsvReader(Path path1,Path path2, ExecutionEnvironment context)
    +    {
    +        this.path1 = path1;
    +        this.path2 = path2;
    +        this.VertexReader = new CsvReader(path1,context);
    +        this.EdgeReader = new CsvReader(path2,context);
    +        this.mapper=null;
    +        this.executionContext=context;
    +    }
    +
    +    public GraphCsvReader(Path path2, ExecutionEnvironment context)
    +    {
    +        this.path1=null;
    +        this.path2 = path2;
    +        this.EdgeReader = new CsvReader(path2,context);
    +        this.VertexReader = null;
    +        this.mapper = null;
    +        this.executionContext=context;
    +    }
    +
    +    public GraphCsvReader(Path path2,final MapFunction mapper, ExecutionEnvironment context)
    +    {
    +        this.path1=null;
    +        this.path2 = path2;
    +        this.EdgeReader = new CsvReader(path2,context);
    +        this.VertexReader = null;
    +        this.mapper = mapper;
    +        this.executionContext=context;
    +    }
    +
    +    public GraphCsvReader (String path2,ExecutionEnvironment context)
    +    {
    +        this(new Path(Preconditions.checkNotNull(path2, "The file path may not be null.")), context);
    +
    +    }
    +
    +    public GraphCsvReader(String path1, String path2, ExecutionEnvironment context)
    +    {
    +        this(new Path(Preconditions.checkNotNull(path1, "The file path may not be null.")),new Path(Preconditions.checkNotNull(path2, "The file path may not be null.")), context);
    +    }
    +
    +
    +    public GraphCsvReader (String path2, final MapFunction mapper, ExecutionEnvironment context)
    +    {
    +
    +        this(new Path(Preconditions.checkNotNull(path2, "The file path may not be null.")),mapper, context);
    +
    +
    +    }
    +
    +    public CsvReader getEdgeReader()
    +    {
    +
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
    [ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590557#comment-14590557 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/847#discussion_r32672827

    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
    @@ -282,6 +282,58 @@ public void flatMap(Edge<K, EV> edge, Collector<Tuple1<K>> out) {
         }

         /**
    +     * Creates a graph from CSV files.
    +     *
    +     * Vertices with value are created from a CSV file with 2 fields
    +     * Edges with value are created from a CSV file with 3 fields
    +     * from Tuple3.
    +     *
    +     * @param path1 path to a CSV file with the Vertices data.
    +     * @param path2 path to a CSV file with the Edges data
    +     * @param context the flink execution environment.
    +     * @return An instance of {@link org.apache.flink.graph.GraphCsvReader} , which on calling types() method to specify types of the
    +     *Vertex ID, Vertex Value and Edge value returns a Graph
    +     */
    +
    +    public static GraphCsvReader fromCsvReader(String path1, String path2, ExecutionEnvironment context){
    +        return (new GraphCsvReader(path1,path2,context));
    +    }
    +    /** Creates a graph from a CSV file for Edges., Vertices are
    +     * induced from the edges.
    +     *
    +     * Edges with value are created from a CSV file with 3 fields. Vertices are created
    +     * automatically and their values are set to NullValue.
    +     *
    +     * @param path a path to a CSV file with the Edges data
    +     * @param context the flink execution environment.
    +     * @return An instance of {@link org.apache.flink.graph.GraphCsvReader} , which on calling types() method to specify types of the
    +     * Vertex ID, Vertex Value and Edge value returns a Graph
    +     */
    +
    --- End diff --

    new line

> Read edges and vertices from CSV files
> --
>
>                 Key: FLINK-1520
>                 URL: https://issues.apache.org/jira/browse/FLINK-1520
>             Project: Flink
>          Issue Type: New Feature
>          Components: Gelly
>            Reporter: Vasia Kalavri
>            Assignee: Shivani Ghatge
>            Priority: Minor
>              Labels: easyfix, newbie
>
> Add methods to create Vertex and Edge Datasets directly from CSV file inputs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
    [ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590554#comment-14590554 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/847#discussion_r32672778

    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
    @@ -282,6 +282,58 @@ public void flatMap(Edge<K, EV> edge, Collector<Tuple1<K>> out) {
         }

         /**
    +     * Creates a graph from CSV files.
    +     *
    +     * Vertices with value are created from a CSV file with 2 fields
    +     * Edges with value are created from a CSV file with 3 fields
    +     * from Tuple3.
    +     *
    +     * @param path1 path to a CSV file with the Vertices data.
    +     * @param path2 path to a CSV file with the Edges data
    +     * @param context the flink execution environment.
    +     * @return An instance of {@link org.apache.flink.graph.GraphCsvReader} , which on calling types() method to specify types of the
    +     *Vertex ID, Vertex Value and Edge value returns a Graph
    +     */
    +
    --- End diff --

    remove new line
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
    [ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590563#comment-14590563 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user vasia commented on a diff in the pull request:

    https://github.com/apache/flink/pull/847#discussion_r32673105

    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java ---
    @@ -0,0 +1,388 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.graph;
    +import com.google.common.base.Preconditions;
    +import org.apache.flink.api.common.functions.MapFunction;
    +import org.apache.flink.api.java.DataSet;
    +import org.apache.flink.api.java.io.CsvReader;
    +import org.apache.flink.api.java.tuple.Tuple2;
    +import org.apache.flink.api.java.tuple.Tuple3;
    +import org.apache.flink.core.fs.Path;
    +import org.apache.flink.api.java.ExecutionEnvironment;
    +import org.apache.flink.graph.Graph;
    +import org.apache.flink.types.NullValue;
    +import org.apache.flink.core.fs.Path;
    +
    +
    +/**
    + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data
    + * The class also configures the CSV readers used to read edges(vertices) data such as the field types,
    + * the delimiters (row and field), the fields that should be included or skipped, and other flags
    + * such as whether to skip the initial line as the header.
    + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class.
    + */
    +
    +
    --- End diff --

    remove new lines
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
    [ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590627#comment-14590627 ]

ASF GitHub Bot commented on FLINK-1520:
---

Github user shghatge commented on a diff in the pull request:

    https://github.com/apache/flink/pull/847#discussion_r32676617

    --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java ---
    @@ -0,0 +1,388 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.graph;
    +import com.google.common.base.Preconditions;
    +import org.apache.flink.api.common.functions.MapFunction;
    +import org.apache.flink.api.java.DataSet;
    +import org.apache.flink.api.java.io.CsvReader;
    +import org.apache.flink.api.java.tuple.Tuple2;
    +import org.apache.flink.api.java.tuple.Tuple3;
    +import org.apache.flink.core.fs.Path;
    +import org.apache.flink.api.java.ExecutionEnvironment;
    +import org.apache.flink.graph.Graph;
    +import org.apache.flink.types.NullValue;
    +import org.apache.flink.core.fs.Path;
    +
    +
    +/**
    + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data
    + * The class also configures the CSV readers used to read edges(vertices) data such as the field types,
    + * the delimiters (row and field), the fields that should be included or skipped, and other flags
    + * such as whether to skip the initial line as the header.
    + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class.
    + */
    +
    +
    +
    +public class GraphCsvReader{
    +
    +    private final Path path1,path2;
    +    private final ExecutionEnvironment executionContext;
    +
    +    private Path edgePath;
    +    private Path vertexPath;
    +    protected CsvReader EdgeReader;
    +    protected CsvReader VertexReader;
    +    protected MapFunction mapper;
    --- End diff --

    I tried adding the function with types, but it gives an error. The methods for the mapper are tested and working, though.
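The raw `MapFunction` field discussed in this comment is the kind of declaration that produces raw-type and unchecked warnings. As a rough illustration only, the sketch below shows a typed field in a generic class. It uses `java.util.function.Function` as a stand-in for Flink's `MapFunction` so that it runs without Flink on the classpath; the class `TypedReaderSketch` and its method `initVertexValues` are hypothetical names and are not part of the pull request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for an edges-only reader; not the class from the pull request.
// K is the vertex ID type, VV is the vertex value type.
class TypedReaderSketch<K, VV> {

    // Typed field: the compiler checks what the mapper accepts and returns,
    // so no raw-type or unchecked warnings are reported for it.
    private final Function<K, VV> mapper;

    TypedReaderSketch(Function<K, VV> mapper) {
        this.mapper = mapper;
    }

    // Derives a value for every vertex ID by applying the typed mapper.
    List<VV> initVertexValues(List<K> vertexIds) {
        List<VV> values = new ArrayList<>();
        for (K id : vertexIds) {
            values.add(mapper.apply(id));
        }
        return values;
    }
}
```

Declaring the type parameters on the class is one possible way to make them available to the field; whether that fits the builder-style API in the pull request is a separate design question.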
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590633#comment-14590633 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/847#issuecomment-112954221

Hey @shghatge, this is a great first try. You got the logic right and I really like the detailed javadocs ^^ I left a few inline comments, which should be easy to fix. Let me also elaborate a bit on some general guidelines:

- Code formatting: we don't really have a strict Java code style, but there are a few things you can improve. For your code to be readable, it is nice to leave a space after the commas separating arguments. For example `myMethod(arg1, arg2, arg3)` instead of `myMethod(arg1,arg2,arg3)`. We usually separate the closing of a parenthesis and the opening of a curly bracket with a space, i.e. `myMethod() { ... }` instead of `myMethod(){ ... }`. Also, try to avoid adding new lines if they are not needed. Regarding the missing types: this does not cause an error, but it does give a warning. You can turn on warning notifications in your IDE so that you notice these.
- I like that you added separate `includeFields` methods for vertices and edges. It would probably make sense to do the same for the rest of the methods. For example, you might want to skip the first line in the edges file, but not in the vertices file. Right now, you are forced to do either both or neither. Alternatively, we could add parameters to the existing methods to define the behavior for the edges and vertices files separately. For example `public GraphCsvReader lineDelimiter(String vertexDelimiter, String edgeDelimiter)`. What do you think?
- Finally, in order to catch issues like the one with the null `VertexReader`, you should always try to test as much of the functionality you have added as possible. In this case, it would be a good idea to add a test that reads from edges only, plus some tests for the different methods you have added.

Let me know if you have questions!
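The second guideline above (configuring the edges and vertices files separately) could look roughly like the following sketch. It is only an illustration under stated assumptions: `CsvSourceConfig`, `GraphCsvConfigSketch`, and every method name on them are hypothetical stand-ins, not the `GraphCsvReader` from the pull request and not Flink's `CsvReader` API. The edges-only constructor also shows one possible way to fail early, with a clear message, when a vertex setting is requested but no vertices file exists.

```java
import java.util.Objects;

/** Small demo of the hypothetical builder defined below. */
final class PerFileConfigDemo {
    public static void main(String[] args) {
        GraphCsvConfigSketch config =
                new GraphCsvConfigSketch("vertices.csv", "edges.csv")
                        .ignoreFirstLineEdges()          // skip the header in the edges file only
                        .lineDelimiterVertices("\r\n");  // different line ending for vertices

        System.out.println("edges skip header: " + config.edges().ignoreFirstLine);
        System.out.println("vertices skip header: " + config.vertices().ignoreFirstLine);
    }
}

/** Hypothetical holder for the settings of a single CSV file. */
final class CsvSourceConfig {
    final String path;
    String lineDelimiter = "\n";
    boolean ignoreFirstLine = false;

    CsvSourceConfig(String path) {
        this.path = Objects.requireNonNull(path, "The file path may not be null.");
    }
}

/** Hypothetical builder that keeps edge and vertex settings independent. */
final class GraphCsvConfigSketch {
    private final CsvSourceConfig edges;
    private final CsvSourceConfig vertices; // null when the graph is built from edges only

    GraphCsvConfigSketch(String verticesPath, String edgesPath) {
        this.vertices = new CsvSourceConfig(verticesPath);
        this.edges = new CsvSourceConfig(edgesPath);
    }

    GraphCsvConfigSketch(String edgesPath) {
        this.vertices = null;
        this.edges = new CsvSourceConfig(edgesPath);
    }

    GraphCsvConfigSketch ignoreFirstLineEdges() {
        edges.ignoreFirstLine = true;
        return this;
    }

    GraphCsvConfigSketch ignoreFirstLineVertices() {
        requireVertices().ignoreFirstLine = true;
        return this;
    }

    GraphCsvConfigSketch lineDelimiterEdges(String delimiter) {
        edges.lineDelimiter = Objects.requireNonNull(delimiter);
        return this;
    }

    GraphCsvConfigSketch lineDelimiterVertices(String delimiter) {
        requireVertices().lineDelimiter = Objects.requireNonNull(delimiter);
        return this;
    }

    CsvSourceConfig edges() {
        return edges;
    }

    CsvSourceConfig vertices() {
        return requireVertices();
    }

    /** Guards against the edges-only case instead of hitting a NullPointerException later. */
    private CsvSourceConfig requireVertices() {
        if (vertices == null) {
            throw new IllegalStateException("No vertices file was configured.");
        }
        return vertices;
    }
}
```

The alternative mentioned in the comment, a single method taking one argument per file, would need fewer methods but forces the caller to pass a value for both files every time; which trade-off fits better is left open here, as in the review.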
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590561#comment-14590561 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r32673063 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java --- @@ -282,6 +282,58 @@ public void flatMap(Edge<K, EV> edge, Collector<Tuple1<K>> out) { } /** + * Creates a graph from CSV files. + * + * Vertices with value are created from a CSV file with 2 fields + * Edges with value are created from a CSV file with 3 fields + * from Tuple3. + * + * @param path1 path to a CSV file with the Vertices data. + * @param path2 path to a CSV file with the Edges data + * @param context the flink execution environment. + * @return An instance of {@link org.apache.flink.graph.GraphCsvReader} , which on calling types() method to specify types of the + *Vertex ID, Vertex Value and Edge value returns a Graph + */ + + public static GraphCsvReader fromCsvReader(String path1, String path2, ExecutionEnvironment context){ --- End diff -- I would rename `path1` and `path2` to something like `verticesPath` and `edgesPath`.
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590603#comment-14590603 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r32674930 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java --- @@ -0,0 +1,388 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.graph; +import com.google.common.base.Preconditions; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.io.CsvReader; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.core.fs.Path; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.graph.Graph; +import org.apache.flink.types.NullValue; +import org.apache.flink.core.fs.Path; + + +/** + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data + * The class also configures the CSV readers used to read edges(vertices) data such as the field types, + * the delimiters (row and field), the fields that should be included or skipped, and other flags + * such as whether to skip the initial line as the header. + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class. 
+ */ + + + +public class GraphCsvReader{ + + private final Path path1,path2; + private final ExecutionEnvironment executionContext; + + private Path edgePath; + private Path vertexPath; + protected CsvReader EdgeReader; + protected CsvReader VertexReader; + protected MapFunction mapper; + +// + + public GraphCsvReader(Path path1,Path path2, ExecutionEnvironment context) + { + this.path1 = path1; + this.path2 = path2; + this.VertexReader = new CsvReader(path1,context); + this.EdgeReader = new CsvReader(path2,context); + this.mapper=null; + this.executionContext=context; + } + + public GraphCsvReader(Path path2, ExecutionEnvironment context) + { + this.path1=null; + this.path2 = path2; + this.EdgeReader = new CsvReader(path2,context); + this.VertexReader = null; + this.mapper = null; + this.executionContext=context; + } + + public GraphCsvReader(Path path2,final MapFunction mapper, ExecutionEnvironment context) + { + this.path1=null; + this.path2 = path2; + this.EdgeReader = new CsvReader(path2,context); + this.VertexReader = null; + this.mapper = mapper; + this.executionContext=context; + } + + public GraphCsvReader (String path2,ExecutionEnvironment context) + { + this(new Path(Preconditions.checkNotNull(path2, The file path may not be null.)), context); + + } + + public GraphCsvReader(String path1, String path2, ExecutionEnvironment context) + { + this(new Path(Preconditions.checkNotNull(path1, The file path may not be null.)),new Path(Preconditions.checkNotNull(path2, The file path may not be null.)), context); + } + + + public GraphCsvReader (String path2, final MapFunction mapper, ExecutionEnvironment context) + { + + this(new Path(Preconditions.checkNotNull(path2, The file path may not be null.)),mapper, context); + + + } + + public CsvReader getEdgeReader() + { +
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590604#comment-14590604 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r32674944 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java --- @@ -0,0 +1,388 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.graph; +import com.google.common.base.Preconditions; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.io.CsvReader; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.core.fs.Path; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.graph.Graph; +import org.apache.flink.types.NullValue; +import org.apache.flink.core.fs.Path; + + +/** + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data + * The class also configures the CSV readers used to read edges(vertices) data such as the field types, + * the delimiters (row and field), the fields that should be included or skipped, and other flags + * such as whether to skip the initial line as the header. + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class. 
+ */ + + + +public class GraphCsvReader{ + + private final Path path1,path2; + private final ExecutionEnvironment executionContext; + + private Path edgePath; + private Path vertexPath; + protected CsvReader EdgeReader; + protected CsvReader VertexReader; + protected MapFunction mapper; + +// + + public GraphCsvReader(Path path1,Path path2, ExecutionEnvironment context) + { + this.path1 = path1; + this.path2 = path2; + this.VertexReader = new CsvReader(path1,context); + this.EdgeReader = new CsvReader(path2,context); + this.mapper=null; + this.executionContext=context; + } + + public GraphCsvReader(Path path2, ExecutionEnvironment context) + { + this.path1=null; + this.path2 = path2; + this.EdgeReader = new CsvReader(path2,context); + this.VertexReader = null; + this.mapper = null; + this.executionContext=context; + } + + public GraphCsvReader(Path path2,final MapFunction mapper, ExecutionEnvironment context) + { + this.path1=null; + this.path2 = path2; + this.EdgeReader = new CsvReader(path2,context); + this.VertexReader = null; + this.mapper = mapper; + this.executionContext=context; + } + + public GraphCsvReader (String path2,ExecutionEnvironment context) + { + this(new Path(Preconditions.checkNotNull(path2, The file path may not be null.)), context); + + } + + public GraphCsvReader(String path1, String path2, ExecutionEnvironment context) + { + this(new Path(Preconditions.checkNotNull(path1, The file path may not be null.)),new Path(Preconditions.checkNotNull(path2, The file path may not be null.)), context); + } + + + public GraphCsvReader (String path2, final MapFunction mapper, ExecutionEnvironment context) + { + + this(new Path(Preconditions.checkNotNull(path2, The file path may not be null.)),mapper, context); + + + } + + public CsvReader getEdgeReader() + { +
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590602#comment-14590602 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r32674923 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java --- @@ -0,0 +1,388 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.graph; +import com.google.common.base.Preconditions; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.io.CsvReader; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.core.fs.Path; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.graph.Graph; +import org.apache.flink.types.NullValue; +import org.apache.flink.core.fs.Path; + + +/** + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data + * The class also configures the CSV readers used to read edges(vertices) data such as the field types, + * the delimiters (row and field), the fields that should be included or skipped, and other flags + * such as whether to skip the initial line as the header. + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class. 
+ */ + + + +public class GraphCsvReader{ + + private final Path path1,path2; + private final ExecutionEnvironment executionContext; + + private Path edgePath; + private Path vertexPath; + protected CsvReader EdgeReader; + protected CsvReader VertexReader; + protected MapFunction mapper; + +// + + public GraphCsvReader(Path path1,Path path2, ExecutionEnvironment context) + { + this.path1 = path1; + this.path2 = path2; + this.VertexReader = new CsvReader(path1,context); + this.EdgeReader = new CsvReader(path2,context); + this.mapper=null; + this.executionContext=context; + } + + public GraphCsvReader(Path path2, ExecutionEnvironment context) + { + this.path1=null; + this.path2 = path2; + this.EdgeReader = new CsvReader(path2,context); + this.VertexReader = null; + this.mapper = null; + this.executionContext=context; + } + + public GraphCsvReader(Path path2,final MapFunction mapper, ExecutionEnvironment context) + { + this.path1=null; + this.path2 = path2; + this.EdgeReader = new CsvReader(path2,context); + this.VertexReader = null; + this.mapper = mapper; + this.executionContext=context; + } + + public GraphCsvReader (String path2,ExecutionEnvironment context) + { + this(new Path(Preconditions.checkNotNull(path2, The file path may not be null.)), context); + + } + + public GraphCsvReader(String path1, String path2, ExecutionEnvironment context) + { + this(new Path(Preconditions.checkNotNull(path1, The file path may not be null.)),new Path(Preconditions.checkNotNull(path2, The file path may not be null.)), context); + } + + + public GraphCsvReader (String path2, final MapFunction mapper, ExecutionEnvironment context) + { + + this(new Path(Preconditions.checkNotNull(path2, The file path may not be null.)),mapper, context); + + + } + + public CsvReader getEdgeReader() + { +
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590575#comment-14590575 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r32673330 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java --- @@ -282,6 +282,58 @@ public void flatMap(Edge<K, EV> edge, Collector<Tuple1<K>> out) { } /** + * Creates a graph from CSV files. + * + * Vertices with value are created from a CSV file with 2 fields + * Edges with value are created from a CSV file with 3 fields + * from Tuple3. + * + * @param path1 path to a CSV file with the Vertices data. + * @param path2 path to a CSV file with the Edges data + * @param context the flink execution environment. + * @return An instance of {@link org.apache.flink.graph.GraphCsvReader} , which on calling types() method to specify types of the + *Vertex ID, Vertex Value and Edge value returns a Graph + */ + + public static GraphCsvReader fromCsvReader(String path1, String path2, ExecutionEnvironment context){ + return (new GraphCsvReader(path1,path2,context)); + } + /** Creates a graph from a CSV file for Edges., Vertices are + * induced from the edges. + * + * Edges with value are created from a CSV file with 3 fields. Vertices are created + * automatically and their values are set to NullValue. + * + * @param path a path to a CSV file with the Edges data + * @param context the flink execution environment. 
+ * @return An instance of {@link org.apache.flink.graph.GraphCsvReader} , which on calling types() method to specify types of the + * Vertex ID, Vertex Value and Edge value returns a Graph + */ + + public static GraphCsvReader fromCsvReader(String path, ExecutionEnvironment context){ + return (new GraphCsvReader(path,context)); + } + + /** +*Creates a graph from a CSV file for Edges., Vertices are +* induced from the edges and vertex values are calculated by a mapper +* function. Edges with value are created from a CSV file with 3 fields. +* Vertices are created automatically and their values are set by applying the provided map +* function to the vertex ids. +* +* @param path a path to a CSV file with the Edges data +* @param mapper the mapper function. +* @param context the flink execution environment. +* @return An instance of {@link org.apache.flink.graph.GraphCsvReader} , which on calling types() method to specify types of the +* Vertex ID, Vertex Value and Edge value returns a Graph +*/ + + public static GraphCsvReader fromCsvReader(String path, final MapFunction mapper,ExecutionEnvironment context) --- End diff -- Add type arguments to `MapFunction`.
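As a rough illustration of what adding type arguments buys, the sketch below contrasts a raw `MapFunction` parameter with a parameterized one. To stay compilable without Flink on the classpath it declares a local `MapFunction<T, O>` interface with the same shape as Flink's (`O map(T value) throws Exception`); that local interface, the `VertexValues` helper, and its method names are assumptions made for this example and are not code from the pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Small demo of the typed variant defined below. */
final class TypedMapperDemo {
    public static void main(String[] args) throws Exception {
        // With type arguments, the compiler checks that the mapper takes a Long id
        // and produces a String value; no casts are needed at the call site.
        MapFunction<Long, String> labeler = id -> "vertex-" + id;
        List<String> values = VertexValues.initialize(Arrays.asList(1L, 2L), labeler);
        System.out.println(values); // prints [vertex-1, vertex-2]
    }
}

/** Local stand-in mirroring the shape of Flink's MapFunction, declared only for this sketch. */
interface MapFunction<T, O> {
    O map(T value) throws Exception;
}

final class VertexValues {

    /**
     * Raw-type variant: this compiles, but only with rawtypes/unchecked warnings,
     * and the result is a list of Object that callers must cast.
     */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List<Object> initializeRaw(List<?> ids, MapFunction mapper) throws Exception {
        List<Object> out = new ArrayList<>(ids.size());
        for (Object id : ids) {
            out.add(mapper.map(id));
        }
        return out;
    }

    /** Typed variant: K is the vertex id type, VV the vertex value type. */
    static <K, VV> List<VV> initialize(List<K> ids, MapFunction<K, VV> mapper) throws Exception {
        List<VV> out = new ArrayList<>(ids.size());
        for (K id : ids) {
            out.add(mapper.map(id));
        }
        return out;
    }
}
```

The raw variant matches the point made earlier in the thread: omitting the type arguments is a compiler warning rather than an error, but it pushes type mistakes from compile time to run time.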
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590614#comment-14590614 ] ASF GitHub Bot commented on FLINK-1520: --- Github user shghatge commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r32675480 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java --- @@ -0,0 +1,388 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.graph; +import com.google.common.base.Preconditions; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.io.CsvReader; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.core.fs.Path; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.graph.Graph; +import org.apache.flink.types.NullValue; +import org.apache.flink.core.fs.Path; + + +/** + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data + * The class also configures the CSV readers used to read edges(vertices) data such as the field types, + * the delimiters (row and field), the fields that should be included or skipped, and other flags + * such as whether to skip the initial line as the header. + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class. 
+ */ + + + +public class GraphCsvReader{ + + private final Path path1,path2; + private final ExecutionEnvironment executionContext; + + private Path edgePath; + private Path vertexPath; + protected CsvReader EdgeReader; + protected CsvReader VertexReader; + protected MapFunction mapper; + +// + + public GraphCsvReader(Path path1,Path path2, ExecutionEnvironment context) + { + this.path1 = path1; + this.path2 = path2; + this.VertexReader = new CsvReader(path1,context); + this.EdgeReader = new CsvReader(path2,context); + this.mapper=null; + this.executionContext=context; + } + + public GraphCsvReader(Path path2, ExecutionEnvironment context) + { + this.path1=null; + this.path2 = path2; + this.EdgeReader = new CsvReader(path2,context); + this.VertexReader = null; + this.mapper = null; + this.executionContext=context; + } + + public GraphCsvReader(Path path2,final MapFunction mapper, ExecutionEnvironment context) + { + this.path1=null; + this.path2 = path2; + this.EdgeReader = new CsvReader(path2,context); + this.VertexReader = null; + this.mapper = mapper; + this.executionContext=context; + } + + public GraphCsvReader (String path2,ExecutionEnvironment context) + { + this(new Path(Preconditions.checkNotNull(path2, The file path may not be null.)), context); + + } + + public GraphCsvReader(String path1, String path2, ExecutionEnvironment context) + { + this(new Path(Preconditions.checkNotNull(path1, The file path may not be null.)),new Path(Preconditions.checkNotNull(path2, The file path may not be null.)), context); + } + + + public GraphCsvReader (String path2, final MapFunction mapper, ExecutionEnvironment context) + { + + this(new Path(Preconditions.checkNotNull(path2, The file path may not be null.)),mapper, context); + + + } + + public CsvReader getEdgeReader() + { +
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590566#comment-14590566 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r32673158 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java --- @@ -0,0 +1,388 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.graph; +import com.google.common.base.Preconditions; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.io.CsvReader; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.core.fs.Path; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.graph.Graph; +import org.apache.flink.types.NullValue; +import org.apache.flink.core.fs.Path; --- End diff -- Unused import.
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590572#comment-14590572 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r32673274 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java --- @@ -0,0 +1,388 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.graph; +import com.google.common.base.Preconditions; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.io.CsvReader; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.core.fs.Path; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.graph.Graph; +import org.apache.flink.types.NullValue; +import org.apache.flink.core.fs.Path; + + +/** + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data + * The class also configures the CSV readers used to read edges(vertices) data such as the field types, + * the delimiters (row and field), the fields that should be included or skipped, and other flags + * such as whether to skip the initial line as the header. + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class. + */ + + + +public class GraphCsvReader{ + + private final Path path1,path2; + private final ExecutionEnvironment executionContext; + + private Path edgePath; + private Path vertexPath; + protected CsvReader EdgeReader; + protected CsvReader VertexReader; + protected MapFunction mapper; --- End diff -- Add type arguments to `MapFunction`.
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590568#comment-14590568 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r32673221 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java --- @@ -0,0 +1,388 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.graph; +import com.google.common.base.Preconditions; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.io.CsvReader; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.core.fs.Path; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.graph.Graph; +import org.apache.flink.types.NullValue; +import org.apache.flink.core.fs.Path; + + +/** + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data + * The class also configures the CSV readers used to read edges(vertices) data such as the field types, + * the delimiters (row and field), the fields that should be included or skipped, and other flags + * such as whether to skip the initial line as the header. + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class. + */ + + + +public class GraphCsvReader{ + + private final Path path1,path2; + private final ExecutionEnvironment executionContext; + + private Path edgePath; + private Path vertexPath; --- End diff -- `edgePath` and `vertexPath` also seem to be unused.
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590584#comment-14590584 ] ASF GitHub Bot commented on FLINK-1520: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/847#discussion_r32673806 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/GraphCsvReader.java --- @@ -0,0 +1,388 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package org.apache.flink.graph; +import com.google.common.base.Preconditions; +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.io.CsvReader; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.core.fs.Path; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.graph.Graph; +import org.apache.flink.types.NullValue; +import org.apache.flink.core.fs.Path; + + +/** + * A class to build a Graph using path(s) provided to CSV file(s) with edge (vertices) data + * The class also configures the CSV readers used to read edges(vertices) data such as the field types, + * the delimiters (row and field), the fields that should be included or skipped, and other flags + * such as whether to skip the initial line as the header. + * The configuration is done using the functions provided in The {@link org.apache.flink.api.java.io.CsvReader} class. 
+ */ + + + +public class GraphCsvReader{ + + private final Path path1,path2; + private final ExecutionEnvironment executionContext; + + private Path edgePath; + private Path vertexPath; + protected CsvReader EdgeReader; + protected CsvReader VertexReader; + protected MapFunction mapper; + +// + + public GraphCsvReader(Path path1,Path path2, ExecutionEnvironment context) + { + this.path1 = path1; + this.path2 = path2; + this.VertexReader = new CsvReader(path1,context); + this.EdgeReader = new CsvReader(path2,context); + this.mapper=null; + this.executionContext=context; + } + + public GraphCsvReader(Path path2, ExecutionEnvironment context) + { + this.path1=null; + this.path2 = path2; + this.EdgeReader = new CsvReader(path2,context); + this.VertexReader = null; + this.mapper = null; + this.executionContext=context; + } + + public GraphCsvReader(Path path2,final MapFunction mapper, ExecutionEnvironment context) + { + this.path1=null; + this.path2 = path2; + this.EdgeReader = new CsvReader(path2,context); + this.VertexReader = null; + this.mapper = mapper; + this.executionContext=context; + } + + public GraphCsvReader (String path2,ExecutionEnvironment context) + { + this(new Path(Preconditions.checkNotNull(path2, "The file path may not be null.")), context); + + } + + public GraphCsvReader(String path1, String path2, ExecutionEnvironment context) + { + this(new Path(Preconditions.checkNotNull(path1, "The file path may not be null.")),new Path(Preconditions.checkNotNull(path2, "The file path may not be null.")), context); + } + + + public GraphCsvReader (String path2, final MapFunction mapper, ExecutionEnvironment context) + { + + this(new Path(Preconditions.checkNotNull(path2, "The file path may not be null.")),mapper, context); + + + } + + public CsvReader getEdgeReader() + { +
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589674#comment-14589674 ] ASF GitHub Bot commented on FLINK-1520: --- GitHub user shghatge opened a pull request: https://github.com/apache/flink/pull/846 Csv read graph [FLINK-1520]Read edges and vertices from CSV files Changes done: 1) Added a GraphCsvReader class which has two CsvReaders as members, EdgeReader and VertexReader. To allow smooth chaining of configuration calls, the configuration methods of CsvReader are reimplemented in GraphCsvReader, so that a single call applies the configuration to both member CsvReaders and returns the GraphCsvReader again. Only the methods that specify which fields are to be selected from the individual files are separate for the edge and vertex readers. Since specifying types is necessary because of type erasure, implemented a types method in the GraphCsvReader class that returns a Graph with the specified types as the types for the vertex ID, vertex value and edge value. The other way of doing this was to pass the types to the method that constructs the graph itself, but this approach was taken to keep it as similar to CsvReader as possible. 2) Added 3 methods in Graph.java, similar to the other methods for Graph creation. These methods take one mandatory path, one optional path and an optional mapper for Graph creation. The only difference is that these methods return an instance of GraphCsvReader instead of a Graph. 3) Added appropriate methods in GraphCreationITCase and GraphCreationWithMapperITCase.java. Also added a createTempFile() method to both to help with the tests.
4) Added the documentation for the new functionality to gelly_guide.md. You can merge this pull request into a Git repository by running: $ git pull https://github.com/shghatge/flink csv_readGraph Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/846.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #846 commit e8a5250b4588326b606b2f29d2f2c2f6e4554925 Author: Shivani shgha...@gmail.com Date: 2015-06-10T11:22:37Z [FLINK-2093][gelly] Added difference Method commit b0a9228540fbafe819d1883e07087c4f59f4e4bb Author: Shivani shgha...@gmail.com Date: 2015-06-10T13:04:48Z [FLINK-2093][gelly] Minor Changes in the Graph.java file commit 24c7567b6fe533cd1a7cff1d9bb812ee5d8433fc Author: Shivani shgha...@gmail.com Date: 2015-06-15T10:37:00Z Merge branch 'master' of https://github.com/apache/flink into difference_new commit e726507d41a58abbc9a6abe2ead4e9e83b09 Author: Shivani shgha...@gmail.com Date: 2015-06-15T12:13:58Z [FLINK-2093][gelly]Added difference method commit 760047dc78739b9eb750757aea442aa947c2fc34 Author: Shivani shgha...@gmail.com Date: 2015-06-15T13:01:32Z [FLINK-2093][gelly]Added difference method commit 57f1b315f7fbb87c74085c9a68108fcd3ff58440 Author: Shivani shgha...@gmail.com Date: 2015-06-15T14:09:29Z [FLINK-2093][gelly]Added difference method commit 9ca5d7485708c3dca7e41bf6d19b5bd9d492125f Author: Shivani shgha...@gmail.com Date: 2015-06-15T14:12:50Z [FLINK-2093][gelly]Added difference method commit eff14eff8c1ac1b88a76438257ad74c7a004bbd3 Author: Shivani shgha...@gmail.com Date: 2015-06-16T16:37:36Z [FLINK-1520]Read edges and vertices from CSV files commit 1ae9eadc792500311c7fc24a8647364bc60902ec Author: Shivani shgha...@gmail.com Date: 2015-06-16T16:38:20Z [FLINK-1520]Read edges and vertices from CSV files commit c5f4410acf30d3d0bd12c4f215005aa517938dd2 Author: Shivani shgha...@gmail.com Date:
2015-06-16T16:39:39Z Merge branch 'master' of https://github.com/apache/flink into csv_readGraph commit 342225af5f17fa7be906885ee3dcd1b2f6a6d176 Author: Shivani shgha...@gmail.com Date: 2015-06-17T11:39:48Z [FLINK-1520]Read edges and vertices from CSV files commit 553e676003ee1419710146ddbe0a13e17fd3d237 Author: Shivani shgha...@gmail.com Date: 2015-06-17T11:42:26Z [FLINK-1520]Read edges and vertices from CSV files Read edges and vertices from CSV files -- Key: FLINK-1520 URL: https://issues.apache.org/jira/browse/FLINK-1520 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: Shivani Ghatge Priority: Minor Labels: easyfix, newbie Add methods to create Vertex and Edge Datasets directly from CSV file inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589770#comment-14589770 ] ASF GitHub Bot commented on FLINK-1520: --- Github user shghatge closed the pull request at: https://github.com/apache/flink/pull/846 Read edges and vertices from CSV files -- Key: FLINK-1520 URL: https://issues.apache.org/jira/browse/FLINK-1520 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: Shivani Ghatge Priority: Minor Labels: easyfix, newbie Add methods to create Vertex and Edge Datasets directly from CSV file inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589722#comment-14589722 ] ASF GitHub Bot commented on FLINK-1520: --- Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/846#issuecomment-112783592 Hi @shghatge , It seems we have a bit of a mess in this PR. Nothing that cannot be fixed. Let's take it offline. Read edges and vertices from CSV files -- Key: FLINK-1520 URL: https://issues.apache.org/jira/browse/FLINK-1520 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: Shivani Ghatge Priority: Minor Labels: easyfix, newbie Add methods to create Vertex and Edge Datasets directly from CSV file inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589774#comment-14589774 ] ASF GitHub Bot commented on FLINK-1520: --- GitHub user shghatge opened a pull request: https://github.com/apache/flink/pull/847 [FLINK-1520]Read edges and vertices from CSV files [FLINK-1520]Read edges and vertices from CSV files Changes done: 1) Added a GraphCsvReader class which has two CsvReaders as members, EdgeReader and VertexReader. To allow smooth chaining of configuration calls, the configuration methods of CsvReader are reimplemented in GraphCsvReader, so that a single call applies the configuration to both member CsvReaders and returns the GraphCsvReader again. Only the methods that specify which fields are to be selected from the individual files are separate for the edge and vertex readers. Since specifying types is necessary because of type erasure, implemented a types method in the GraphCsvReader class that returns a Graph with the specified types as the types for the vertex ID, vertex value and edge value. The other way of doing this was to pass the types to the method that constructs the graph itself, but this approach was taken to keep it as similar to CsvReader as possible. 2) Added 3 methods in Graph.java, similar to the other methods for Graph creation. These methods take one mandatory path, one optional path and an optional mapper for Graph creation. The only difference is that these methods return an instance of GraphCsvReader instead of a Graph. 3) Added appropriate methods in GraphCreationITCase and GraphCreationWithMapperITCase.java. Also added a createTempFile() method to both to help with the tests. 4) Added the documentation for the new functionality to gelly_guide.md. Closed the previous pull request and made a new one with a fresh branch because the previous changes are not merged yet.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/shghatge/flink csv_clear_pull Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/847.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #847 commit b7c1079f9fe56a2586f36f8b5eca5208b33e9cf8 Author: Shivani shgha...@gmail.com Date: 2015-06-17T13:37:36Z [FLINK-1520]Read edges and vertices from CSV files Read edges and vertices from CSV files -- Key: FLINK-1520 URL: https://issues.apache.org/jira/browse/FLINK-1520 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: Shivani Ghatge Priority: Minor Labels: easyfix, newbie Add methods to create Vertex and Edge Datasets directly from CSV file inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
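The design in the pull request description above (one object that forwards shared configuration to two readers, with field selection kept separate per file) can be sketched roughly as follows. This is only an illustration in plain Java without Flink: `ReaderSettings`, `GraphCsvReaderSketch`, `includeFieldsEdges` and `includeFieldsVertices` are hypothetical names chosen for the example and are not taken from the patch.

```java
// Hypothetical stand-in for the per-file configuration of a CSV reader.
final class ReaderSettings {
    String fieldDelimiter = ",";
    boolean skipFirstLine = false;
    boolean[] includedFields = new boolean[0]; // empty mask is read as "all fields"
}

// Hypothetical sketch of a graph-level reader that configures two readers at once.
final class GraphCsvReaderSketch {
    final ReaderSettings edgeSettings = new ReaderSettings();
    final ReaderSettings vertexSettings; // null when only an edge file is given

    GraphCsvReaderSketch(boolean hasVertexFile) {
        this.vertexSettings = hasVertexFile ? new ReaderSettings() : null;
    }

    // Shared option: applied to both readers, returns this for chaining.
    GraphCsvReaderSketch fieldDelimiter(String delimiter) {
        edgeSettings.fieldDelimiter = delimiter;
        if (vertexSettings != null) {
            vertexSettings.fieldDelimiter = delimiter;
        }
        return this;
    }

    // Shared option: skip the header line in both files.
    GraphCsvReaderSketch ignoreFirstLine() {
        edgeSettings.skipFirstLine = true;
        if (vertexSettings != null) {
            vertexSettings.skipFirstLine = true;
        }
        return this;
    }

    // Field selection differs per file, so it is configured separately.
    GraphCsvReaderSketch includeFieldsEdges(boolean... mask) {
        edgeSettings.includedFields = mask.clone();
        return this;
    }

    GraphCsvReaderSketch includeFieldsVertices(boolean... mask) {
        if (vertexSettings == null) {
            throw new IllegalStateException("No vertex file configured.");
        }
        vertexSettings.includedFields = mask.clone();
        return this;
    }
}
```

A chained call such as `new GraphCsvReaderSketch(true).fieldDelimiter("\t").ignoreFirstLine().includeFieldsEdges(true, true, false)` then sets the delimiter and header flag on both settings objects while the field mask only affects the edge side.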
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584874#comment-14584874 ] Andra Lungu commented on FLINK-1520: Yup, I am not happy with the argument passing, as it may be cumbersome for the user to get what each argument means etc. I thought about this approach, my only concern is that it will introduce a ton of duplicate code. And, in the end, you write (more or less) the same commands, just that instead of getting a DataSet, which you then turn into a graph with fromDataSet, you get a graph directly... If we are okay with code duplication then I would +1 Vasia's solution. Read edges and vertices from CSV files -- Key: FLINK-1520 URL: https://issues.apache.org/jira/browse/FLINK-1520 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: Shivani Ghatge Priority: Minor Labels: easyfix, newbie Add methods to create Vertex and Edge Datasets directly from CSV file inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584886#comment-14584886 ] Vasia Kalavri commented on FLINK-1520: -- I don't think it'll be a lot of duplicate code. You can have EdgeCsvReader wrap a CsvReader and just call its methods, no? Read edges and vertices from CSV files -- Key: FLINK-1520 URL: https://issues.apache.org/jira/browse/FLINK-1520 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: Shivani Ghatge Priority: Minor Labels: easyfix, newbie Add methods to create Vertex and Edge Datasets directly from CSV file inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
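The wrapping idea suggested in the comment above can be sketched as a thin delegate. The example below is a minimal sketch in plain Java, assuming a simplified reader: `SimpleCsvReader` is a stand-in written for this illustration and is not Flink's `CsvReader`, and `EdgeCsvReaderSketch` is a hypothetical name. It only shows that the wrapper needs one forwarding line per configuration method rather than a copy of the parsing logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Stand-in for a configurable CSV reader; NOT Flink's CsvReader.
final class SimpleCsvReader {
    private final List<String> lines;
    private String fieldDelimiter = ",";
    private boolean skipFirstLine = false;

    SimpleCsvReader(List<String> lines) {
        this.lines = lines;
    }

    SimpleCsvReader fieldDelimiter(String delimiter) {
        this.fieldDelimiter = delimiter;
        return this;
    }

    SimpleCsvReader ignoreFirstLine() {
        this.skipFirstLine = true;
        return this;
    }

    // Splits every (non-skipped) line into its fields.
    List<String[]> read() {
        List<String[]> rows = new ArrayList<>();
        for (int i = skipFirstLine ? 1 : 0; i < lines.size(); i++) {
            rows.add(lines.get(i).split(Pattern.quote(fieldDelimiter)));
        }
        return rows;
    }
}

// Hypothetical wrapper: forwards configuration to the wrapped reader
// and turns the parsed rows into edges.
final class EdgeCsvReaderSketch {
    private final SimpleCsvReader reader;

    EdgeCsvReaderSketch(List<String> lines) {
        this.reader = new SimpleCsvReader(lines);
    }

    EdgeCsvReaderSketch fieldDelimiter(String delimiter) {
        reader.fieldDelimiter(delimiter); // pure delegation
        return this;
    }

    EdgeCsvReaderSketch ignoreFirstLine() {
        reader.ignoreFirstLine(); // pure delegation
        return this;
    }

    // Represents each edge as "source->target" to keep the sketch small.
    List<String> edges() {
        List<String> result = new ArrayList<>();
        for (String[] row : reader.read()) {
            result.add(row[0] + "->" + row[1]);
        }
        return result;
    }
}
```

The duplication is limited to the method signatures; the wrapper keeps the return type under its own control, which is what allows it to hand back edges (or a graph) instead of a generic data set.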
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584586#comment-14584586 ] Andra Lungu commented on FLINK-1520: Hey [~vkalavri], To my knowledge, you cannot deduce the key's or the value's class from the generic parameters K, VV, EV. The way I would implement fromCsv is by adding the classes as parameters, e.g. Graph.fromCsv(edgesPath, String.class, String.class, context). For NullValue, we would then have a single class argument: Graph.fromCsv(edgesPath, String.class, context). The user should know what kind of keys he/she has in there, so the extra parameters should not be that much of a burden. Is this what you had in mind? For the time being, I cannot see a smarter way of doing it :) The examples should be updated accordingly, since they now read the edge and vertex data sets from CSV and then use fromDataSet to produce the graph. Read edges and vertices from CSV files -- Key: FLINK-1520 URL: https://issues.apache.org/jira/browse/FLINK-1520 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: Shivani Ghatge Priority: Minor Labels: easyfix, newbie Add methods to create Vertex and Edge Datasets directly from CSV file inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
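The type-erasure point in the comment above can be illustrated without Flink. The following is a minimal sketch, assuming a hypothetical helper `TypedCsvParser` with methods `parseField` and `parseKeys` (none of these names come from Gelly or `CsvReader`). It shows why a `Class` argument has to be passed explicitly: the generic parameter `K` is not available at runtime, so the parser needs the class token to decide how to convert the raw text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Flink or Gelly.
final class TypedCsvParser {

    // Converts one raw CSV field to the requested type.
    // The Class token is required because K itself is erased at runtime.
    static <K> K parseField(String raw, Class<K> type) {
        String field = raw.trim();
        if (type == String.class) {
            return type.cast(field);
        } else if (type == Long.class) {
            return type.cast(Long.valueOf(field));
        } else if (type == Integer.class) {
            return type.cast(Integer.valueOf(field));
        } else if (type == Double.class) {
            return type.cast(Double.valueOf(field));
        }
        throw new IllegalArgumentException("Unsupported field type: " + type.getName());
    }

    // Parses the first column of every line as a key of type K.
    static <K> List<K> parseKeys(List<String> lines, String delimiter, Class<K> keyType) {
        List<K> keys = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(Pattern.quote(delimiter));
            keys.add(parseField(fields[0], keyType));
        }
        return keys;
    }
}
```

A call like `TypedCsvParser.parseKeys(lines, ",", Long.class)` mirrors the shape of the proposed `Graph.fromCsv(edgesPath, String.class, String.class, context)`: the caller states the key type once, and the compiler infers the generic parameter from the class token.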
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570586#comment-14570586 ] Vasia Kalavri commented on FLINK-1520: -- Hey [~cebe]! One more ping to you :) If you're not working on this, can I release this issue? Thanks! Read edges and vertices from CSV files -- Key: FLINK-1520 URL: https://issues.apache.org/jira/browse/FLINK-1520 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: Carsten Brandt Priority: Minor Labels: easyfix, newbie Add methods to create Vertex and Edge Datasets directly from CSV file inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14363270#comment-14363270 ] Vasia Kalavri commented on FLINK-1520: -- Hi [~cebe]! Are you working on this? If you're stuck and need some help, let us know! Also, if you're simply too busy and can't currently work on this :-) Thanks! Read edges and vertices from CSV files -- Key: FLINK-1520 URL: https://issues.apache.org/jira/browse/FLINK-1520 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: Carsten Brandt Priority: Minor Labels: easyfix, newbie Add methods to create Vertex and Edge Datasets directly from CSV file inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332277#comment-14332277 ] Carsten Brandt commented on FLINK-1520: --- [~rmetzger], thanks it is working! Read edges and vertices from CSV files -- Key: FLINK-1520 URL: https://issues.apache.org/jira/browse/FLINK-1520 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: Carsten Brandt Priority: Minor Labels: easyfix, newbie Add methods to create Vertex and Edge Datasets directly from CSV file inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319956#comment-14319956 ] Carsten Brandt commented on FLINK-1520: --- [~vkalavri] you can assign me to this, will try to work on this next week. https://github.com/project-flink/flink-graph/pull/64#issuecomment-73885671 Read edges and vertices from CSV files -- Key: FLINK-1520 URL: https://issues.apache.org/jira/browse/FLINK-1520 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Priority: Minor Labels: easyfix, newbie Add methods to create Vertex and Edge Datasets directly from CSV file inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
[ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319970#comment-14319970 ] Vasia Kalavri commented on FLINK-1520: -- Perfect! It's all yours :) Read edges and vertices from CSV files -- Key: FLINK-1520 URL: https://issues.apache.org/jira/browse/FLINK-1520 Project: Flink Issue Type: New Feature Components: Gelly Reporter: Vasia Kalavri Assignee: Carsten Brandt Priority: Minor Labels: easyfix, newbie Add methods to create Vertex and Edge Datasets directly from CSV file inputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)