[ https://issues.apache.org/jira/browse/S2GRAPH-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421918#comment-16421918 ]
Chul Kang commented on S2GRAPH-197:
-----------------------------------

If we need to write a large amount of data, it is much more useful to create an HFile and load it into the cluster. I think it would be better to merge GraphFileGenerator into the writeBatch method of S2GraphSink.

I also thought this could be done simply by calling mutateGraphElement on the S2Graph object in batch mode. However, mutateGraphElement uses the HBase client API internally when the storage is HBase, so with HBase storage it could cause performance issues and put a burden on the cluster.

The advantage of using mutateGraphElement in batch mode is that it works directly with any storage that implements mutate in S2Graph. With proper throttling, it may be useful for reading from a simple file and sinking it into S2Graph.

How about offering both ways as options?

> Provide S2graphSink for non-streaming dataset
> ---------------------------------------------
>
>                 Key: S2GRAPH-197
>                 URL: https://issues.apache.org/jira/browse/S2GRAPH-197
>             Project: S2Graph
>          Issue Type: Sub-task
>          Components: s2jobs
>            Reporter: Chul Kang
>            Assignee: Chul Kang
>            Priority: Major
>
> Currently, S2graphSink supports the sink operation for Spark Structured Streaming
> only, that is, only for StreamingQuery.
> If we provide the same operation for the DataframeWriter in S2graphSink, we
> could use it in batch mode.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)