[jira] [Commented] (S2GRAPH-197) Provide S2graphSink for non-streaming dataset

Chul Kang (JIRA) Sun, 01 Apr 2018 20:59:09 -0700

    [ 
https://issues.apache.org/jira/browse/S2GRAPH-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421918#comment-16421918
 ]


Chul Kang commented on S2GRAPH-197:
-----------------------------------

If we need to write a large amount of data, it is much more efficient to create 
an HFile and bulk-load it into the cluster.
I think it would be better to merge GraphFileGenerator into the writeBatch 
method of S2GraphSink.

 

I originally thought batch mode could also be supported simply by calling 
mutateGraphElement on the S2Graph object.
However, mutateGraphElement uses the HBase client API internally when the 
storage backend is HBase.
So with HBase storage it could cause performance issues and put a burden on 
the cluster.


The advantage of using mutateGraphElement in batch mode is that it works 
directly with any storage backend that implements mutate in S2Graph.
With proper throttling, it may be useful for reading from a simple file and 
sinking the data into S2Graph.
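
To make "proper throttling" concrete, here is a minimal, language-neutral 
sketch in Python. It is not S2Graph code: write_throttled, batched, and the 
mutate callback are hypothetical names, and mutate only stands in for whatever 
call ends up wrapping mutateGraphElement. The idea is to write in small 
batches and sleep whenever the writer gets ahead of a configured rate cap.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def write_throttled(
    elements: Iterable[T],
    mutate: Callable[[List[T]], None],  # hypothetical stand-in for the storage write
    batch_size: int = 100,
    max_per_second: float = 1000.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Write elements in batches without exceeding max_per_second on average."""
    written = 0
    start = clock()
    for batch in batched(elements, batch_size):
        mutate(batch)
        written += len(batch)
        # Earliest moment at which `written` elements are allowed under the cap.
        earliest = start + written / max_per_second
        delay = earliest - clock()
        if delay > 0:
            sleep(delay)
    return written
```

The clock and sleep parameters are injectable only so the pacing can be 
checked without real waiting; in a Spark job the same loop would presumably 
run once per partition.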

 

How about giving both ways as options?
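
As a rough illustration of what "both ways as options" could look like, the 
Python sketch below picks a write path from a sink option. The option key 
"writeMethod", its values, and the function names are all assumptions made 
for this example, not existing S2graphSink settings.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, Mapping


class WriteMethod(Enum):
    MUTATE = "mutate"  # go through mutateGraphElement; works with any storage
    BULK = "bulk"      # generate HFiles and bulk-load them; HBase only


def write_batch(
    elements: Iterable[str],
    options: Mapping[str, str],
    writers: Dict[WriteMethod, Callable[[Iterable[str]], int]],
) -> int:
    """Dispatch to the writer selected by the hypothetical 'writeMethod' option."""
    # Default to the storage-agnostic path; unknown values raise ValueError.
    method = WriteMethod(options.get("writeMethod", "mutate"))
    return writers[method](elements)
```

Defaulting to the mutate path keeps the sink usable with every storage 
backend, while the bulk path stays an explicit opt-in for large HBase loads.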

> Provide S2graphSink for non-streaming dataset
> ---------------------------------------------
>
>                 Key: S2GRAPH-197
>                 URL: https://issues.apache.org/jira/browse/S2GRAPH-197
>             Project: S2Graph
>          Issue Type: Sub-task
>          Components: s2jobs
>            Reporter: Chul Kang
>            Assignee: Chul Kang
>            Priority: Major
>
> Currently, S2graphSink supports the sink operation only for Spark Structured 
> Streaming, that is, only for StreamingQuery.
> If we provide the same operation for DataFrameWriter in S2graphSink, we 
> could also use it in batch mode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
