[ https://issues.apache.org/jira/browse/TINKERPOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341029#comment-16341029 ]
ping commented on TINKERPOP-1876:
---------------------------------
Thank you. I can now use the bulkload successfully, but it looks like the program uses a lot of memory. I have more than a billion IP/hostname records in HDFS, and when I run the Spark bulkload the operating system kills my process. Is there a method, like the HBase bulkload, to load a large amount of data into JanusGraph?

Also, my team lead wants to understand the details of how the data is stored in HBase: what is the meaning of each column family, and where in the HBase row is the composite index? The documentation on the web should be more detailed. Thank you again, and please pardon my poor English.

---Original---
From: "stephen mallette (JIRA)"<j...@apache.org>
Date: 2018/1/26 19:23:15
To: "970356792"<970356...@qq.com>;
Subject: [jira] [Commented] (TINKERPOP-1876) I cannot use the bulkload with spark

[ https://issues.apache.org/jira/browse/TINKERPOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340932#comment-16340932 ]

stephen mallette commented on TINKERPOP-1876:
---------------------------------------------
It says here that you don't need a Google account to use Google Groups: https://support.google.com/groups/answer/1067205?hl=en You would just have to use the web forum to browse and post messages.
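[Editorial note on the out-of-memory kills described above — a hedged sketch, not from this thread: these are standard Spark and spark-gremlin property names, but the values are illustrative and whether they help depends on the cluster.]

```properties
# In the hadoop-graph properties file (values illustrative, not from the thread):
spark.executor.memory=8g
# Spill intermediate and graph RDDs to disk instead of holding them in memory:
gremlin.spark.persistStorageLevel=DISK_ONLY
gremlin.spark.graphStorageLevel=DISK_ONLY
```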
-- This message was sent by Atlassian JIRA (v7.6.3#76005)

> I cannot use the bulkload with spark
> ------------------------------------
>
>              Key: TINKERPOP-1876
>              URL: https://issues.apache.org/jira/browse/TINKERPOP-1876
>          Project: TinkerPop
>       Issue Type: Test
>       Components: plugin
> Affects Versions: 3.2.6
>         Reporter: ping
>         Priority: Major
>           Labels: bulkload
>
> Here is the data:
>
> {"timestamp":"1510335280","name":"sendv54sxu8f12g.ihance.net","type":"a","value":"52.52.81.55"}##
> {"timestamp":"1510338448","name":"*.2925.com.dycdn.com","type":"a","value":"121.201.116.57"}##
> {"timestamp":"1510308398","name":"*.2bask.com","type":"a","value":"176.31.246.156"}##
> {"timestamp":"1510350705","name":"*.5thlegdata.com","type":"a","value":"199.34.228.100"}##
> {"timestamp":"1510350937","name":"*.819.cn","type":"a","value":"118.190.84.164"}##
> {"timestamp":"1510301149","name":"*.acart.iii.com","type":"a","value":"66.171.203.156"}##
> {"timestamp":"1510337980","name":"*.aineistot.lamk.fi","type":"a","value":"193.166.79.79"}##
> {"timestamp":"1510344687","name":"*.amagervvs.dk","type":"a","value":"185.17.52.58"}##
> {"timestamp":"1510350321","name":"*.app-devel.services.actx.com","type":"a","value":"34.209.35.25"}##
> {"timestamp":"1510335280","name":"sendv54sxu8f12g.ihance.net","type":"a","value":"52.52.81.55"}
> {"timestamp":"1510338448","name":"*.2925.com.dycdn.com","type":"a","value":"121.201.116.57"}
> {"timestamp":"1510308398","name":"*.2bask.com","type":"a","value":"176.31.246.156"}
> {"timestamp":"1510350705","name":"*.5thlegdata.com","type":"a","value":"199.34.228.100"}
> {"timestamp":"1510350937","name":"*.819.cn","type":"a","value":"118.190.84.164"}
> {"timestamp":"1510301149","name":"*.acart.iii.com","type":"a","value":"66.171.203.156"}
> {"timestamp":"1510337980","name":"*.aineistot.lamk.fi","type":"a","value":"193.166.79.79"}
> {"timestamp":"1510344687","name":"*.amagervvs.dk","type":"a","value":"185.17.52.58"}
> {"timestamp":"1510350321","name":"*.app-devel.services.actx.com","type":"a","value":"34.209.35.25"}
>
> The bulk loader could load the vertices successfully, but when I create the edges something goes wrong. The groovy script is:
>
> def parse(line, factory) {
>     if (line.toString().contains("##")) {
>         println "model1"
>         String noquotes = line.replace("\"", "").replace("{", "").replace("}", "")
>         def (timestamp, host, type, ip) = noquotes.split(",")
>         String hostname = host.split(":")[1]
>         String ipdetail = ip.split(":")[1]
>         String time = timestamp.split(":")[1]
>         // def label = parts[1] != "" ? "person" : "address"
>         def v1 = factory.vertex(hostname, "host")
>         def v2 = factory.vertex(ipdetail)
>         def edge = factory.edge(v1, v2, "pointto")
>         edge.property("timestamp", time)
>         return v1
>     } else {
>         println "model2"
>         String noquotes = line.replace("\"", "").replace("{", "").replace("}", "")
>         def (timestamp, host, type, ip) = noquotes.split(",")
>         String hostname = host.split(":")[1]
>         String ipdetail = ip.split(":")[1]
>         String time = timestamp.split(":")[1]
>         // def label = parts[1] != "" ? "person" : "address"
>         def v1 = factory.vertex(ipdetail, "ip")
>         def v2 = factory.vertex(hostname)
>         def edge = factory.edge(v2, v1, "pointto")
>         edge.property("timestamp", time)
>         return v1
>     }
> }
>
> The error stack is:
>
> Opened Graph instance: standardjanusgraph[hbase:[10.9.128.12]]
> java.util.NoSuchElementException
>     at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:204)
>     at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoader.getVertexById(BulkLoader.java:118)
>     at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.lambda$executeInternal$4(BulkLoaderVertexProgram.java:251)
>     at java.util.Iterator.forEachRemaining(Iterator.java:116)
>     at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.executeInternal(BulkLoaderVertexProgram.java:249)
>     at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.execute(BulkLoaderVertexProgram.java:197)
>     at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor.lambda$null$5(SparkExecutor.java:118)
>     at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$3.next(IteratorUtils.java:247)
>     at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
>     at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>     at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:189)
>     at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> This is the gremlin script:
>
> graph = GraphFactory.open("/user/janusgraph-0.2.0-hadoop2/conf/hadoop-graph/hadoop-script.properties")
> graph.configuration().setProperty("gremlin.hadoop.scriptInputFormat.script", "host_ip2.groovy")
> graph.configuration().setInputLocation("host_ip.json")
> blvp = BulkLoaderVertexProgram.build().writeGraph("/tmp/10.properties").create(graph)
> graph.compute(SparkGraphComputer).workers(1).configure("fs.defaultFS", "hdfs://am4:8020").program(blvp).submit().get()
>
> Can anyone help me?
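[Editorial note, hedged — two observations on the failure above, not a verified fix. First, the "model1" branch of the parse script never strips the trailing "##", so IP vertices created from those records get ids like "52.52.81.55##" while the same address in a "model2" record yields "52.52.81.55"; mismatched vertex ids are a plausible trigger for the NoSuchElementException in BulkLoader.getVertexById. Second, BulkLoaderVertexProgram's builder exposes an intermediateBatchSize option that commits in batches rather than in one large transaction, which can reduce memory pressure. A sketch combining both; the batch size is illustrative:]

```groovy
// Hedged sketch of the parse script: strip "##" so both record formats
// yield the same ip value, then branch only on vertex labels / edge direction.
def parse(line, factory) {
    boolean model1 = line.toString().contains("##")
    String noquotes = line.replace("\"", "").replace("{", "").replace("}", "").replace("##", "")
    def (timestamp, host, type, ip) = noquotes.split(",")
    String hostname = host.split(":")[1]
    String ipdetail = ip.split(":")[1]
    String time = timestamp.split(":")[1]
    def v1 = model1 ? factory.vertex(hostname, "host") : factory.vertex(ipdetail, "ip")
    def v2 = model1 ? factory.vertex(ipdetail) : factory.vertex(hostname)
    def edge = model1 ? factory.edge(v1, v2, "pointto") : factory.edge(v2, v1, "pointto")
    edge.property("timestamp", time)
    return v1
}

// And, when running the program, commit in batches (size illustrative):
blvp = BulkLoaderVertexProgram.build().
        writeGraph("/tmp/10.properties").
        intermediateBatchSize(10000).
        create(graph)
```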