[ https://issues.apache.org/jira/browse/TINKERPOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341029#comment-16341029 ]
ping commented on TINKERPOP-1876:
---------------------------------
Thank you. I can now use the bulkload successfully, but it looks like the program uses a lot of memory. I have more than a billion IP/hostname records in HDFS, and when I run the Spark bulkload the operating system kills my process. Is there a method, like the HBase bulkload, to load a large amount of data into JanusGraph?

Also, my team lead wants to understand the details of how the data is stored in HBase: what is the meaning of each column family, and where in the HBase row is the composite index? The documentation on the web should be more detailed. Thank you again, and please pardon my poor English.

---Original---
From: "stephen mallette (JIRA)"<j...@apache.org>
Date: 2018/1/26 19:23:15
To: "970356792"<970356...@qq.com>;
Subject: [jira] [Commented] (TINKERPOP-1876) I cannot use the bulkload with spark

[ https://issues.apache.org/jira/browse/TINKERPOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340932#comment-16340932 ]

stephen mallette commented on TINKERPOP-1876:
---------------------------------------------
It says here that you don't need a Google account to use Google Groups: https://support.google.com/groups/answer/1067205?hl=en You would just have to use the web forum to browse and post messages.
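[Editorial note on the out-of-memory kills described above — a hedged sketch, not from this thread: these are standard Spark and spark-gremlin property names, but the values are illustrative and whether they help depends on the cluster.]

```properties
# In the hadoop-graph properties file (values illustrative, not from the thread):
spark.executor.memory=8g
# Spill intermediate and graph RDDs to disk instead of holding them in memory:
gremlin.spark.persistStorageLevel=DISK_ONLY
gremlin.spark.graphStorageLevel=DISK_ONLY
```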
-- This message was sent by Atlassian JIRA (v7.6.3#76005)

> I cannot use the bulkload with spark
> ------------------------------------
>
>              Key: TINKERPOP-1876
>              URL: https://issues.apache.org/jira/browse/TINKERPOP-1876
>          Project: TinkerPop
>       Issue Type: Test
>       Components: plugin
> Affects Versions: 3.2.6
>         Reporter: ping
>         Priority: Major
>           Labels: bulkload
>
> Here is the data:
>
> {"timestamp":"1510335280","name":"sendv54sxu8f12g.ihance.net","type":"a","value":"52.52.81.55"}##
> {"timestamp":"1510338448","name":"*.2925.com.dycdn.com","type":"a","value":"121.201.116.57"}##
> {"timestamp":"1510308398","name":"*.2bask.com","type":"a","value":"176.31.246.156"}##
> {"timestamp":"1510350705","name":"*.5thlegdata.com","type":"a","value":"199.34.228.100"}##
> {"timestamp":"1510350937","name":"*.819.cn","type":"a","value":"118.190.84.164"}##
> {"timestamp":"1510301149","name":"*.acart.iii.com","type":"a","value":"66.171.203.156"}##
> {"timestamp":"1510337980","name":"*.aineistot.lamk.fi","type":"a","value":"193.166.79.79"}##
> {"timestamp":"1510344687","name":"*.amagervvs.dk","type":"a","value":"185.17.52.58"}##
> {"timestamp":"1510350321","name":"*.app-devel.services.actx.com","type":"a","value":"34.209.35.25"}##
> {"timestamp":"1510335280","name":"sendv54sxu8f12g.ihance.net","type":"a","value":"52.52.81.55"}
> {"timestamp":"1510338448","name":"*.2925.com.dycdn.com","type":"a","value":"121.201.116.57"}
> {"timestamp":"1510308398","name":"*.2bask.com","type":"a","value":"176.31.246.156"}
> {"timestamp":"1510350705","name":"*.5thlegdata.com","type":"a","value":"199.34.228.100"}
> {"timestamp":"1510350937","name":"*.819.cn","type":"a","value":"118.190.84.164"}
> {"timestamp":"1510301149","name":"*.acart.iii.com","type":"a","value":"66.171.203.156"}
> {"timestamp":"1510337980","name":"*.aineistot.lamk.fi","type":"a","value":"193.166.79.79"}
> {"timestamp":"1510344687","name":"*.amagervvs.dk","type":"a","value":"185.17.52.58"}
> {"timestamp":"1510350321","name":"*.app-devel.services.actx.com","type":"a","value":"34.209.35.25"}
>
> The bulk loader could load the vertices successfully, but when I create the edges something goes wrong. The groovy script is:
>
> def parse(line, factory) {
>     if (line.toString().contains("##")) {
>         println "model1"
>         String noquotes = line.replace("\"", "").replace("{", "").replace("}", "")
>         def (timestamp, host, type, ip) = noquotes.split(",")
>         String hostname = host.split(":")[1]
>         String ipdetail = ip.split(":")[1]
>         String time = timestamp.split(":")[1]
>         // def label = parts[1] != "" ? "person" : "address"
>         def v1 = factory.vertex(hostname, "host")
>         def v2 = factory.vertex(ipdetail)
>         def edge = factory.edge(v1, v2, "pointto")
>         edge.property("timestamp", time)
>         return v1
>     } else {
>         println "model2"
>         String noquotes = line.replace("\"", "").replace("{", "").replace("}", "")
>         def (timestamp, host, type, ip) = noquotes.split(",")
>         String hostname = host.split(":")[1]
>         String ipdetail = ip.split(":")[1]
>         String time = timestamp.split(":")[1]
>         // def label = parts[1] != "" ? "person" : "address"
>         def v1 = factory.vertex(ipdetail, "ip")
>         def v2 = factory.vertex(hostname)
>         def edge = factory.edge(v2, v1, "pointto")
>         edge.property("timestamp", time)
>         return v1
>     }
> }
>
> The error stack is:
>
> Opened Graph instance: standardjanusgraph[hbase:[10.9.128.12]]
> java.util.NoSuchElementException
>     at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:204)
>     at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoader.getVertexById(BulkLoader.java:118)
>     at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.lambda$executeInternal$4(BulkLoaderVertexProgram.java:251)
>     at java.util.Iterator.forEachRemaining(Iterator.java:116)
>     at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.executeInternal(BulkLoaderVertexProgram.java:249)
>     at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.execute(BulkLoaderVertexProgram.java:197)
>     at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor.lambda$null$5(SparkExecutor.java:118)
>     at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$3.next(IteratorUtils.java:247)
>     at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
>     at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>     at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:189)
>     at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> This is the gremlin script:
>
> graph = GraphFactory.open("/user/janusgraph-0.2.0-hadoop2/conf/hadoop-graph/hadoop-script.properties")
> graph.configuration().setProperty("gremlin.hadoop.scriptInputFormat.script", "host_ip2.groovy")
> graph.configuration().setInputLocation("host_ip.json")
> blvp = BulkLoaderVertexProgram.build().writeGraph("/tmp/10.properties").create(graph)
> graph.compute(SparkGraphComputer).workers(1).configure("fs.defaultFS", "hdfs://am4:8020").program(blvp).submit().get()
>
> Can anyone help me?
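[Editorial note, hedged — two observations on the failure above, not a verified fix. First, the "model1" branch of the parse script never strips the trailing "##", so IP vertices created from those records get ids like "52.52.81.55##" while the same address in a "model2" record yields "52.52.81.55"; mismatched vertex ids are a plausible trigger for the NoSuchElementException in BulkLoader.getVertexById. Second, BulkLoaderVertexProgram's builder exposes an intermediateBatchSize option that commits in batches rather than in one large transaction, which can reduce memory pressure. A sketch combining both; the batch size is illustrative:]

```groovy
// Hedged sketch of the parse script: strip "##" so both record formats
// yield the same ip value, then branch only on vertex labels / edge direction.
def parse(line, factory) {
    boolean model1 = line.toString().contains("##")
    String noquotes = line.replace("\"", "").replace("{", "").replace("}", "").replace("##", "")
    def (timestamp, host, type, ip) = noquotes.split(",")
    String hostname = host.split(":")[1]
    String ipdetail = ip.split(":")[1]
    String time = timestamp.split(":")[1]
    def v1 = model1 ? factory.vertex(hostname, "host") : factory.vertex(ipdetail, "ip")
    def v2 = model1 ? factory.vertex(ipdetail) : factory.vertex(hostname)
    def edge = model1 ? factory.edge(v1, v2, "pointto") : factory.edge(v2, v1, "pointto")
    edge.property("timestamp", time)
    return v1
}

// And, when running the program, commit in batches (size illustrative):
blvp = BulkLoaderVertexProgram.build().
        writeGraph("/tmp/10.properties").
        intermediateBatchSize(10000).
        create(graph)
```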