Dylan Bethune-Waddell created TINKERPOP-1315:
------------------------------------------------

             Summary: HadoopConfiguration will not allow an ArrayList to be 
serialized in vertexProgram configuration unless setProperty is overridden
                 Key: TINKERPOP-1315
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1315
             Project: TinkerPop
          Issue Type: Improvement
          Components: hadoop
    Affects Versions: 3.2.1
            Reporter: Dylan Bethune-Waddell
            Priority: Minor


I have been implementing a "PrecisionBulkLoader" class that takes a 
ScriptTraversal with bindings and executes it against the target graph to 
getOrCreate vertices/edges with more precision. This follows from my 
realization that IncrementalBulkLoader currently overwrites the first edge 
with the same label between the two vertex endpoints in the target graph, 
which is a problem for self-loops and multi-edges:

https://issues.apache.org/jira/browse/TINKERPOP-1099

I finally got it to work with the script bindings being propagated to workers, 
but to do so without ending up with only the last value of the array I had to 
override the setProperty method in 
org.apache.tinkerpop.gremlin.hadoop.structure.HadoopConfiguration. Before I 
did that, whenever ConfigurationUtils.copy(conf1, conf2) was called with a 
HadoopConfiguration on either end (conf1 or conf2), any multi-valued / list 
properties were clobbered, and only the last value remained after 
storeState/loadState went through its first cycle in BulkLoaderVertexProgram.

This had been bugging me for a while: with multiple hosts configured for 
TitanGraph in the config, the HadoopConf only opened a connection against the 
last host in the list. With this change to HadoopConfiguration, the spark 
executor logs read standardtitangraph[cassandrathrift:[host1, host2, ...]] as 
you might expect, and the bindings for the ScriptTraversal survive 
storeState/loadState and are applied to the traversal.
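
To make the clobbering concrete, here is a minimal, self-contained sketch. It 
does not use the real HadoopConfiguration or commons-configuration classes: 
ClobberingConfig, ListPreservingConfig and copyInto are hypothetical stand-ins, 
and "storage.hostname" is just an illustrative key. It assumes the base class 
expands a collection into one add call per element and that the map-backed 
subclass stores each element with a plain put(), which is my understanding of 
the mechanism rather than a copy of the actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

public class ListPropertyDemo {

    // Hypothetical stand-in for a map-backed configuration whose setProperty
    // expands a collection into one addPropertyDirect call per element.
    static class ClobberingConfig {
        protected final Map<String, Object> properties = new LinkedHashMap<>();

        public void setProperty(String key, Object value) {
            properties.remove(key);
            if (value instanceof Collection) {
                for (Object element : (Collection<?>) value) {
                    addPropertyDirect(key, element);
                }
            } else {
                addPropertyDirect(key, value);
            }
        }

        // put() replaces the previous element, so only the last one survives.
        protected void addPropertyDirect(String key, Object value) {
            properties.put(key, value);
        }

        public Object getProperty(String key) {
            return properties.get(key);
        }
    }

    // Sketch of the workaround: store the whole list as a single value.
    static class ListPreservingConfig extends ClobberingConfig {
        @Override
        public void setProperty(String key, Object value) {
            if (value instanceof Collection) {
                properties.put(key, new ArrayList<>((Collection<?>) value));
            } else {
                super.setProperty(key, value);
            }
        }
    }

    // Mimics a key-by-key copy of a source configuration into the target.
    static Object copyInto(ClobberingConfig target) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("storage.hostname", Arrays.asList("host1", "host2"));
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            target.setProperty(entry.getKey(), entry.getValue());
        }
        return target.getProperty("storage.hostname");
    }

    public static void main(String[] args) {
        System.out.println(copyInto(new ClobberingConfig()));      // host2
        System.out.println(copyInto(new ListPreservingConfig()));  // [host1, host2]
    }
}
```

The override only changes how collections are stored; single values still go 
through the original path.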

I suppose I am wondering whether this change is dangerous or bad somehow? I 
know that in a few places I saw the values of the configuration being 
explicitly toString()'d...
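
One way the toString() concern could show up, sketched with plain JDK types 
only: a value stored as a list renders with brackets when flattened to a 
String, so code that expects a single plain value would see something 
different. ToStringDemo and flatten are hypothetical names for illustration, 
not anything in TinkerPop.

```java
import java.util.Arrays;
import java.util.List;

public class ToStringDemo {

    // Hypothetical consumer that flattens any configuration value to a String.
    static String flatten(Object value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("host1", "host2");
        System.out.println(flatten("host2"));  // host2
        System.out.println(flatten(hosts));    // [host1, host2]
    }
}
```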



