[ https://issues.apache.org/jira/browse/TINKERPOP-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15307762#comment-15307762 ]
Marko A. Rodriguez commented on TINKERPOP-1315:
-----------------------------------------------
This behavior comes from Apache Commons Configuration, not from
{{HadoopConfiguration}}. Collections are automatically turned into
{{Configuration}} arrays. This has been a thorn in my side many times, but we
can't just change {{HadoopConfiguration}} to override Apache Configuration's
expected behavior, because of all the other uses of Apache Configuration (like
reading from and writing to a properties file).
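
To make the mechanism concrete, here is a minimal, self-contained sketch of how a list can collapse to its last element. It is a simplified model, not the real library code: {{FlatteningConfig}}, {{LastValueWinsConfig}} and {{ListPreservingConfig}} are hypothetical names, and the sketch assumes that the base class hands each element of a collection to the store one at a time while the store keeps a single value per key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model only: none of these classes are the real library types.
final class ListClobberSketch {

    // Hypothetical stand-in for a base configuration that flattens collections.
    abstract static class FlatteningConfig {
        protected final Map<String, Object> properties = new HashMap<>();

        // Called once per scalar value by setProperty.
        protected abstract void addPropertyDirect(String key, Object value);

        public void setProperty(String key, Object value) {
            properties.remove(key);
            if (value instanceof Collection) {
                // Assumption: each element is added separately under the same key.
                for (Object element : (Collection<?>) value) {
                    addPropertyDirect(key, element);
                }
            } else {
                addPropertyDirect(key, value);
            }
        }

        public Object getProperty(String key) {
            return properties.get(key);
        }
    }

    // A single-valued map store: every add overwrites the previous element.
    static final class LastValueWinsConfig extends FlatteningConfig {
        @Override
        protected void addPropertyDirect(String key, Object value) {
            properties.put(key, value);
        }
    }

    // Overrides setProperty so that a collection is stored whole.
    static final class ListPreservingConfig extends FlatteningConfig {
        @Override
        protected void addPropertyDirect(String key, Object value) {
            properties.put(key, value);
        }

        @Override
        public void setProperty(String key, Object value) {
            if (value instanceof Collection) {
                properties.put(key, new ArrayList<Object>((Collection<?>) value));
            } else {
                properties.put(key, value);
            }
        }
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("host1", "host2", "host3");

        FlatteningConfig clobbered = new LastValueWinsConfig();
        clobbered.setProperty("hosts", hosts);
        System.out.println(clobbered.getProperty("hosts")); // prints host3

        FlatteningConfig preserved = new ListPreservingConfig();
        preserved.setProperty("hosts", hosts);
        System.out.println(preserved.getProperty("hosts")); // prints [host1, host2, host3]
    }
}
```

In this model the override keeps the list intact, but it also means the store no longer behaves like the base class for collection values, which is the compatibility concern raised above.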
> HadoopConfiguration will not allow an ArrayList to be serialized in
> vertexProgram configuration unless setProperty is overridden
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: TINKERPOP-1315
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1315
> Project: TinkerPop
> Issue Type: Improvement
> Components: hadoop
> Affects Versions: 3.2.1
> Reporter: Dylan Bethune-Waddell
> Priority: Minor
>
> I have been implementing a "PrecisionBulkLoader" class that takes a
> ScriptTraversal with bindings and executes it against the target graph to
> getOrCreate vertices/edges with more precision. This follows from my
> realization that IncrementalBulkLoader currently overwrites the first
> edge with the same label between the two vertex endpoints in the target
> graph, which is an issue for self-loops and multi-edges:
> https://issues.apache.org/jira/browse/TINKERPOP-1099
> I finally got it to work with the script bindings being propagated to
> workers. To do so without keeping only the last value of the array, I had
> to override the setProperty method in
> org.apache.tinkerpop.gremlin.hadoop.structure.HadoopConfiguration. Before I
> did that, whenever ConfigurationUtils.copy(conf1, conf2) was called with a
> HadoopConfiguration on either end (conf1 or conf2), any multi-valued / list
> properties got clobbered, and only the last value was left after
> storeState/loadState went through the first cycle in BulkLoaderVertexProgram.
> This had been bugging me for a while: with multiple hosts configured for
> TitanGraph in the config, the HadoopConf only opened a connection against
> the last host in the list. With this change to HadoopConfiguration, the
> Spark executor logs instead read
> standardtitangraph[cassandrathrift:[host1, host2, ...]], as you might expect,
> and the bindings for the ScriptTraversal survive storeState/loadState and
> are applied to the traversal.
> I was wondering whether this is dangerous or otherwise a bad idea. I did
> see the values of the configuration being explicitly toString()'d in a few
> places...