[jira] [Commented] (GEODE-3300) Complete and expose parallel snapshots feature
[ https://issues.apache.org/jira/browse/GEODE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123875#comment-16123875 ] ASF subversion and git services commented on GEODE-3300: Commit 7072f8ef7b764d1507e26cca8ed0c4d184ccc81a in geode's branch refs/heads/develop from [~nreich] [ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=7072f8e ] GEODE-3300: Complete and expose parallel export feature for use This closes #704 > Complete and expose parallel snapshots feature > -- > > Key: GEODE-3300 > URL: https://issues.apache.org/jira/browse/GEODE-3300 > Project: Geode > Issue Type: Sub-task > Components: docs, snapshot >Reporter: Nick Reich >Assignee: Nick Reich > > The parallel snapshots feature was never fully completed and exposed in the > API for snapshots. This is the first step in allowing users to make use of > this feature -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GEODE-3300) Complete and expose parallel snapshots feature
[ https://issues.apache.org/jira/browse/GEODE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123877#comment-16123877 ] ASF GitHub Bot commented on GEODE-3300: --- Github user asfgit closed the pull request at: https://github.com/apache/geode/pull/704
[jira] [Commented] (GEODE-3300) Complete and expose parallel snapshots feature
[ https://issues.apache.org/jira/browse/GEODE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122261#comment-16122261 ] ASF GitHub Bot commented on GEODE-3300: --- Github user nreich commented on the issue: https://github.com/apache/geode/pull/704 The requirement for the ".gfd" extension currently resides in the ExportDataCommand from gfsh, the only non-internal entry point to create/restore a backup. This constraint could also be validated in RegionSnapshotServiceImpl before starting the local snapshot creation. I can add a toString() to SnapshotOptions; some of its contents will still return only a somewhat helpful Object.toString() value, but it should provide the desired information.
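The extension check discussed in the comment above could look roughly like the following. This is a minimal sketch, not the code from the pull request: the class name SnapshotFileCheck and the method requireGfdExtension are hypothetical, and the assumption that the match is case-sensitive is mine rather than something the thread states.

```java
import java.io.File;

/** Hypothetical helper for illustration only; it is not part of the Geode API. */
final class SnapshotFileCheck {
  // The extension named in the thread.
  static final String REQUIRED_EXTENSION = ".gfd";

  private SnapshotFileCheck() {}

  /**
   * Returns the file unchanged when its name ends in ".gfd" and has a
   * non-empty base name; otherwise throws IllegalArgumentException.
   * Assumption: the comparison is case-sensitive.
   */
  static File requireGfdExtension(File snapshot) {
    String name = snapshot.getName();
    boolean hasExtension = name.endsWith(REQUIRED_EXTENSION);
    boolean hasBaseName = name.length() > REQUIRED_EXTENSION.length();
    if (!hasExtension || !hasBaseName) {
      throw new IllegalArgumentException(
          "Snapshot file must use the " + REQUIRED_EXTENSION + " extension: " + name);
    }
    return snapshot;
  }

  public static void main(String[] args) {
    // Accepted: the name carries the required extension.
    System.out.println(requireGfdExtension(new File("orders.gfd")).getName());
  }
}
```

Running the check before any file is written means a bad name fails fast instead of after part of the snapshot has been created.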
[jira] [Commented] (GEODE-3300) Complete and expose parallel snapshots feature
[ https://issues.apache.org/jira/browse/GEODE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120788#comment-16120788 ] ASF GitHub Bot commented on GEODE-3300: --- GitHub user nreich opened a pull request: https://github.com/apache/geode/pull/704 GEODE-3300: Complete and expose parallel export feature for use
This change exposes parallel export of snapshots. It provides a filename mapper for parallel exports that gives each snapshot file a unique name based on the host it was created on. It also enforces the use of the .gfd extension for snapshot files and allows a directory of snapshot files to be imported together. Once this change is merged, additional work is required to update gfsh to support the commands necessary to allow users to make use of the parallel export feature.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/nreich/geode feature/GEODE-3300
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/geode/pull/704.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #704
commit 1844570f51fc6e6a1bcbedf5334a629958866426 Author: Nick Reich Date: 2017-07-28T23:47:10Z GEODE-3300: Complete and expose parallel export feature for use
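To illustrate the filename mapper described in the pull request above, here is a small sketch of how a per-host name might be derived from the requested snapshot path. The class HostSnapshotNameMapper, the method mapForHost, and the "base-host.gfd" naming scheme are all assumptions made for illustration; the pull request does not spell out the actual scheme.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical mapper for illustration only; it is not the mapper from the pull request. */
final class HostSnapshotNameMapper {
  private static final String EXTENSION = ".gfd";

  private HostSnapshotNameMapper() {}

  /**
   * Inserts the host name before the ".gfd" extension so that members on
   * different hosts writing to the same directory do not overwrite each other.
   * Assumption: the result has the form "<base>-<host>.gfd".
   */
  static Path mapForHost(Path requested, String host) {
    Path fileNamePath = requested.getFileName();
    if (fileNamePath == null || !fileNamePath.toString().endsWith(EXTENSION)) {
      throw new IllegalArgumentException("Expected a " + EXTENSION + " file: " + requested);
    }
    String fileName = fileNamePath.toString();
    String base = fileName.substring(0, fileName.length() - EXTENSION.length());
    // Replace characters that are awkward in file names with underscores.
    String safeHost = host.replaceAll("[^A-Za-z0-9._-]", "_");
    return requested.resolveSibling(base + "-" + safeHost + EXTENSION);
  }

  public static void main(String[] args) {
    Path requested = Paths.get("snapshots", "orders.gfd");
    System.out.println(mapForHost(requested, "host1"));
    System.out.println(mapForHost(requested, "host2"));
  }
}
```

Keying the name on the host alone would still collide if two members ran on the same host, so a real mapper may need an additional discriminator; the thread does not say how that case is handled.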
[jira] [Commented] (GEODE-3300) Complete and expose parallel snapshots feature
[ https://issues.apache.org/jira/browse/GEODE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120311#comment-16120311 ] Nick Reich commented on GEODE-3300: --- [~palvarado], parallel export is accomplished by directing each member of the cluster to write to disk the data for which it is currently the primary (n.b. parallel export only works for partitioned regions). This results in several snapshot files (either spread across the local disks of each member or together in a single directory, if a network drive location was specified on export). The files can then be imported by having a single member import a whole directory of snapshot files, or by individually commanding each member to import the file(s) local to itself (you could use this strategy to effectively do a parallel import manually).
Parallel imports are still being investigated and performance will be tested. The problem with parallel imports is that they will only work well if the snapshot files for each member are local and the members still own the same partitions they did when the export was taken. If a rebalance occurred, the number of members changed, or the data is being imported into a different cluster, there is a high probability that the local data being read by a member will not belong to it and will need to be sent to a different member, greatly increasing network traffic. It is not clear _a priori_ whether this would provide better performance, but if it does, parallel imports are the next logical step.
As for bulk loading performance in general, that depends on whether you are trying to add data to an existing (and populated) cluster or bootstrapping a new one. If using persistence, backups provide a much faster mechanism for bootstrapping a new cluster (as long as the cluster has an equal or greater number of members, though performance is greatly improved when the size is the same).
For adding data to an existing and populated region, putAll() and snapshot imports are the best tools available.
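The locality concern raised above can be made concrete with a rough back-of-the-envelope model. The sketch below is not Geode code: the class ImportLocalityEstimate and its method are hypothetical, and it assumes every partition holds about the same amount of data, so the fraction of partitions whose primary moved approximates the fraction of imported data that would have to cross the network.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model for illustration only; it does not use or mirror Geode internals. */
final class ImportLocalityEstimate {
  private ImportLocalityEstimate() {}

  /**
   * Returns the fraction of partitions whose primary member at import time
   * differs from the primary at export time. A partition missing from the
   * import-time map counts as moved.
   */
  static double fractionForwarded(
      Map<Integer, String> primaryAtExport, Map<Integer, String> primaryAtImport) {
    if (primaryAtExport.isEmpty()) {
      return 0.0;
    }
    int moved = 0;
    for (Map.Entry<Integer, String> entry : primaryAtExport.entrySet()) {
      String currentPrimary = primaryAtImport.get(entry.getKey());
      if (!entry.getValue().equals(currentPrimary)) {
        moved++;
      }
    }
    return (double) moved / primaryAtExport.size();
  }

  public static void main(String[] args) {
    Map<Integer, String> atExport = new HashMap<>();
    atExport.put(0, "memberA");
    atExport.put(1, "memberA");
    atExport.put(2, "memberB");
    atExport.put(3, "memberB");

    // After a rebalance that added memberC, two of four partitions moved.
    Map<Integer, String> atImport = new HashMap<>();
    atImport.put(0, "memberA");
    atImport.put(1, "memberB");
    atImport.put(2, "memberB");
    atImport.put(3, "memberC");

    System.out.println(fractionForwarded(atExport, atImport));
  }
}
```

When the fraction is near zero, each member mostly reads data it owns; as it approaches one, a manual parallel import degenerates into every member forwarding nearly everything it reads.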
[jira] [Commented] (GEODE-3300) Complete and expose parallel snapshots feature
[ https://issues.apache.org/jira/browse/GEODE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109876#comment-16109876 ] Pedro Alvarado commented on GEODE-3300: --- [~nreich] Can you briefly elaborate on how parallelization will be achieved? Will this support parallel import? We happen to be interested in options to improve bulk data loading (10M keys @ 132kb values, routinely) and, as we understand it, there are only two options: putAll() and snapshot imports. It'd be great to know if this effort will improve the experience for us or if there are other ideas floating around on how to improve bulk loading. Thank you for your information and help.