[jira] [Commented] (GEODE-3300) Complete and expose parallel snapshots feature

2017-08-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123875#comment-16123875
 ] 

ASF subversion and git services commented on GEODE-3300:


Commit 7072f8ef7b764d1507e26cca8ed0c4d184ccc81a in geode's branch 
refs/heads/develop from [~nreich]
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=7072f8e ]

GEODE-3300: Complete and expose parallel export feature for use

This closes #704


> Complete and expose parallel snapshots feature
> --
>
> Key: GEODE-3300
> URL: https://issues.apache.org/jira/browse/GEODE-3300
> Project: Geode
> Issue Type: Sub-task
> Components: docs, snapshot
> Reporter: Nick Reich
> Assignee: Nick Reich
>
> The parallel snapshots feature was never fully completed and exposed in the 
> API for snapshots. This is the first step in allowing users to make use of 
> this feature.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (GEODE-3300) Complete and expose parallel snapshots feature

2017-08-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123877#comment-16123877
 ] 

ASF GitHub Bot commented on GEODE-3300:
---

Github user asfgit closed the pull request at:

https://github.com/apache/geode/pull/704


[jira] [Commented] (GEODE-3300) Complete and expose parallel snapshots feature

2017-08-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122261#comment-16122261
 ] 

ASF GitHub Bot commented on GEODE-3300:
---

Github user nreich commented on the issue:

https://github.com/apache/geode/pull/704
  
The requirement for the ".gfd" extension currently resides in 
ExportDataCommand in gfsh, the only non-internal entry point for 
creating or restoring a backup. This constraint could also be validated in 
RegionSnapshotServiceImpl before local snapshot creation starts.
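
A minimal sketch of the kind of early check described above. The class and 
method names (SnapshotFileValidator, requireGfdExtension) are hypothetical and 
are not Geode API; the case-sensitive match is also an assumption.

```java
import java.io.File;

// Hypothetical helper, not part of Geode: illustrates validating the
// ".gfd" extension before any snapshot work begins.
final class SnapshotFileValidator {
  static final String SNAPSHOT_EXTENSION = ".gfd";

  // Returns the file unchanged if its name ends in ".gfd" (case-sensitive,
  // an assumption of this sketch); otherwise throws so nothing is written.
  static File requireGfdExtension(File snapshot) {
    if (!snapshot.getName().endsWith(SNAPSHOT_EXTENSION)) {
      throw new IllegalArgumentException(
          "Snapshot file must end in " + SNAPSHOT_EXTENSION + ": " + snapshot);
    }
    return snapshot;
  }

  public static void main(String[] args) {
    System.out.println("accepted: " + requireGfdExtension(new File("region.gfd")));
    try {
      requireGfdExtension(new File("region.txt"));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Failing before the export starts avoids leaving a partially written snapshot 
under a name that a later import would refuse.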

I can add a toString() to SnapshotOptions. Some of its contents will still 
print only a default Object.toString() value, which is of limited help, but 
it should provide the desired information.


[jira] [Commented] (GEODE-3300) Complete and expose parallel snapshots feature

2017-08-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120788#comment-16120788
 ] 

ASF GitHub Bot commented on GEODE-3300:
---

GitHub user nreich opened a pull request:

https://github.com/apache/geode/pull/704

GEODE-3300: Complete and expose parallel export feature for use

This change exposes parallel export of snapshots. It provides a filename 
mapper for parallel exports that gives each snapshot file a unique name based 
on the host it was created on. It also enforces use of the .gfd extension 
for snapshot files and allows a directory of snapshot files to be imported 
together. Once this change is merged, additional work is required to update 
gfsh with the commands users need to make use of the parallel export 
feature.
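
As an illustration only, a host-based filename mapper could look like the 
sketch below. ParallelSnapshotNames and forHost are hypothetical names, and 
the "stem-host.gfd" pattern is an assumption, not the naming scheme used by 
the pull request.

```java
import java.io.File;

// Hypothetical sketch, not Geode's implementation: derive a per-host
// snapshot file name by inserting the host name before ".gfd".
final class ParallelSnapshotNames {
  static final String EXTENSION = ".gfd";

  // Maps e.g. "orders.gfd" and "host1" to "orders-host1.gfd",
  // keeping the parent directory of the base file.
  static File forHost(File base, String hostName) {
    String name = base.getName();
    if (!name.endsWith(EXTENSION)) {
      throw new IllegalArgumentException("Expected a " + EXTENSION + " file: " + base);
    }
    String stem = name.substring(0, name.length() - EXTENSION.length());
    // Replace characters that are unsafe in file names (e.g. ':' in host:port).
    String safeHost = hostName.replaceAll("[^A-Za-z0-9._-]", "_");
    return new File(base.getParentFile(), stem + "-" + safeHost + EXTENSION);
  }

  public static void main(String[] args) {
    File base = new File("snapshots", "orders.gfd");
    for (String host : new String[] {"host1", "host2:40404"}) {
      System.out.println(forHost(base, host).getName());
    }
  }
}
```

In this sketch two members running on the same host would map to the same 
file name; the sketch does not attempt to resolve that case.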

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nreich/geode feature/GEODE-3300

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/geode/pull/704.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #704


commit 1844570f51fc6e6a1bcbedf5334a629958866426
Author: Nick Reich 
Date:   2017-07-28T23:47:10Z

GEODE-3300: Complete and expose parallel export feature for use




[jira] [Commented] (GEODE-3300) Complete and expose parallel snapshots feature

2017-08-09 Thread Nick Reich (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120311#comment-16120311
 ] 

Nick Reich commented on GEODE-3300:
---

[~palvarado], parallel export is accomplished by directing each member of the 
cluster to write to disk the data for which it is currently the primary (n.b. 
parallel export only works for partitioned regions). This results in 
several snapshot files, either spread across the local disks of each member or 
together in a single directory if a network drive location was specified on 
export. The files can then be imported by having a single member import a 
whole directory of snapshot files, or by individually commanding each member 
to import the file(s) local to itself (you could use this strategy to 
effectively do a parallel import manually).
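
The export scheme above can be pictured with a toy model. This is not Geode 
code: the bucket routing (hash modulo bucket count), the member names, and the 
class ParallelExportSketch are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model, not Geode code: each member "exports" only the entries in
// buckets for which it is the primary owner.
final class ParallelExportSketch {

  // Assumed bucket routing: non-negative hash modulo the bucket count.
  static int bucketOf(Object key, int bucketCount) {
    return Math.floorMod(key.hashCode(), bucketCount);
  }

  // primaries[b] names the member that is primary for bucket b.
  // Returns member -> keys that member would write to its own snapshot file.
  static Map<String, List<String>> export(List<String> keys, String[] primaries) {
    Map<String, List<String>> filesByMember = new TreeMap<>();
    for (String key : keys) {
      String member = primaries[bucketOf(key, primaries.length)];
      filesByMember.computeIfAbsent(member, m -> new ArrayList<>()).add(key);
    }
    return filesByMember;
  }

  public static void main(String[] args) {
    String[] primaries = {"m1", "m2", "m1", "m2"};
    List<String> keys = Arrays.asList("a", "b", "c", "d", "e", "f");
    export(keys, primaries).forEach(
        (member, owned) -> System.out.println(member + " writes " + owned));
  }
}
```

Every key lands in exactly one member's file, which is why the result is 
several snapshot files rather than one.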

Parallel imports are still being investigated, and their performance will be 
tested. The problem with parallel import is that it works well only if the 
snapshot files for each member are local and the members still own the same 
partitions they did when the snapshot was taken. If a rebalance occurred, 
the number of members changed, or the data is being imported into a different 
cluster, there is a high probability that the local data being read by a member 
will not belong to it and will need to be sent to a different member, greatly 
increasing network traffic. It is not clear _a priori_ whether this would 
provide better performance, but if it does, parallel import is the next 
logical step.
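
To make the ownership concern concrete, here is a toy calculation of how many 
entries would have to be forwarded after bucket ownership changes. It is a 
sketch under assumptions (hash-modulo routing, an unchanged bucket count, 
hypothetical class and member names), not a model of Geode's actual import path.

```java
import java.util.Arrays;
import java.util.List;

// Toy model, not Geode code: counts entries a parallel import would have to
// forward over the network because bucket ownership changed since the export.
final class ParallelImportSketch {

  // Assumed bucket routing: non-negative hash modulo the bucket count.
  static int bucketOf(Object key, int bucketCount) {
    return Math.floorMod(key.hashCode(), bucketCount);
  }

  // atExport[b] and atImport[b] name the primary member of bucket b at each
  // point in time. An entry is read locally by its export-time owner; if the
  // import-time owner differs, the entry must be sent to another member.
  static long forwardedEntries(List<String> keys, String[] atExport, String[] atImport) {
    if (atExport.length != atImport.length) {
      throw new IllegalArgumentException("This sketch assumes an unchanged bucket count");
    }
    return keys.stream()
        .mapToInt(key -> bucketOf(key, atExport.length))
        .filter(bucket -> !atExport[bucket].equals(atImport[bucket]))
        .count();
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("a", "b", "c", "d", "e", "f");
    String[] atExport = {"m1", "m2", "m1", "m2"};
    String[] afterRebalance = {"m1", "m2", "m3", "m3"};
    System.out.println("unchanged: " + forwardedEntries(keys, atExport, atExport));
    System.out.println("rebalanced: " + forwardedEntries(keys, atExport, afterRebalance));
  }
}
```

With unchanged ownership nothing is forwarded; once buckets move, every entry 
in a moved bucket costs a network hop on import.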

As for bulk loading performance in general, that depends on whether you are 
trying to add data to an existing (and populated) cluster or bootstrapping a 
new one. If using persistence, backups provide a much faster mechanism for 
bootstrapping a new cluster (as long as the cluster has an equal or greater 
number of members, though performance is greatly improved when the size is the 
same). For adding data to an existing and populated region, putAll() and 
snapshot imports are the best tools available.
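
For the putAll() route, one common approach is to load in fixed-size batches 
so that the memory held per call stays bounded. The sketch below is an 
assumption-laden illustration: a plain java.util.Map stands in for the target 
region, and BatchedPutAll, load, and the batch size are hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Sketch of chunked bulk loading. A plain java.util.Map stands in for the
// target region; the batch size used in main is an arbitrary example value.
final class BatchedPutAll {

  // Copies entries from source into target using putAll() calls of at most
  // batchSize entries each, and returns the number of putAll() calls made.
  static <K, V> int load(Iterator<Map.Entry<K, V>> source, Map<K, V> target, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    int calls = 0;
    Map<K, V> batch = new HashMap<>();
    while (source.hasNext()) {
      Map.Entry<K, V> entry = source.next();
      batch.put(entry.getKey(), entry.getValue());
      if (batch.size() == batchSize) {
        target.putAll(batch);
        batch.clear();
        calls++;
      }
    }
    if (!batch.isEmpty()) {
      // Flush the final, partially filled batch.
      target.putAll(batch);
      calls++;
    }
    return calls;
  }

  public static void main(String[] args) {
    Map<Integer, String> source = new HashMap<>();
    for (int i = 0; i < 10; i++) {
      source.put(i, "value-" + i);
    }
    Map<Integer, String> target = new HashMap<>();
    int calls = load(source.entrySet().iterator(), target, 4);
    System.out.println(calls + " putAll() calls, " + target.size() + " entries loaded");
  }
}
```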

[jira] [Commented] (GEODE-3300) Complete and expose parallel snapshots feature

2017-08-01 Thread Pedro Alvarado (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109876#comment-16109876
 ] 

Pedro Alvarado commented on GEODE-3300:
---

[~nreich] Can you briefly elaborate on how parallelization will be achieved? 
Will this support parallel import? We happen to be interested in options to 
improve bulk data loading (10M keys @ 132kb values, routinely) and, as we 
understand it, there are only two options: putAll() and snapshot imports. It'd 
be great to know if this effort will improve the experience for us or if there 
are other ideas floating around on how to improve bulk loading.

Thank you for your information and help.
