Re: Adding parallel import/export of snapshots to gfsh
> One other idea that hasn't been mentioned is making parallel the only way

My vote is to support both options. We could make parallel the default, but having an option to take the snapshot on one node may be useful for use cases where:

- It is easier to manage the snapshot at one file location in a large cluster environment.
- Control over file-level security is needed.
- A node can be used for the snapshot without impacting other nodes (disk I/O ops).

-Anil.

On Tue, Aug 22, 2017 at 3:42 PM, Nick Reich wrote:
> With minimal code change, it is possible to enable the use of --dir for both standard and parallel export/import, allowing --file to function only for standard exports (and optionally, make it deprecated in favor of the --dir option).
>
> While I am not inherently opposed to forcing all Partitioned Region snapshots to be parallel, that seems a significant enough change to current functionality (from one file on one member to multiple files on multiple members) that I am hesitant to do so without united community backing for that approach.
Re: Adding parallel import/export of snapshots to gfsh
With minimal code change, it is possible to enable the use of --dir for both standard and parallel export/import, allowing --file to function only for standard exports (and optionally, make it deprecated in favor of the --dir option).

While I am not inherently opposed to forcing all Partitioned Region snapshots to be parallel, that seems a significant enough change to current functionality (from one file on one member to multiple files on multiple members) that I am hesitant to do so without united community backing for that approach.

On Tue, Aug 22, 2017 at 2:24 PM, Michael Stolz wrote:
> One other idea that hasn't been mentioned is making parallel the only way for Partitioned Regions, and having --file serve the purpose of defining both a path and a filename pattern where the bucket ID or whatever we're using gets automatically inserted before the .gfd extension.
>
> No need for a new option (--parallel).
> No need for a new option (--path).
>
> In fact, no need for a change to gfsh command at all.
Re: Adding parallel import/export of snapshots to gfsh
One other idea that hasn't been mentioned is making parallel the only way for Partitioned Regions, and having --file serve the purpose of defining both a path and a filename pattern where the bucket ID or whatever we're using gets automatically inserted before the .gfd extension.

No need for a new option (--parallel).
No need for a new option (--path).

In fact, no need for a change to gfsh command at all.

--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: +1-631-835-4771

On Tue, Aug 22, 2017 at 2:15 PM, Nick Reich wrote:
> Parallel export will write the data to files on the bucket primary for each bucket, distributing the work (and therefore files) to all the members. That would be a big enough deviation from the current behavior (single file on single machine) that I think it is worth having the additional options (but I agree: fewer options is generally better).
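To make the filename-pattern idea concrete, here is a minimal Java sketch of how a bucket ID could be inserted before the .gfd extension. The class name, the method name, and the "-" separator are assumptions made for illustration only; nothing in this thread fixes the actual naming scheme, and this is not Geode code.

```java
// Hypothetical helper: derives a per-bucket snapshot file name from the
// single --file value. Names and separator are illustrative assumptions.
final class SnapshotFileNames {
    private static final String EXTENSION = ".gfd";

    /** Inserts the bucket ID before the .gfd extension of the given pattern. */
    static String forBucket(String filePattern, int bucketId) {
        if (!filePattern.endsWith(EXTENSION)) {
            throw new IllegalArgumentException("snapshot file must end in " + EXTENSION);
        }
        String stem = filePattern.substring(0, filePattern.length() - EXTENSION.length());
        return stem + "-" + bucketId + EXTENSION;
    }

    public static void main(String[] args) {
        // Print the names that three buckets would receive for one pattern.
        for (int bucketId = 0; bucketId < 3; bucketId++) {
            System.out.println(forBucket("/backups/orders.gfd", bucketId));
        }
    }
}
```

With this scheme the user-facing option stays the same, but a single --file value maps to many files, which is the behavior change discussed elsewhere in the thread.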
Re: Adding parallel import/export of snapshots to gfsh
Parallel export will write the data to files on the bucket primary for each bucket, distributing the work (and therefore files) to all the members. That would be a big enough deviation from the current behavior (single file on single machine) that I think it is worth having the additional options (but I agree: fewer options is generally better).

On Tue, Aug 22, 2017 at 1:59 PM, Jacob Barrett wrote:
> On Tue, Aug 22, 2017 at 1:49 PM Nick Reich wrote:
>
> > The idea of deprecating --file in favor of path is interesting. I wonder if instead of making them mutually exclusive to start, having --path be able to support both modes from the start would be better? That way --file could still be used for the existing mode, but --path could be used instead (and override --file if both are given?): that would provide a clear path forward for how the command should be used, while fully supporting existing workflows.
>
> This is what I meant by deprecating. Maybe even print a message, if --file is set, that it is deprecated in favor of --path.
>
> > We need to continue to support both modes, as only Partitioned Regions can make use of parallel export (it is parallelized on a bucket level).
>
> Ok, so why not just make parallel the only mode for partitioned? Then you remove the need for --parallel, and --path would work for any region: non-partitioned would create a single file at that path and partitioned would create several. I am all for fewer options. ;)
>
> -Jake
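As a rough illustration of why the output ends up spread across members, the sketch below groups buckets by the member that hosts each primary. It is a simplified model, not Geode's implementation: the class name, the method name, and the member names used with it are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of a parallel export plan: each member writes the
// buckets for which it hosts the primary. All names here are hypothetical.
final class ParallelExportPlan {

    /** Groups bucket IDs by the member hosting the primary copy of each bucket. */
    static Map<String, List<Integer>> bucketsByPrimary(Map<Integer, String> primaryOfBucket) {
        // Sort by bucket ID first so each member's list comes out in order.
        Map<Integer, String> sorted = new TreeMap<>(primaryOfBucket);
        Map<String, List<Integer>> plan = new TreeMap<>();
        for (Map.Entry<Integer, String> entry : sorted.entrySet()) {
            plan.computeIfAbsent(entry.getValue(), member -> new ArrayList<>())
                .add(entry.getKey());
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<Integer, String> primaries = new TreeMap<>();
        primaries.put(0, "server1");
        primaries.put(1, "server2");
        primaries.put(2, "server1");
        // Each key is a member; each value lists the buckets it would write.
        System.out.println(bucketsByPrimary(primaries));
    }
}
```

In this model a serial export corresponds to one member writing everything, while the parallel plan yields files on every member that hosts at least one primary.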
Re: Adding parallel import/export of snapshots to gfsh
I don't really like the idea of adding a separate command. It really is the same command - you just want to have the parallel flag interact with other options. A separate command would be more confusing for users, and more of a maintenance issue as we add more options to export.

Having a --path that could be a file or directory depending on whether your export is parallel or serial also seems unintuitive. Kirk's idea of mutually exclusive options seems more reasonable. Or better yet, just add --dir and make it work the same way for both serial and parallel exports: we generate files and put them in that directory.

-Dan

On Tue, Aug 22, 2017 at 1:59 PM, Jacob Barrett wrote:
> On Tue, Aug 22, 2017 at 1:49 PM Nick Reich wrote:
>
> > The idea of deprecating --file in favor of path is interesting. I wonder if instead of making them mutually exclusive to start, having --path be able to support both modes from the start would be better? That way --file could still be used for the existing mode, but --path could be used instead (and override --file if both are given?): that would provide a clear path forward for how the command should be used, while fully supporting existing workflows.
>
> This is what I meant by deprecating. Maybe even print a message, if --file is set, that it is deprecated in favor of --path.
>
> > We need to continue to support both modes, as only Partitioned Regions can make use of parallel export (it is parallelized on a bucket level).
>
> Ok, so why not just make parallel the only mode for partitioned? Then you remove the need for --parallel, and --path would work for any region: non-partitioned would create a single file at that path and partitioned would create several. I am all for fewer options. ;)
>
> -Jake
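One way the mutually exclusive --file / --dir proposal could be validated is sketched below in Java. This assumes the option set under discussion in this thread (--file, --dir, --parallel), which was still a proposal at this point; the class, enum, and error messages are hypothetical and are not taken from ExportDataCommand.

```java
// Hypothetical validation of the proposed options: exactly one of --file or
// --dir must be given, and --parallel is only accepted together with --dir.
final class ExportTargetOptions {

    /** Where the export would write its output under this sketch. */
    enum Target { SINGLE_FILE, DIRECTORY }

    static Target resolve(String file, String dir, boolean parallel) {
        boolean hasFile = file != null && !file.isEmpty();
        boolean hasDir = dir != null && !dir.isEmpty();
        if (hasFile && hasDir) {
            throw new IllegalArgumentException("--file and --dir are mutually exclusive");
        }
        if (!hasFile && !hasDir) {
            throw new IllegalArgumentException("one of --file or --dir is required");
        }
        if (hasFile) {
            if (parallel) {
                throw new IllegalArgumentException("--parallel requires --dir");
            }
            if (!file.endsWith(".gfd")) {
                throw new IllegalArgumentException("--file must end in .gfd");
            }
            return Target.SINGLE_FILE;
        }
        // --dir is valid for both serial and parallel exports in this sketch.
        return Target.DIRECTORY;
    }

    public static void main(String[] args) {
        System.out.println(resolve("/backups/orders.gfd", null, false));
        System.out.println(resolve(null, "/backups/orders", true));
    }
}
```

Keeping --dir valid for both modes means the directory semantics do not change with the parallel flag, which is the property argued for above.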
Re: Adding parallel import/export of snapshots to gfsh
On Tue, Aug 22, 2017 at 1:49 PM Nick Reich wrote:

> The idea of deprecating --file in favor of path is interesting. I wonder if instead of making them mutually exclusive to start, having --path be able to support both modes from the start would be better? That way --file could still be used for the existing mode, but --path could be used instead (and override --file if both are given?): that would provide a clear path forward for how the command should be used, while fully supporting existing workflows.

This is what I meant by deprecating. Maybe even print a message, if --file is set, that it is deprecated in favor of --path.

> We need to continue to support both modes, as only Partitioned Regions can make use of parallel export (it is parallelized on a bucket level).

Ok, so why not just make parallel the only mode for partitioned? Then you remove the need for --parallel, and --path would work for any region: non-partitioned would create a single file at that path and partitioned would create several. I am all for fewer options. ;)

-Jake
Re: Adding parallel import/export of snapshots to gfsh
I thought about a mutually exclusive --file and --dir, but in that case, --file is required for standard and --path required for parallel export, which I think could be better than overloading --file, but still has potential for confusion.

The idea of deprecating --file in favor of path is interesting. I wonder if instead of making them mutually exclusive to start, having --path be able to support both modes from the start would be better? That way --file could still be used for the existing mode, but --path could be used instead (and override --file if both are given?): that would provide a clear path forward for how the command should be used, while fully supporting existing workflows.

We need to continue to support both modes, as only Partitioned Regions can make use of parallel export (it is parallelized on a bucket level).

On Tue, Aug 22, 2017 at 12:55 PM, Jacob Barrett wrote:
> How about deprecating --file and replacing it with --path? In the transition, make them mutually exclusive and --path required for --parallel.
>
> Any reason to not just make all export parallel rather than supporting two different modes?
>
> -Jake
Re: Adding parallel import/export of snapshots to gfsh
How about deprecating --file and replacing it with --path? In the transition, make them mutually exclusive and --path required for --parallel.

Any reason to not just make all export parallel rather than supporting two different modes?

-Jake

On Aug 22, 2017, at 12:27 PM, Kenneth Howe wrote:
> I agree that overloading the "file" option seems like a bad idea. As an alternative to separate commands, what about mutually exclusive options, '--file' and '--dir'?
>
> If you go for implementing the new functionality as a separate command, I would suggest calling the gfsh commands "export data-parallel" and "import data-parallel".
Re: Adding parallel import/export of snapshots to gfsh
I agree that overloading the "file" option seems like a bad idea. As an alternative to separate commands, what about mutually exclusive options, '--file' and '--dir'?

If you go for implementing the new functionality as a separate command, I would suggest calling the gfsh commands "export data-parallel" and "import data-parallel".

On Aug 22, 2017, at 11:32 AM, Nick Reich wrote:
> Team,
>
> I am working on exposing the parallel export/import of snapshots through gfsh and would appreciate input on the best approach to adding to / updating the existing interface.
>
> Currently, ExportDataCommand and ImportDataCommand take a region name, a member to run the command on, and a file location (that must end in .gfd). Parallel import and export require a directory location instead of a single file name (as there can be multiple files, which need unique names). It is possible to add a parallel flag and have the meaning of the "file" parameter be different depending on that flag, but that seems overly confusing to me. I am instead leaning towards creating new commands (e.g. ParallelExportDataCommand) that have a "directory" parameter to replace "file", but are otherwise identical in usage to the existing commands.
>
> Do others have different views or approaches to suggest?