[jira] [Commented] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-07-03 Thread Jay Sen (JIRA)


[ 
https://issues.apache.org/jira/browse/GOBBLIN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878203#comment-16878203
 ] 

Jay Sen commented on GOBBLIN-707:
-

[~ibuenros], pls take a look.

> combine & standardize all gobblin scripts into one master script & 
> restructure configs accordingly
> --
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jay Sen
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> Gobblin supports multiple execution modes (CLI, Standalone, 
> cluster-master, cluster-worker, AWS, YARN, MR) and various command-line 
> utilities to run CLI and admin commands. There is an individual script for 
> each of them.
> Having individual scripts introduces a lot of issues:
>  # The scripts handle Gobblin variables and user parameters differently, 
> and the handling is highly inconsistent across scripts.
>  # Functionality around start, stop, status checking, and PID handling, 
> among other things, varies widely with each script's implementation.
>  # Features like GC & JVM params, log4j file selection, classpath 
> calculation, etc. exist in some Gobblin scripts but not all, adding to an 
> inconsistent user experience.
>  # Maintaining a total of 13 scripts takes too much effort.
> Also, all the Gobblin scripts share a lot of common code to handle params, 
> starting and stopping services, status checks, PID handling, etc. Combining 
> all the scripts into one not only makes maintenance easier but also brings 
> clarity and consistency.
>  
> Solution:
> 1. There can be one gobblin.sh script to handle all Gobblin commands and 
> deployment options as per the following signature. NOTE: This
> {{gobblin.sh   }}
>  {{gobblin.sh   }}
> {{commands values: admin, cli, statestore-check, statestore-clean, 
> historystore-manager, classpath}}
>  {{service values: standalone, cluster-master, cluster-worker, aws, yarn, mr, 
> service}}
> With the above change, the following become valid commands.
> {code:java}
> # all under GobblinCli class
> gobblin run listQuickApps   -> gobblin cli run listQuickApps
> gobblin run listQuickApps   -> gobblin cli run listQuickApps
> gobblin run                 -> gobblin cli run 
> # class: JobStateToJsonConverter
> statestore-checker.sh       -> gobblin statestore-checker 
> # class: StateStoreCleaner
> statestore-clean.sh         -> gobblin statestore-clean 
> # class: DatabaseJobHistoryStoreSchemaManager
> historystore-manager.sh     -> gobblin historystore-manager 
> # class: Cli
> gobblin-admin.sh            -> gobblin admin 
> # all gobblin deployment modes
> gobblin-cluster-master.sh   -> gobblin cluster-master start|stop|status
> gobblin-cluster-worker.sh   -> gobblin cluster-master start|stop|status
> gobblin-compaction.sh       -> gobblin cluster-master start|stop|status
> gobblin-env.sh              -> gobblin cluster-master start|stop|status
> gobblin-mapreduce.sh        -> gobblin cluster-master start|stop|status
> gobblin-service.sh          -> gobblin cluster-master start|stop|status
> gobblin-standalone.sh       -> gobblin cluster-master start|stop|status
> gobblin-yarn.sh             -> gobblin cluster-master start|stop|status
> {code}
>  
> 2. Configs also need to be restructured and deduplicated to make it 
> clear which config is picked up for which execution mode.
>  NOTE: this refactoring adds all cli and service commands to gobblin.sh and 
> hence changes the syntax for all commands and services.
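
A minimal sketch of how a single entry point could route on its first argument, assuming the command and service names listed in the description above. The function names (contains_word, dispatch) are hypothetical and are not taken from the actual gobblin.sh; the echo lines stand in for launching the real Java classes.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual gobblin.sh implementation.
# Command and service names are copied from the issue description.
COMMANDS="admin cli statestore-check statestore-clean historystore-manager classpath"
SERVICES="standalone cluster-master cluster-worker aws yarn mr service"

# contains_word WORD LIST... : succeeds if WORD equals one of the LIST items.
contains_word() {
    word="$1"; shift
    for item in "$@"; do
        if [ "$item" = "$word" ]; then
            return 0
        fi
    done
    return 1
}

# dispatch TARGET [ARGS...] : route to a command or a service action.
dispatch() {
    target="${1:-}"
    if [ $# -gt 0 ]; then
        shift
    fi
    if contains_word "$target" $COMMANDS; then
        # A real script would launch the matching Java class here.
        echo "command $target $*"
    elif contains_word "$target" $SERVICES; then
        case "${1:-}" in
            start|stop|status) echo "service $target $1" ;;
            *) echo "usage: gobblin.sh $target start|stop|status" >&2; return 1 ;;
        esac
    else
        echo "unknown command or service: $target" >&2
        return 1
    fi
}
```

With this shape, `dispatch cli run listQuickApps` and `dispatch cluster-master status` both go through the same validation path, which is the consistency the issue asks for.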



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-05-18 Thread Jay Sen (JIRA)


[ 
https://issues.apache.org/jira/browse/GOBBLIN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843281#comment-16843281
 ] 

Jay Sen commented on GOBBLIN-707:
-

updated to the following: 
{code}

➜  gobblin-dist ./bin/gobblin
Usage:
gobblin.sh  cli 
gobblin.sh  service  

Use "gobblin  --help" for more information. (Gobblin Version: 0.15.0)

➜  gobblin-dist ./bin/gobblin cli
Usage:
gobblin.sh  cli 

options:
cli-commands:
    passwordManager             Encrypt or decrypt strings for the password manager.
    decrypt                     Decryption utilities
    run                         Run a Gobblin application.
    config                      Query the config library
    jobs                        Command line job info and operations
    stateMigration              Command line tools for migrating state store
    job-state-to-json           To convert Job state to JSON
    cleaner                     Data retention utility
    keystore                    Examine JCE Keystore files
    watermarks                  Inspect streaming watermarks
    job-store-schema-manager    Database job history store schema manager

    --conf-dir                  Gobblin config path. default is '$GOBBLIN_HOME/conf/'.
    --log4j-conf                default is '$GOBBLIN_HOME/conf//log4j.properties'.
    --jvmopts                   String containing JVM flags to include, in addition to "-Xmx1g -Xms512m".
    --jars                      Column-separated list of extra jars to put on the CLASSPATH.
    --enable-gc-logs            enables gc logs & dumps.
    --show-classpath            prints gobblin runtime classpath.
    --help                      Display this help.
    --verbose                   Display full command used to start the process.
                                Gobblin Version: 0.15.0

➜  gobblin-dist ./bin/gobblin service
Usage:
gobblin.sh  service  

Argument Options:
    standalone, cluster-master, cluster-worker, aws, yarn, mapreduce, service-manager.

    --conf-dir                  Gobblin config path. default is '$GOBBLIN_HOME/conf/'.
    --log4j-conf                default is '$GOBBLIN_HOME/conf//log4j.properties'.
    --jvmopts                   String containing JVM flags to include, in addition to "-Xmx1g -Xms512m".
    --jars                      Column-separated list of extra jars to put on the CLASSPATH.
    --enable-gc-logs            enables gc logs & dumps.
    --show-classpath            prints gobblin runtime classpath.
    --cluster-name              Name of the cluster to be used by helix & other services. ( default: gobblin_cluster).
    --jt                        Only for mapreduce mode: Job submission URL, if not set, taken from ${HADOOP_HOME}/conf.
    --fs                        Only for mapreduce mode: Target file system, if not set, taken from ${HADOOP_HOME}/conf.
    --help                      Display this help.
    --verbose                   Display full command used to start the process.
                                Gobblin Version: 0.15.0
{code}
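
For illustration, the shared options in the help text above could be parsed along these lines. This is a sketch under assumptions: the function and variable names (parse_options, need_value, CONF_DIR, JVM_OPTS and so on) are hypothetical, only a subset of the options is covered, and the default config path is simplified because the help text elides part of it.

```shell
#!/bin/sh
# Hypothetical sketch of shared option parsing; not the actual gobblin.sh code.

# need_value OPTION REMAINING_ARG_COUNT : fail if OPTION has no value after it.
need_value() {
    if [ "$2" -lt 2 ]; then
        echo "missing value for $1" >&2
        return 1
    fi
}

parse_options() {
    # Defaults. The help text gives the conf default as a path under
    # $GOBBLIN_HOME/conf/ with an elided segment, so this value is simplified.
    CONF_DIR="${GOBBLIN_HOME:-.}/conf"
    JVM_OPTS="-Xmx1g -Xms512m"
    EXTRA_JARS=""
    GC_LOGS=0
    VERBOSE=0
    while [ $# -gt 0 ]; do
        case "$1" in
            --conf-dir)
                need_value "$1" $# || return 1
                CONF_DIR="$2"; shift 2 ;;
            --jvmopts)
                need_value "$1" $# || return 1
                # User flags are appended to the built-in defaults.
                JVM_OPTS="$JVM_OPTS $2"; shift 2 ;;
            --jars)
                need_value "$1" $# || return 1
                EXTRA_JARS="$2"; shift 2 ;;
            --enable-gc-logs) GC_LOGS=1; shift ;;
            --verbose) VERBOSE=1; shift ;;
            *) echo "unknown option: $1" >&2; return 1 ;;
        esac
    done
}
```

Keeping this in one function is what lets the cli and service modes behave identically for the options they share.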

> combine & standardize all gobblin scripts into one master script & 
> restructure configs accordingly
> --
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jay Sen
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h

[jira] [Commented] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-05-15 Thread Jay Sen (JIRA)


[ 
https://issues.apache.org/jira/browse/GOBBLIN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840878#comment-16840878
 ] 

Jay Sen commented on GOBBLIN-707:
-

Hi [~ibuenros], can you please take a look, and I will push the commit? Thanks. 
If it helps, can we meet online and figure this out to expedite it?

> combine & standardize all gobblin scripts into one master script & 
> restructure configs accordingly
> --
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jay Sen
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h


[jira] [Commented] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-05-09 Thread Jay Sen (JIRA)


[ 
https://issues.apache.org/jira/browse/GOBBLIN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836792#comment-16836792
 ] 

Jay Sen commented on GOBBLIN-707:
-

Hi [~ibuenros], any comment here?

> combine & standardize all gobblin scripts into one master script & 
> restructure configs accordingly
> --
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jay Sen
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h


[jira] [Commented] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-05-03 Thread Jay Sen (JIRA)


[ 
https://issues.apache.org/jira/browse/GOBBLIN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832968#comment-16832968
 ] 

Jay Sen commented on GOBBLIN-707:
-

With everything in gobblin.sh, I have added the following usage; please let me 
know your comments.
{code}
➜  gobblin-dist ./bin/gobblin
gobblin.sh  cli 
gobblin.sh  service  

Use "gobblin  --help" for more information
➜  gobblin-dist ./bin/gobblin cli
gobblin.sh  cli 

options:
cli-commands:  admin, jobs, statestore-check, statestore-clean, historystore-manager

    --conf-dir          Gobblin config path. default is '$GOBBLIN_HOME/conf/'.
    --log4j-conf        default is '$GOBBLIN_HOME/conf//log4j.properties'.
    --jvmopts           String containing JVM flags to include, in addition to "-Xmx1g -Xms512m".
    --jars              Column-separated list of extra jars to put on the CLASSPATH.
    --enable-gc-logs    enables gc logs & dumps.
    --show-classpath    prints gobblin runtime classpath.
    --help              Display this help.
    --verbose           Display full command used to start the process.
                        Gobblin Version: 0.15.0
➜  gobblin-dist ./bin/gobblin service
gobblin.sh  service  

Argument Options:
    standalone, cluster-master, cluster-worker, aws, yarn, mapreduce, service-manager.

    --cluster-name      Name of the cluster to be used by helix & other services. ( default: gobblin_cluster).
    --conf-dir          Gobblin config path. default is '$GOBBLIN_HOME/conf/'.
    --log4j-conf        default is '$GOBBLIN_HOME/conf//log4j.properties'.
    --jvmopts           String containing JVM flags to include, in addition to "-Xmx1g -Xms512m".
    --jars              Column-separated list of extra jars to put on the CLASSPATH.
    --enable-gc-logs    enables gc logs & dumps.
    --show-classpath    prints gobblin runtime classpath.
    --jt                Only for mapreduce mode: Job submission URL, if not set, taken from ${HADOOP_HOME}/conf.
    --fs                Only for mapreduce mode: Target file system, if not set, taken from ${HADOOP_HOME}/conf.
    --help              Display this help.
    --verbose           Display full command used to start the process.
                        Gobblin Version: 0.15.0
{code}
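
The start, stop, status and PID handling that the issue wants unified for every service mode could look roughly like this. It is only a sketch: the PID file location and the function names are assumptions, and the command to launch is passed in as arguments rather than being a real Gobblin invocation.

```shell
#!/bin/sh
# Hypothetical sketch of shared start/stop/status handling with a PID file.
# PID_FILE and all function names are illustrative assumptions.
PID_FILE="${PID_FILE:-/tmp/gobblin-sketch.pid}"

# Succeeds when the PID file exists and the recorded process is alive.
is_running() {
    [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null
}

# start_service COMMAND [ARGS...] : launch COMMAND in the background.
start_service() {
    if is_running; then
        echo "already running"
        return 0
    fi
    "$@" >/dev/null 2>&1 &
    echo $! > "$PID_FILE"
    echo "started"
}

stop_service() {
    if is_running; then
        kill "$(cat "$PID_FILE")"
        rm -f "$PID_FILE"
        echo "stopped"
    else
        # Clean up a stale PID file left by a crashed process.
        rm -f "$PID_FILE"
        echo "not running"
    fi
}

status_service() {
    if is_running; then
        echo "running"
    else
        echo "not running"
    fi
}
```

Because every service mode would call the same three functions, a stale PID file or a double start is handled once instead of per script.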

> combine & standardize all gobblin scripts into one master script & 
> restructure configs accordingly
> --
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jay Sen
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h

[jira] [Commented] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-05-03 Thread Jay Sen (JIRA)


[ 
https://issues.apache.org/jira/browse/GOBBLIN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832957#comment-16832957
 ] 

Jay Sen commented on GOBBLIN-707:
-

Adding the comment from git, for more clarity on what you are suggesting.
{quote}Can we leave {{gobblin.sh}} relatively simple and instead have 
{{gobblin-cli.sh}} and {{gobblin-service.sh}}? {{gobblin.sh}} would just 
redirect to the correct place depending on the first argument
{quote}
This could also be done, but it would duplicate the code for handling 
options (conf, jvmopts, etc.) and classpath building.

Basically, pretty much all options of the gobblin-cli script are duplicated 
from gobblin-service (which needs all of the options), as shown below:

gobblin cli --help

gobblin cli   
cli-commands : admin, jobs, statestore-check, statestore-clean, historystore-manager
params : respective parameters for the commands
other_options: 
--conf-dir  Gobblin config path. default is '$GOBBLIN_HOME/conf/'. 
--jvmopts  String containing JVM flags to include, in addition to "-Xmx1g -Xms512m". 
--jars  Column-separated list of extra jars to put on the CLASSPATH. 
--enable-gc-logs enables gc logs & dumps. 
--show-classpath prints gobblin runtime classpath. 
--help Display this help. 
--verbose Display full command used to start the process.

gobblin service --help

gobblin service   
execution-modes : standalone, cluster-master, cluster-worker, aws, yarn, mapreduce, service-manager.
other_options:
--cluster-name Name of the cluster to be used by helix & other services. ( default: gobblin_cluster). 
--conf-dir  Gobblin config path. default is '$GOBBLIN_HOME/conf/'. 
--log4j-conf  default is '$GOBBLIN_HOME/conf//log4j.properties'. 
--jvmopts  String containing JVM flags to include, in addition to "-Xmx1g -Xms512m". 
--jars  Column-separated list of extra jars to put on the CLASSPATH. 
--enable-gc-logs enables gc logs & dumps. 
--show-classpath prints gobblin runtime classpath. 
--jt  Only for mapreduce mode: Job submission URL, if not set, taken from ${HADOOP_HOME}/conf. 
--fs  Only for mapreduce mode: Target file system, if not set, taken from ${HADOOP_HOME}/conf. 
--help Display this help. 
--verbose Display full command used to start the process.

If we keep all the code for handling options and other things common, then 
that is pretty much what I have done in gobblin.sh.

Maybe I can just separate out the help messages for cli and services, so the 
options for each are clearer and it aligns with what you are suggesting. Later 
on I can also try to bring the Java classes under GobblinCli as a separate PR; 
otherwise this PR will keep growing... :)

Let me know if you think otherwise, and I will think about how to make that 
change. 

 

Thanks

Jay
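
As a rough illustration of the classpath building mentioned above, which both the cli and service paths would otherwise have to duplicate, one shared helper might look like the sketch below. The function name and argument order are hypothetical, and treating the --jars value as comma-separated is an assumption, since the help text only says "Column-separated".

```shell
#!/bin/sh
# Hypothetical sketch of a shared classpath builder; not the actual gobblin.sh.
# build_classpath CONF_DIR LIB_DIR [EXTRA_JARS]
#   EXTRA_JARS is assumed to be a comma-separated list (an assumption).
build_classpath() {
    cp="$1"
    # Append every jar found in the lib directory, in glob (sorted) order.
    for jar in "$2"/*.jar; do
        if [ -e "$jar" ]; then
            cp="$cp:$jar"
        fi
    done
    # Append user-supplied jars, converting commas to classpath separators.
    if [ -n "${3:-}" ]; then
        cp="$cp:$(printf '%s' "$3" | tr ',' ':')"
    fi
    printf '%s\n' "$cp"
}
```

Both a cli launcher and a service launcher could then call build_classpath with the same arguments, so a change to classpath rules happens in one place.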

> combine & standardize all gobblin scripts into one master script & 
> restructure configs accordingly
> --
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jay Sen
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h

[jira] [Commented] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-05-03 Thread Issac Buenrostro (JIRA)


[ 
https://issues.apache.org/jira/browse/GOBBLIN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832912#comment-16832912
 ] 

Issac Buenrostro commented on GOBBLIN-707:
--

I see, didn't realize there was so much added to `gobblin.cli`.

Can we do this to avoid confusing what options apply to each mode?
{code:java}
gobblin --help
gobblin cli  
gobblin service  

Use "gobblin  --help" for more information {code}
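
One possible way to get the per-mode help suggested above, sketched under assumptions: show_help and common_options are hypothetical names, the angle-bracket placeholder in the hint line is illustrative, and the option lists are abbreviated from the help text elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical sketch of per-mode help; names and layout are illustrative.

# Options shared by both modes.
common_options() {
    echo "  --conf-dir"
    echo "  --jvmopts"
    echo "  --jars"
    echo "  --help"
    echo "  --verbose"
}

# show_help [MODE] : print help for cli, service, or the top level.
show_help() {
    case "${1:-}" in
        cli)
            echo "gobblin cli"
            common_options
            ;;
        service)
            echo "gobblin service"
            common_options
            # Options that only make sense for service modes.
            echo "  --cluster-name"
            echo "  --jt"
            echo "  --fs"
            ;;
        *)
            echo "gobblin cli"
            echo "gobblin service"
            echo 'Use "gobblin <mode> --help" for more information'
            ;;
    esac
}
```

This keeps the option-handling code common while making it clear which options apply to which mode.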

> combine & standardize all gobblin scripts into one master script & 
> restructure configs accordingly
> --
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jay Sen
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h


[jira] [Commented] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-05-02 Thread Jay Sen (JIRA)


[ 
https://issues.apache.org/jira/browse/GOBBLIN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832130#comment-16832130
 ] 

Jay Sen commented on GOBBLIN-707:
-

Sure, that would be even better, but it will require even further refactoring 
of the Java classes for statestore-checker and others to bring them under the 
{{Alias}} and make them go through {{GobblinCli}}. I will do that once you 
confirm the following syntax in that case: 
{code}
gobblin --help
gobblin.sh cli  
gobblin.sh service  

Argument Options:
 admin, jobs, statestore-check, statestore-clean, historystore-manager
 standalone, cluster-master, cluster-worker, aws, yarn, mapreduce, service-manager.

--cluster-name Name of the cluster to be used by helix & other services. ( default: gobblin_cluster).
--conf-dir  Gobblin config path. default is '$GOBBLIN_HOME/conf/'.
--log4j-conf  default is '$GOBBLIN_HOME/conf//log4j.properties'.
--jvmopts  String containing JVM flags to include, in addition to "-Xmx1g -Xms512m".
--jars  Column-separated list of extra jars to put on the CLASSPATH.
--enable-gc-logs enables gc logs & dumps.
--show-classpath prints gobblin runtime classpath.
--jt  Only for mapreduce mode: Job submission URL, if not set, taken from ${HADOOP_HOME}/conf.
--fs  Only for mapreduce mode: Target file system, if not set, taken from ${HADOOP_HOME}/conf.
--help Display this help.
--verbose Display full command used to start the process.
Gobblin Version: 0.15.0
{code}
  

Btw, all the removed scripts have been incorporated into the above gobblin.sh 
changes in one way or another; I will double-check that anyway.

 

> combine & standardize all gobblin scripts into one master script & 
> restructure configs accordingly
> --
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jay Sen
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h

[jira] [Commented] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-05-02 Thread Issac Buenrostro (JIRA)


[ 
https://issues.apache.org/jira/browse/GOBBLIN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832080#comment-16832080
 ] 

Issac Buenrostro commented on GOBBLIN-707:
--

Thanks for taking this up [~jaysen]

I do see the point of cleaning up the multiple scripts that Gobblin has; 
however, I would argue that the cleanup should be done a bit differently. As 
you pointed out, there are two types of scripts: commands and services.
 * For commands, the scripts are always pretty much identical, so I believe the 
access should always be through `GobblinCli` (i.e. implemented as 
`CliApplication`s). This means that instead of `gobblin statestore-checker` it 
should be `gobblin cli statestore-checker`, with a single bash portion of the 
script shared by all commands. This has the advantage that `gobblin cli --help` 
will list all commands, and commands are self-documenting through the `@Alias` 
annotation. Even better, if we use 
`ConstructorAndPublicMethodsCliObjectFactory`, it will automatically create a 
help string for each one and allow programmatic and cli access with the same 
input.
 * For services, I'm not sure how you're approaching things, but it would also 
be nice to have a single bash script that can handle all of them (given that, 
as you pointed out, they are all of the form `start|stop|status`).
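
To make the two bullets above concrete, here is a rough bash sketch of a single entry point that sends every command through `cli` and handles every service through `start|stop|status`. It is only an illustration under assumptions: the function names (`gobblin_dispatch`, `run_cli`, `run_service`) are hypothetical, the service names are the ones listed in the issue description, and the `echo` lines stand in for actually launching a JVM.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a single entry point; names are illustrative only.

# Service names as listed in the issue description.
SERVICES="standalone cluster-master cluster-worker aws yarn mr service"

run_cli() {
  # Stand-in for launching the Java GobblinCli class with the given args.
  echo "cli: $*"
}

run_service() {
  local service="$1" action="$2"
  case "$action" in
    start|stop|status)
      # Stand-in for the shared start/stop/status and PID handling.
      echo "service: $service $action"
      ;;
    *)
      echo "usage: gobblin $service start|stop|status" >&2
      return 1
      ;;
  esac
}

gobblin_dispatch() {
  local target="${1:-}" s
  if [ "$#" -gt 0 ]; then
    shift
  fi

  # All commands go through the cli, e.g. `gobblin cli statestore-checker`.
  if [ "$target" = "cli" ]; then
    run_cli "$@"
    return
  fi

  # All services share one code path for start|stop|status.
  for s in $SERVICES; do
    if [ "$s" = "$target" ]; then
      run_service "$target" "${1:-}"
      return
    fi
  done

  echo "unknown command or service: $target" >&2
  return 1
}
```

With this shape, `gobblin_dispatch cli statestore-checker` forwards the command name to the Java side, where the help text and aliases live, while `gobblin_dispatch yarn start` stays entirely in the shared bash code.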

Re: the PR, I'm a bit confused because a lot of scripts were removed but I 
don't understand where the replacements are. I may be missing something 
obvious, and I apologize if that is the case :)

> combine & standardize all gobblin scripts into one master script & 
> restructure configs accordingly
> --
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jay Sen
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Gobblin supports multiple modes of execution (CLI, Standalone, 
> cluster-master, cluster-worker, AWS, YARN, MR) and various command-line 
> utilities to run cli and admin commands. There is an individual script for 
> each of them.
> Having individual scripts introduces a lot of issues:
>  # all scripts handle gobblin variables and user parameters differently, and 
> the handling is highly inconsistent across the various gobblin scripts.
>  # functionality around start, stop, status checking and handling PIDs, among 
> a lot of other things, varies vastly with the implementation of each script.
>  # features like GC & JVM params, log4j file selection, classpath 
> calculation, etc. exist in some gobblin scripts but not all, adding to an 
> inconsistent user experience.
>  # maintaining a total of 13 scripts would be too much effort.
> Also, all the gobblin scripts share a lot of common code to handle params, 
> start and stop services, status checks, pid handling, etc. Combining all the 
> scripts into one not only makes maintenance easier but also brings clarity 
> and consistency.
>  
> Solution:
> 1. There can be one gobblin.sh script to handle all gobblin commands and 
> deployment options, as per the following signature. NOTE: This
> {{gobblin.sh   }}
>  {{gobblin.sh   }}
> {{commands values: admin, cli, statestore-check, statestore-clean, 
> historystore-manager, classpath}}
>  {{service values: standalone, cluster-master, cluster-worker, aws, yarn, mr, 
> service}}
> With the above change, the following become valid commands.
> {code:java}
> # all under GobblinCli class
> gobblin run listQuickApps  -> gobblin cli run listQuickApps
> gobblin run listQuickApps  -> gobblin cli run listQuickApps
> gobblin run  -> gobblin cli run 
> # class: JobStateToJsonConverter
> statestore-checker.sh  -> gobblin statestore-checker 
> # class: StateStoreCleaner
> statestore-clean.sh  -> gobblin statestore-clean 
> # class: DatabaseJobHistoryStoreSchemaManager
> historystore-manager.sh  -> gobblin historystore-manager 
> # class: Cli
> gobblin-admin.sh  -> gobblin admin 
> # all gobblin deployment modes
> gobblin-cluster-master.sh   -> gobblin cluster-master start|stop|status
> gobblin-cluster-worker.sh   -> gobblin cluster-worker start|stop|status
> gobblin-compaction.sh   -> gobblin cluster-master start|stop|status
> gobblin-env.sh  -> gobblin cluster-master start|stop|status
> gobblin-mapreduce.sh   -> gobblin mr start|stop|status
> gobblin-service.sh  -> gobblin service start|stop|status
> gobblin-standalone.sh   -> gobblin standalone start|stop|status
> gobblin-yarn.sh  -> gobblin yarn start|stop|status
> {code}
>  
> 2. Also, configs need to be structured and deduped accordingly to make it 
> clear which config will be picked up for which execution mode.
>  
>  {color:#FF}
>  NOTE: this refactoring t