Re: Hive CLI and Standalone Server : Need Suggestion

2012-03-19 Thread Bejoy Ks
Hi LakshmiKanth,

In production systems, if you have a sequence of commands to execute,
put them in order in a file. Then run the file with:

hive -f filename

For simplicity, you can use a cron job to run it on a schedule. Just
put this command in a .sh file and call that file from cron. In fact, you
can use any scheduler that can trigger a .sh file.
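A minimal sketch of the cron approach described above. The function name, the `HIVE_BIN` override, and all paths are hypothetical placeholders rather than anything from this thread; only `hive -f` comes from the message itself.

```shell
#!/usr/bin/env bash
# run_hive_script.sh: hypothetical wrapper around `hive -f` for use from cron.

# Run one Hive script file and append its output to an hourly log file.
# HIVE_BIN is an assumed override hook (useful for dry runs); it defaults to `hive`.
run_hive_script() {
    local script_file="$1"
    local log_dir="$2"
    local log_file="${log_dir}/hive_$(date +%Y%m%d_%H).log"

    # Fail early with a clear message if the script file is missing.
    if [ ! -f "$script_file" ]; then
        echo "Hive script not found: $script_file" >&2
        return 1
    fi

    mkdir -p "$log_dir" || return 1
    # The function's exit status is the exit status of the Hive command.
    "${HIVE_BIN:-hive}" -f "$script_file" >>"$log_file" 2>&1
}

# A real wrapper would end with a call such as:
#   run_hive_script /path/to/process_weblogs.sql /path/to/logs
# and be scheduled with a crontab entry like (minute 5 of every hour):
#   5 * * * * /path/to/run_hive_script.sh
```

Keeping the log per hour makes it easier to match a failed run to the batch of weblogs it was meant to process.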

But for Hadoop-based workflows the preferred workflow manager is Oozie, and
I recommend Oozie for Hadoop jobs.

Regards
Bejoy KS



 From: LakshmiKanth P lk.asp...@gmail.com
To: user@hive.apache.org 
Sent: Tuesday, March 20, 2012 12:19 AM
Subject: Hive CLI and Standalone Server : Need Suggestion
 

Hi
 
 
I need to schedule my Hive scripts, which need to process incoming weblogs on
an hourly basis.

Currently, I can process my weblog files by executing my scripts from the Hive
command line interface. Now I want to keep my scripts in a file and invoke
them at regular intervals. I learned that the Hive command line options
provide a way to pass a .sql file as input for execution. Is that the right
approach for a production environment?

Or should I run my Hive server in standalone mode and invoke my Hive scripts
using the JDBC API?

Please suggest the best approach.
 
 
Regards,
LK

RE: Hive CLI and Standalone Server : Need Suggestion

2012-03-19 Thread carla.staeben
Great topic, as I was wondering about a similar thing this morning. I want to
use Oozie to execute my Hive job, but I have to pass the job parameters that I
generate with a shell script. Some of the literature that I've seen says that
Oozie may or may not allow for calling shell scripts. Is that true?
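Independent of the scheduler, one way to hand shell-generated values to a Hive script is the CLI's `-hiveconf` option. The sketch below assumes a hypothetical parameter named `run_hour`, which the script would then reference as `${hiveconf:run_hour}`; the function names and the `HIVE_BIN` override are likewise illustrative, and `date -d` is GNU date syntax.

```shell
#!/usr/bin/env bash
# Hypothetical example: generate a parameter in shell and pass it to Hive.

# Print the previous full hour as YYYYMMDDHH (requires GNU date for `-d`).
previous_hour() {
    date -d '1 hour ago' +%Y%m%d%H
}

# Run a Hive script with the generated value passed as a hiveconf property.
# HIVE_BIN is an assumed override hook; it defaults to `hive`.
run_with_params() {
    local script_file="$1"
    local run_hour
    run_hour="$(previous_hour)"
    "${HIVE_BIN:-hive}" -hiveconf run_hour="$run_hour" -f "$script_file"
}

# Usage (path is a placeholder):
#   run_with_params /path/to/process_weblogs.sql
```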

Thanks
Carla




Re: Hive CLI and Standalone Server : Need Suggestion

2012-03-19 Thread Edward Capriolo
This is a bit of a problem. Oozie is great for workflow scheduling, but
it does not have actions for everything, and adding actions is
non-trivial in current versions.

I have created some bootleg/generic Oozie actions that make it easy
to exec pretty much anything and treat it as an action.

https://github.com/edwardcapriolo/m6d_oozie





Re: Hive CLI and Standalone Server : Need Suggestion

2012-03-19 Thread Alejandro Abdelnur
Eduardo,

Besides the mapreduce/streaming/hive/pig/sqoop/distcp actions, Oozie has a
JAVA action (to execute a Java main class in the cluster), an SSH action (to
execute a script via SSH on a remote host), and a SHELL action (to execute
a script in the cluster).
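As a rough illustration of the SHELL action case: when the workflow declares `<capture-output/>` on the action, Oozie is understood to read the script's stdout as Java-properties-style `key=value` lines, which later nodes can look up with `wf:actionData`. The sketch below only shows the script side; the function name, parameter names, and the `/data/weblogs` path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical script for an Oozie SHELL action that declares <capture-output/>.
# Everything printed to stdout should be key=value lines and nothing else.

# Print the generated job parameters, one key=value pair per line.
emit_job_params() {
    local run_date="$1"   # e.g. 2012-03-19
    local run_hour="$2"   # e.g. 14
    printf 'run_date=%s\n' "$run_date"
    printf 'run_hour=%s\n' "$run_hour"
    printf 'input_dir=/data/weblogs/%s/%s\n' "$run_date" "$run_hour"
}

# Usage inside the action's script:
#   emit_job_params "$(date +%Y-%m-%d)" "$(date +%H)"
```

Diagnostic messages belong on stderr in such a script, so that they do not end up mixed into the captured properties.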

Would you mind explaining what your m6d extension does that JAVA, SSH or
SHELL cannot do in a similar way?

Thanks.

Alejandro

 



Re: Hive CLI and Standalone Server : Need Suggestion

2012-03-19 Thread Edward Capriolo
I am not trying to knock Oozie, but:

MapReduce action: would be great, but the Hadoop docs taught me that the
proper way to write Hadoop programs was Tool and Configured. 90% of our
legacy jobs are Tools. The MapReduce action cannot launch Tools. So,
JavaMain...

SSH action is something I would never allow on our network. Super
bootleg and insecure.

HiveAction requires the entire Hive fat client, which is not easy since
our RDBMS needs to be configured to allow every possible tasktracker
to access its metastore. It would be better if HiveAction were
HiveThriftAction; then it would only need minimal jars and a host/port
pair. Again, back to JavaMain...

Not sure about the shell action. It may not have been around when I put
this framework together.

My main point is that Oozie in its current form is not very flexible.
What if I want to add an RDBMS action? Beg developers to patch it in?
Just having to patch in actions is a drawback. (I know there is a JIRA
open on this.)

The reason I wrote the library was:
https://github.com/edwardcapriolo/m6d_oozie/blob/master/src/main/java/com/m6d/oozie/RunShellProps.java

The problem I was facing with the Shell and Java Main actions is that
if you want to extract any output to be used in the next phase of the
job, it is not easy to get at. I wrote a JavaMain that was
capture-output friendly.

