Pig-cassandra Scripts and Oozie

2013-11-28 Thread Miguel Angel Martin junquera
Hi all,

What is the best way to integrate the Cassandra Pig extension with Oozie?

Can Oozie be configured to use pig-cassandra instead of pig?

Some ideas I am considering are:

launching a shell job that runs ./pig-cassandra script.pig,
or changing the values of environment variables,
or changing the original pig script to include the pig-cassandra code, etc.

Thanks and regards


Re: Pig-cassandra Scripts and Oozie

2013-11-28 Thread Jeremy Hanna
If I remember correctly, when I configured Pig, Cassandra, and Oozie to work
together, I just used vanilla Pig but gave it the jars it needed.

What problem are you experiencing that prevents you from doing this?

Jeremy




Re: Pig-cassandra Scripts and Oozie

2013-11-28 Thread Miguel Angel Martin junquera
Hi Jeremy,

I have not tested it yet; so far I have only run the Pig examples from the
Oozie project, without Cassandra.

pig-cassandra adds the Cassandra Pig library jars to the PIG_CLASSPATH
environment variable and then calls the original pig shell script at
PIG_HOME/bin/pig. Up to now, I have launched my Pig scripts with pig_cassandra
directly.
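
For illustration, the wrapper behaviour described above could be sketched roughly as follows. This is not the actual pig-cassandra script: the function names and the jar directory argument are assumptions made for the example, and only PIG_CLASSPATH and PIG_HOME/bin/pig come from the description.

```shell
#!/bin/sh
# Hypothetical sketch of a pig-cassandra style wrapper (not the real script).

# Print a classpath made of an optional existing value ($2) followed by
# every *.jar found in the directory given as $1.
build_pig_classpath() {
    jar_dir=$1
    classpath=${2:-}
    for jar in "$jar_dir"/*.jar; do
        # Skip the unexpanded pattern when the directory holds no jars.
        [ -e "$jar" ] || continue
        classpath="${classpath:+$classpath:}$jar"
    done
    printf '%s\n' "$classpath"
}

# Export PIG_CLASSPATH, then hand control to the stock pig launcher.
# $1 is the directory holding the Cassandra jars; the remaining
# arguments are passed to pig unchanged.
run_pig_with_cassandra() {
    cassandra_lib=$1
    shift
    PIG_CLASSPATH=$(build_pig_classpath "$cassandra_lib" "${PIG_CLASSPATH:-}")
    export PIG_CLASSPATH
    exec "$PIG_HOME/bin/pig" "$@"
}

# Example call (paths are placeholders):
#   run_pig_with_cassandra /path/to/cassandra/lib script.pig
```

The point of the sketch is that the wrapper only prepares the environment; the same jars could in principle be supplied to plain pig by other means.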

I do not know, and have not checked, how Oozie launches Pig; I suppose that
Oozie launches PIG_HOME/bin/pig.

If you are using this configuration and your Pig scripts that use Cassandra
work fine, I suppose the trick is to put the Cassandra jar dependencies, and
any other UDFs or libraries used in the Pig scripts, in the Oozie sharelib or
in the lib folder of the job.
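
As a rough sketch of the second option, the jars could be copied into the lib folder of the job's workflow application on HDFS, since Oozie adds jars found there to the action classpath. The function name, the HADOOP_CMD override, and the example paths are assumptions for illustration; the sketch also assumes the lib directory already exists.

```shell
#!/bin/sh
# Hypothetical helper: copy local jars into <app_path>/lib on HDFS.
# HADOOP_CMD is an illustrative override so the command can be swapped
# (for example with "echo" for a dry run); it defaults to "hadoop".
upload_job_libs() {
    app_path=$1
    shift
    for jar in "$@"; do
        # Stop at the first failed copy so a partial upload is noticed.
        ${HADOOP_CMD:-hadoop} fs -put "$jar" "$app_path/lib/" || return 1
    done
}

# Example call (paths are placeholders):
#   upload_job_libs /user/me/apps/pig-job /path/to/cassandra/lib/*.jar
```

Setting HADOOP_CMD=echo prints the commands instead of running them, which is a cheap way to review what would be uploaded.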


On the other hand, I do not know whether I have to configure something like
this:

http://wiki.apache.org/cassandra/HadoopSupport#Oozie

I am using Cassandra 1.2.10, Oozie 4.0.0 and Pig 0.11.1.

I will try these options and see whether they work.

Thanks in advance


Re: Pig-cassandra Scripts and Oozie

2013-11-28 Thread Jeremy Hanna
I believe that when I set up Oozie with the setup script, where you specify the
version of Hadoop and such, I also added additional jars there: the Cassandra
jars and some of their dependencies, plus cassandra.yaml, cassandra-env.sh and
potentially the topology properties file.  Then, with the configuration outlined
on the Cassandra wiki page that you posted, I just used the built-in Pig support
and it worked fine.  You might try a simple test case that reads from and writes
to Cassandra and look for errors either in the job setup (the 1 mapper job that
Oozie creates to initialize the job) or in the job itself.

The specific jars from Cassandra that I added as additional jars were:
cassandra-all
cassandra-thrift
guava
high-scale-lib
libthrift
log4j
snakeyaml
commons-io
plus cassandra.yaml, cassandra-env.sh, and the cassandra-topology.properties
file (the last only if using the property file snitch)

I reference those jars in the environment variable LIBEXT_JARS then execute:
bin/oozie-setup.sh prepare-war -jars $LIBEXT_JARS -extjs ./ext-2.2.zip
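
One possible way to build LIBEXT_JARS is sketched below. It assumes that -jars accepts a colon-separated list of paths, which is worth checking against the usage text of your Oozie version. The function name and the jar name prefixes passed to it are illustrative; the actual file names depend on the Cassandra distribution.

```shell
#!/bin/sh
# Hypothetical helper: print a colon-separated list of the jars in a
# lib directory ($1) whose file names start with one of the given
# prefixes (remaining arguments). Prefixes without a match are skipped.
collect_jars() {
    lib_dir=$1
    shift
    result=""
    for prefix in "$@"; do
        for jar in "$lib_dir"/"$prefix"*.jar; do
            # Skip the unexpanded pattern when nothing matches.
            [ -e "$jar" ] || continue
            result="${result:+$result:}$jar"
        done
    done
    printf '%s\n' "$result"
}

# Example call (directory and prefixes are placeholders):
#   LIBEXT_JARS=$(collect_jars /path/to/cassandra/lib cassandra-all guava)
#   bin/oozie-setup.sh prepare-war -jars "$LIBEXT_JARS" -extjs ./ext-2.2.zip
```

Selecting jars by prefix keeps the list limited to the dependencies named above instead of pulling in every jar from the directory.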

Hopefully that helps,

Jeremy
