Re: Pig script to load C++ library
Hi Shashikant. Pig supports streaming with the help of the stream operator. This allows invoking any external executable (e.g., Perl, Python, C++, PHP) inside your Pig scripts. Thanks Naga On Mon, Dec 14, 2015 at 12:18 PM, Shashikant K < shashikant.kulkarn...@gmail.com> wrote: > Hi All, > > Here is what I am trying to do. > >- I have a C++ library which I load from a PHP script using PHP's >exec() function. It works perfectly: PHP sends the input parameter and > gets >the result in an output parameter. >- I want to make use of the same C++ library in my Pig script and >perform the same operation. Is there any way to do this? >- Can you point me to some documentation? I tried to find it. > > > Thanks in advance. > > Regards, > Shashikant > -- Thanks and Regards Nagamallikarjuna
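Under the hood, the stream operator launches the executable once per task and pipes tuples through its stdin/stdout as delimited text (typically tab-separated fields, newline-separated records). A rough sketch of that plumbing in plain Java, using cat as a stand-in for the real C++ binary (the command and method names here are illustrative, not Pig's API):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamSketch {
    // Feed records to an external program's stdin and collect its stdout,
    // the way Pig's STREAM operator shuttles tuples through an executable.
    public static String pipeThrough(String[] command, String input) {
        try {
            Process p = new ProcessBuilder(command).start();
            // Write all input records, then close stdin so the child can finish.
            try (OutputStream out = p.getOutputStream()) {
                out.write(input.getBytes(StandardCharsets.UTF_8));
            }
            // Read the transformed records back, line by line.
            StringBuilder sb = new StringBuilder();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            }
            p.waitFor();
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // "cat" echoes its input, standing in for a real transforming binary.
        System.out.print(pipeThrough(new String[]{"cat"}, "1\tfoo\n2\tbar\n"));
    }
}
```

In a Pig script the equivalent is declared with DEFINE and STREAM ... THROUGH, and the binary must read stdin and write stdout in the same record format.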
Re: create a pipeline
Hi, use the workflow manager Oozie to create the workflow (a DAG of jobs, i.e., Pig scripts). Thanks Nagamallikarjuna On Wed, Apr 15, 2015 at 1:46 PM, pth001 patcharee.thong...@uni.no wrote: Hi, How can I create a pipeline (containing a sequence of Pig scripts)? BR, Patcharee -- Thanks and Regards Nagamallikarjuna
Re: ClassNotFoundException while running pig in local mode
Hi, add all the required jars to the PIG_CLASSPATH variable; that will resolve the issue. Thanks Naga On Fri, Dec 26, 2014 at 3:06 PM, Venkat Ramakrishnan venkat.archit...@gmail.com wrote: Thanks Praveen. I am running Pig 0.14 on Windows 7. Can anyone confirm if Hadoop is really required for Pig local mode? If not, should I file an enhancement request? Thx, Venkat. On Fri, Dec 26, 2014 at 2:51 PM, Praveen R prav...@sigmoidanalytics.com wrote: I usually have Hadoop configured on the system even when using Pig in local mode, and I don't remember running Pig without Hadoop. It could be working on versions 0.13 or prior, since Pig used to ship all the Hadoop jars along with the release, but with 0.14 the Hadoop jars are no longer shipped (I believe this is to have lighter packaging). Regards, Praveen On Fri, Dec 26, 2014 at 2:36 PM, Venkat Ramakrishnan venkat.archit...@gmail.com wrote: Thanks Praveen. Is Hadoop required for running Pig local? I read in a couple of places on the web that Hadoop is not required for local mode... - Venkat. On Fri, Dec 26, 2014 at 1:44 PM, Praveen R prav...@sigmoidanalytics.com wrote: Looks like Pig isn't able to find the Hadoop jars. Could you try putting Hadoop on the system, i.e. have the hadoop command in the environment path? Regards, Praveen On Thu, Dec 25, 2014 at 4:17 PM, Venkat Ramakrishnan venkat.archit...@gmail.com wrote: Hi all, I am getting the following error while running Pig in local mode (pig -X local): The system cannot find the path specified. 
java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
    at org.apache.pig.Main.<clinit>(Main.java:106)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more
Exception in thread "Thread-0" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/LocalFileSystem
    at org.apache.pig.Main$1.run(Main.java:101)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.LocalFileSystem
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more
Exception in thread "main"
Can someone tell me how to resolve this? Thanks, Venkat. -- Thanks and Regards Nagamallikarjuna
Re: ToDate and GetMonth function help
Hi, write a UDF in Java to extract the month and any other values from your input string. Thanks Naga On Aug 18, 2014 8:49 PM, murali krishna p muralikrishna.par...@icloud.com wrote: I am trying to read a table column defined as datetime in my pig script as follows: load '/tmp.psv' using PIgStore() (open_dte : chararray); Later I wanted to use GetMonth in the pig script as follows: Temp_dt = ToDate(open_dte, '-MM-DD'); Month = GetMonth(temp_dt); I am getting an error asking to use an explicit cast. Any insights into this issue? Greatly appreciate your help!! Thanks, Murali
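For reference, the month extraction such a UDF would perform can be sketched with the JDK's java.time package; a Pig UDF would wrap this logic in an EvalFunc. The class and method names here are illustrative, not Pig APIs:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class MonthExtractor {
    // Parse a date string with the given pattern and return its
    // month number (1-12), the value GetMonth would produce.
    public static int monthOf(String dateStr, String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
        return LocalDate.parse(dateStr, fmt).getMonthValue();
    }

    public static void main(String[] args) {
        System.out.println(monthOf("2014-08-18", "yyyy-MM-dd")); // 8
    }
}
```

Note the pattern must match the stored string exactly; a mismatched format (e.g. day-first data parsed with a month-first pattern) is a common source of cast errors upstream.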
Re: Query on Pig
Hi, we call an external MapReduce program inside a Pig script to perform a specific task. Let's take the example of a crawling process.

-- Load all seed urls into the relation crawldata
crawldata = load 'baseurls' using PigStorage() as (pageid:chararray, pageurl:chararray);
normalizedata = foreach crawldata generate pageid, normalize(pageurl);

In the above url list we have good urls and bad/blocklisted urls, and we need to filter out the blocklisted ones. To do that we have a Java MapReduce program, blocklisturls.jar. So instead of writing Pig Latin statements to filter the blocklisted urls, we invoke the MapReduce program as below:

goodurls = mapreduce 'blocklisturls.jar'
    store normalizedata into '/path/input'
    load '/path/output' as (pageid:chararray, pageurl:chararray);

The above Pig Latin statement is a sequence of steps: 1. store writes normalizedata into HDFS under the path '/path/input'. 2. The blocklisturls MapReduce program is called on the input '/path/input'; it filters out the blocklisted urls and writes the output into HDFS under the path '/path/output'. 3. load loads the data from HDFS ('/path/output') into the goodurls relation. Thanks Nagamallikarjuna On Thu, Jul 10, 2014 at 4:42 PM, Nivetha K nivethak3...@gmail.com wrote: Hi, Thanks for replying. Can you please explain how the mapreduce operator works in Pig? On 5 July 2014 10:35, Darpan R darpa...@gmail.com wrote: Looks like a classpath problem: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$Map not found. Can you make sure your jar is in the class path? On 4 July 2014 11:19, Nivetha K nivethak3...@gmail.com wrote: Hi, I am currently working with Pig. I got stuck with the following script. 
A = load 'sample.txt';
B = MAPREDUCE '/home/training/simp.jar'
    Store A into 'inputDir'
    Load 'outputDir' as (word:chararray, count:int)
    `WordCount inputDir outputDir`;
dump B;

Error:
2014-07-04 11:17:57,811 [main] WARN org.apache.hadoop.mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2014-07-04 11:18:16,313 [main] INFO org.apache.hadoop.mapred.JobClient - Task Id : attempt_201407011531_0147_m_00_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$Map not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1774)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:191)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassNotFoundException: Class WordCount$Map not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1680)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1772)

Please help me solve the problem. regards, Nivetha. -- Thanks and Regards Nagamallikarjuna
Re: Adding days to Pig
Hi, write a UDF that takes a date and the number of days to add, and returns the new date. Thanks Naga On Dec 14, 2013 6:19 AM, Krishnan Narayanan krishnan.sm...@gmail.com wrote: Hi All, I am trying to do something like (get_date + 46 days); how do I achieve this in Pig? I am using Pig 0.10. Help much appreciated. Thanks Krishnan
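(Later Pig versions, 0.11+, ship an AddDuration built-in for this; on 0.10 a UDF is the way.) The core of such a UDF can be sketched with the JDK's java.time package; the class and method names are made up for illustration:

```java
import java.time.LocalDate;

public class AddDays {
    // Parse an ISO date, add N days, and format the result back
    // to a string - the heart of a "date + N days" UDF.
    public static String addDays(String isoDate, int days) {
        return LocalDate.parse(isoDate).plusDays(days).toString();
    }

    public static void main(String[] args) {
        System.out.println(addDays("2013-12-14", 46)); // 2014-01-29
    }
}
```

LocalDate.plusDays handles month and year rollover, so the UDF body stays a one-liner; negative values subtract days.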
Re: Simple word count in pig..
Hi, please go through the following code.

Input data (document name, comma-separated tokens):

cricket	sachin,sehwag,dravid,dhoni
movie	amir,salman,hruthik,ranveer
cricket	sachin,ganguly,rohit,dhoni
cricket	sehwag,sachin,dravid,kohli
movie	salman,amir,sharukh

Pig UDF:

package com.pig.udf;

import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

public class WordBag extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }
        DataBag myBag = (DataBag) input.get(0);
        String frequency = "";
        Iterator<Tuple> itr = myBag.iterator();
        Tuple tuple = null;
        Map<String, Integer> wordcount = new HashMap<String, Integer>();
        while (itr.hasNext()) {
            tuple = itr.next();
            DataBag tokens = (DataBag) tuple.get(0);
            Iterator<Tuple> it = tokens.iterator();
            while (it.hasNext()) {
                tuple = it.next();
                String token = (String) tuple.get(0);
                if (wordcount.containsKey(token)) {
                    int count = wordcount.get(token);
                    count++;
                    wordcount.put(token, count);
                } else {
                    wordcount.put(token, 1);
                }
            }
        }
        Set<String> keys = wordcount.keySet();
        for (String key : keys) {
            frequency = frequency + " " + key + ":" + wordcount.get(key);
        }
        return frequency;
    }
}

Build a jar for the above UDF and register it in the Pig script.

Pig script:

register /home/hadoopz/naga/bigdata/pig-0.10.0/pigscripts/wordbag.jar
news = load '/pig/news' using PigStorage() as (doc:chararray, content:chararray);
words = foreach news generate doc, TOKENIZE(content, ',') as mywords;
describe words;
grpwords = group words by doc;
wordcount = foreach grpwords generate group, com.pig.udf.WordBag(words.mywords);
dump wordcount;

Output (document name, tokens and their frequency):

(movie, sharukh:1 salman:2 ranveer:1 hruthik:1 amir:2)
(cricket, sehwag:2 kohli:1 rohit:1 ganguly:1 sachin:3 dhoni:2 dravid:2)

On Wed, Nov 20, 2013 at 5:15 AM, jamal sasha jamalsha...@gmail.com wrote: Hi, I have data already processed in the following form: (id, {bag of words}). So for example: (foobar, {(foo), (foo), (foobar), (bar)}) (foo, {(bar), (bar)}) and so on. describe processed gives me: processed: {id: chararray,tokens: {tuple_of_tokens: (token: chararray)}} Now what I want is to also count the number of times a word appears in this data and output it as: foobar,foo,2 foobar,foobar,1 foobar,bar,1 foo,bar,2 and so on... How do I do this in Pig? -- Thanks and Regards Nagamallikarjuna
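The counting loop at the heart of the WordBag UDF can be exercised on plain Java collections, independent of Pig's DataBag and Tuple types (the class name here is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class TokenCounter {
    // Tally how often each token appears - the same per-group
    // frequency count the WordBag UDF builds from a bag of tuples.
    public static Map<String, Integer> count(String[] tokens) {
        Map<String, Integer> freq = new HashMap<>();
        for (String t : tokens) {
            freq.put(t, freq.getOrDefault(t, 0) + 1);
        }
        return freq;
    }

    public static void main(String[] args) {
        // Mirrors the (foobar, {(foo), (foo), (foobar), (bar)}) example.
        Map<String, Integer> f = count(new String[]{"foo", "foo", "foobar", "bar"});
        System.out.println(f.get("foo")); // 2
    }
}
```

Within Pig, the group-by supplies one bag per id, so the UDF runs this tally once per group and emits the per-id frequencies.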
Re: UDF to calculate Average of whole dataset
Hi, use the fully qualified class name, like org.apache.udf.myudf.udfName, in the Pig script when invoking the UDF. Otherwise use only the UDF name in the script and pass the import list when running, like: pig -Dudf.import.list=org.apache.udf.myudf.evaluation.string scriptname.pig Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 2:54 AM, Preeti Gupta preetigupt...@gmail.com wrote: Nope, it does not work. 2013-03-05 13:22:28,768 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve myudf.CalculateAvg using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.] Details at logfile: /Users/PreetiGupta/Documents/CMPS290S/project/pig_1362518535200.log

Pig script:

REGISTER ./myudfs.jar;
dividends = load 'myfile' as (A);
dump dividends;
--grouped = filter dividends by A-1000.0;
--avg = foreach (filter dividends by A-1000.0) generate AVG(A);
avg = foreach (group dividends all) generate myudf.CalculateAvg(dividends);
dump avg;

My jar file contents:

0 Mon Mar 04 13:45:44 PST 2013 META-INF/
60 Mon Mar 04 13:45:44 PST 2013 META-INF/MANIFEST.MF
1190 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Final.class
1306 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Initial.class
1477 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Intermediate.class
4205 Mon Mar 04 13:45:16 PST 2013 CalculateAvg.class

On Mar 5, 2013, at 1:09 PM, pablomar pablo.daniel.marti...@gmail.com wrote: did you try {jarFileName}.{FunctionName}? example: myudfs.CalculateAvg? On Tue, Mar 5, 2013 at 4:04 PM, Preeti Gupta preetigupt...@gmail.com wrote: I kept the code in myudfs.jar and my pig script points to it using the register command, but the script is not able to find the CalculateAvg function. I don't have any packages defined in the java file and the jar is in my current directory. 
On Mar 5, 2013, at 3:17 AM, Jonathan Coveney jcove...@gmail.com wrote:

dividends = load 'try.txt';
a = foreach dividends generate FLATTEN(TOBAG(*));
b = foreach (group a all) generate CalculateAvg($1);

I think that should work. 2013/3/5 pablomar pablo.daniel.marti...@gmail.com: what is the error? Function not found or something like that? What about this? avg = generate myudfs.CalculateAvg(dividends); On Mon, Mar 4, 2013 at 4:56 PM, Preeti Gupta preetigupt...@soe.ucsc.edu wrote: Hello All, I have a dataset like:

0, 10.1, 20.1, 30, 40, 50, 60, 70, 80.1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 56, 6, 7, 8, 9, 9, 9, 9, 12, 1, 3, 14, 1, 5, 6, 7, 8, 8, 9, 12

So basically comma-separated values, but I want to treat this as one data column and calculate the average of the whole dataset. I believe I have to write a UDF to calculate the average. Pig is able to load this data:

( 0, 10.1, 20.1, 30, 40,)
( 50, 60, 70, 80.1, 1,)
( 2, 3, 4, 5, 6,)
( 7, 8, 9, 10, 11,)
( 12, 13, 14, 15, 16,)
( 1, 2, 3, 4, 5,)
( 56, 6, 7, 8, 9,)
( 9, 9, 9, 12, 1,)
( 3, 14, 1, 5, 6,)
( 7, 8, 8, 9, 12 )

How do I invoke that UDF in my pig script? Say I implement a CalculateAvg function:

REGISTER ./myudfs.jar
dividends = load 'try.txt';
dump dividends
--grouped = group dividends by symbol;
avg = generate CalculateAvg(dividends);
dump avg
--store avg into 'average_dividend';

It fails. -- Thanks and Regards Nagamallikarjuna
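For reference, the arithmetic such a CalculateAvg UDF performs after group ... all collapses the rows into a single bag is just a sum divided by a count. A standalone Java sketch of that step, with illustrative names (the real UDF would receive a Pig DataBag rather than an array):

```java
public class AvgSketch {
    // Sum every value in the (flattened) dataset and divide by the
    // count - the whole-dataset average the UDF computes per group.
    public static double average(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        System.out.println(average(new double[]{1, 2, 3, 4, 5})); // 3.0
    }
}
```

The Initial/Intermediate/Final classes in the jar listing suggest the author wrote this as an algebraic UDF, where partial sums and counts are combined across mappers before the final division.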
Re: Error during parsing
Hi, there is a small mistake in your script: you used the relation name data in the second line; use X instead of data. Sample script:

X = LOAD '/streamming/read' AS (line:chararray);
Y = foreach X generate STRSPLIT(line, ' ');
dump Y;

Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:19 AM, Mix Nin pig.mi...@gmail.com wrote: Hi, I executed the below Pig commands: X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray); Y = foreach data { generate STRSPLIT(line,',') ;}; And I get the below error. What is wrong in my script? I tried removing the curly braces and giving extra spaces, but nothing worked. 2013-03-05 15:38:57,124 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered PATH Y=foreach at line 2, column 1. Was expecting one of: EOF cat ... fs ... cd ... cp ... copyFromLocal ... copyToLocal ... dump ... describe ... aliases ... explain ... help ... kill ... ls ... mv ... mkdir ... pwd ... quit ... register ... rm ... rmf ... set ... illustrate ... run ... exec ... scriptDone ... ... EOL ... ; ... -- Thanks and Regards Nagamallikarjuna
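On the Java side, STRSPLIT(line, ',') corresponds closely to String.split: break a line into fields on a delimiter. A quick standalone check of that behavior (the class name is illustrative; both STRSPLIT and String.split treat the delimiter as a regular expression):

```java
public class SplitCheck {
    // Split one input line into fields on the given delimiter,
    // the way STRSPLIT turns a chararray into a tuple of fields.
    public static String[] fields(String line, String delimRegex) {
        return line.split(delimRegex);
    }

    public static void main(String[] args) {
        String[] f = fields("a,b,c", ",");
        System.out.println(f.length); // 3
        System.out.println(f[1]);     // b
    }
}
```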
Re: Error during parsing
Hi, please paste your Pig script here. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:39 AM, Mix Nin pig.mi...@gmail.com wrote: Thanks for the reply. Now I get the below error: ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve STRSPLIT using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin. On Tue, Mar 5, 2013 at 3:07 PM, inelu nagamallikarjuna malli3...@gmail.com wrote: Hi, there is a small mistake in your script: you used the relation name data in the second line; use X instead of data. Sample script:

X = LOAD '/streamming/read' AS (line:chararray);
Y = foreach X generate STRSPLIT(line, ' ');
dump Y;

Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:19 AM, Mix Nin pig.mi...@gmail.com wrote: Hi, I executed the below Pig commands: X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray); Y = foreach data { generate STRSPLIT(line,',') ;}; And I get the below error. What is wrong in my script? I tried removing the curly braces and giving extra spaces, but nothing worked. 2013-03-05 15:38:57,124 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered PATH Y=foreach at line 2, column 1. Was expecting one of: EOF cat ... fs ... cd ... cp ... copyFromLocal ... copyToLocal ... dump ... describe ... aliases ... explain ... help ... kill ... ls ... mv ... mkdir ... pwd ... quit ... register ... rm ... rmf ... set ... illustrate ... run ... exec ... scriptDone ... ... EOL ... ; ... -- Thanks and Regards Nagamallikarjuna -- Thanks and Regards Nagamallikarjuna
Re: Error during parsing
Hi, STRSPLIT is a built-in function, so the REGISTER command is not required; use the same script with the first line removed. I already tested the script against pig-0.10.0 and it works fine. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:46 AM, Mix Nin pig.mi...@gmail.com wrote: Below is my script:

REGISTER '/home/hadoop/lib/piggybank-0.7.0.jar';
X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray);
Y = foreach X generate STRSPLIT(line,',');

Thanks On Tue, Mar 5, 2013 at 3:14 PM, Harsha har...@defun.org wrote: Hi Mix, there is an additional ; in Y=foreach data { generate STRSPLIT(line,',') ;}; just before the closing }. -- Harsha On Tuesday, March 5, 2013 at 2:49 PM, Mix Nin wrote: Hi, I executed the below Pig commands: X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray); Y = foreach data { generate STRSPLIT(line,',') ;}; And I get the below error. What is wrong in my script? I tried removing the curly braces and giving extra spaces, but nothing worked. 2013-03-05 15:38:57,124 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered PATH Y=foreach at line 2, column 1. Was expecting one of: EOF cat ... fs ... cd ... cp ... copyFromLocal ... copyToLocal ... dump ... describe ... aliases ... explain ... help ... kill ... ls ... mv ... mkdir ... pwd ... quit ... register ... rm ... rmf ... set ... illustrate ... run ... exec ... scriptDone ... ... EOL ... ; ... -- Thanks and Regards Nagamallikarjuna
Re: Error during parsing
Hi, run pig -version in a Linux shell. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:56 AM, Mix Nin pig.mi...@gmail.com wrote: I checked by removing the REGISTER command, but I still get the error. How do I check the Pig version? On Tue, Mar 5, 2013 at 3:22 PM, inelu nagamallikarjuna malli3...@gmail.com wrote: Hi, STRSPLIT is a built-in function, so the REGISTER command is not required; use the same script with the first line removed. I already tested the script against pig-0.10.0 and it works fine. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:46 AM, Mix Nin pig.mi...@gmail.com wrote: Below is my script:

REGISTER '/home/hadoop/lib/piggybank-0.7.0.jar';
X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray);
Y = foreach X generate STRSPLIT(line,',');

Thanks On Tue, Mar 5, 2013 at 3:14 PM, Harsha har...@defun.org wrote: Hi Mix, there is an additional ; in Y=foreach data { generate STRSPLIT(line,',') ;}; just before the closing }. -- Harsha On Tuesday, March 5, 2013 at 2:49 PM, Mix Nin wrote: Hi, I executed the below Pig commands: X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray); Y = foreach data { generate STRSPLIT(line,',') ;}; And I get the below error. What is wrong in my script? I tried removing the curly braces and giving extra spaces, but nothing worked. 2013-03-05 15:38:57,124 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered PATH Y=foreach at line 2, column 1. Was expecting one of: EOF cat ... fs ... cd ... cp ... copyFromLocal ... copyToLocal ... dump ... describe ... aliases ... explain ... help ... kill ... ls ... mv ... mkdir ... pwd ... quit ... register ... rm ... rmf ... set ... illustrate ... run ... exec ... scriptDone ... ... EOL ... ; ... -- Thanks and Regards Nagamallikarjuna -- Thanks and Regards Nagamallikarjuna
Re: Error during parsing
Hi, the function STRSPLIT is not in the list of built-in functions in Pig 0.7.0. Please use any version from 0.8.0 onwards; there are lots of improvements from 0.7.0 to 0.10.0. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:58 AM, inelu nagamallikarjuna malli3...@gmail.com wrote: Hi, run pig -version in a Linux shell. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:56 AM, Mix Nin pig.mi...@gmail.com wrote: I checked by removing the REGISTER command, but I still get the error. How do I check the Pig version? On Tue, Mar 5, 2013 at 3:22 PM, inelu nagamallikarjuna malli3...@gmail.com wrote: Hi, STRSPLIT is a built-in function, so the REGISTER command is not required; use the same script with the first line removed. I already tested the script against pig-0.10.0 and it works fine. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:46 AM, Mix Nin pig.mi...@gmail.com wrote: Below is my script:

REGISTER '/home/hadoop/lib/piggybank-0.7.0.jar';
X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray);
Y = foreach X generate STRSPLIT(line,',');

Thanks On Tue, Mar 5, 2013 at 3:14 PM, Harsha har...@defun.org wrote: Hi Mix, there is an additional ; in Y=foreach data { generate STRSPLIT(line,',') ;}; just before the closing }. -- Harsha On Tuesday, March 5, 2013 at 2:49 PM, Mix Nin wrote: Hi, I executed the below Pig commands: X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray); Y = foreach data { generate STRSPLIT(line,',') ;}; And I get the below error. What is wrong in my script? I tried removing the curly braces and giving extra spaces, but nothing worked. 2013-03-05 15:38:57,124 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered PATH Y=foreach at line 2, column 1. Was expecting one of: EOF cat ... fs ... cd ... cp ... copyFromLocal ... copyToLocal ... dump ... describe ... aliases ... explain ... help ... kill ... ls ... mv ... mkdir ... pwd ... quit ... register ... rm ... rmf ... set ... illustrate ... run ... exec ... scriptDone ... ... EOL ... ; ... -- Thanks and Regards Nagamallikarjuna -- Thanks and Regards Nagamallikarjuna -- Thanks and Regards Nagamallikarjuna
Re: Error during parsing
Hi, I think it is better to download the latest stable version, or otherwise write your own UDF for the split functionality. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 5:04 AM, Mix Nin pig.mi...@gmail.com wrote: Below is my Pig version: Apache Pig version 0.7.1-wilma-3. How do I use a higher version? On Tue, Mar 5, 2013 at 3:32 PM, inelu nagamallikarjuna malli3...@gmail.com wrote: Hi, the function STRSPLIT is not in the list of built-in functions in Pig 0.7.0. Please use any version from 0.8.0 onwards; there are lots of improvements from 0.7.0 to 0.10.0. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:58 AM, inelu nagamallikarjuna malli3...@gmail.com wrote: Hi, run pig -version in a Linux shell. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:56 AM, Mix Nin pig.mi...@gmail.com wrote: I checked by removing the REGISTER command, but I still get the error. How do I check the Pig version? On Tue, Mar 5, 2013 at 3:22 PM, inelu nagamallikarjuna malli3...@gmail.com wrote: Hi, STRSPLIT is a built-in function, so the REGISTER command is not required; use the same script with the first line removed. I already tested the script against pig-0.10.0 and it works fine. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:46 AM, Mix Nin pig.mi...@gmail.com wrote: Below is my script:

REGISTER '/home/hadoop/lib/piggybank-0.7.0.jar';
X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray);
Y = foreach X generate STRSPLIT(line,',');

Thanks On Tue, Mar 5, 2013 at 3:14 PM, Harsha har...@defun.org wrote: Hi Mix, there is an additional ; in Y=foreach data { generate STRSPLIT(line,',') ;}; just before the closing }. -- Harsha On Tuesday, March 5, 2013 at 2:49 PM, Mix Nin wrote: Hi, I executed the below Pig commands: X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray); Y = foreach data { generate STRSPLIT(line,',') ;}; And I get the below error. What is wrong in my script? I tried removing the curly braces and giving extra spaces, but nothing worked. 2013-03-05 15:38:57,124 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered PATH Y=foreach at line 2, column 1. Was expecting one of: EOF cat ... fs ... cd ... cp ... copyFromLocal ... copyToLocal ... dump ... describe ... aliases ... explain ... help ... kill ... ls ... mv ... mkdir ... pwd ... quit ... register ... rm ... rmf ... set ... illustrate ... run ... exec ... scriptDone ... ... EOL ... ; ... -- Thanks and Regards Nagamallikarjuna -- Thanks and Regards Nagamallikarjuna -- Thanks and Regards Nagamallikarjuna -- Thanks and Regards Nagamallikarjuna
Re: Is there a way to limit the number of maps produced by HBaseStorage ?
Hi Vincent, you can restrict the number of concurrent maps per node by setting the parameter mapred.tasktracker.map.tasks.maximum = 1 or 2. Thanks Nagamallikarjuna On Mon, Jan 21, 2013 at 7:13 PM, Mohammad Tariq donta...@gmail.com wrote: Hello Vincent, The number of map tasks for a job is primarily governed by the InputSplits and the InputFormat you are using, so setting it through a config parameter doesn't guarantee that your job will have the specified number of map tasks. However, you can give it a try by using set mapred.map.tasks=n in your Pig Latin job. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Mon, Jan 21, 2013 at 6:57 PM, Vincent Barat vincent.ba...@gmail.com wrote: Hi, We are using HBaseStorage intensively to load data from tables having more than 100 regions. HBaseStorage generates 1 map per region, and since our cluster has 50 map slots, our Pig scripts end up starting 50 maps reading data concurrently from HBase. The problem is that our HBase cluster has only 10 nodes, and thus the maps overload it (5 intensive readers per node is too much to bear). So the question: is there a way to tell Pig to limit the number of maps to a maximum (e.g. 10)? If not, how can I patch the code to do this? Thanks a lot for your help -- Thanks and Regards Nagamallikarjuna
Re: [ANNOUNCE] Welcome new Apache Pig Committers Rohini Palaniswamy
Congrats Rohini.. On Thu, Nov 1, 2012 at 10:13 AM, Aniket Mokashi aniket...@gmail.com wrote: Congrats Rohini... On Mon, Oct 29, 2012 at 11:31 AM, Julien Le Dem jul...@twitter.com wrote: Congrats Rohini ! On Sun, Oct 28, 2012 at 9:42 AM, Bill Graham billgra...@gmail.com wrote: Congrats Rohini! Great news indeed. On Saturday, October 27, 2012, Jon Coveney wrote: Wonderful news! On Oct 26, 2012, at 9:51 PM, Gianmarco De Francisci Morales g...@apache.org wrote: Congratulations Rohini! Welcome onboard :) -- Gianmarco On Fri, Oct 26, 2012 at 7:32 PM, Prasanth J buckeye.prasa...@gmail.com wrote: Congrats Rohini! Thanks -- Prasanth On Oct 26, 2012, at 10:21 PM, Santhosh Srinivasan santhosh_mut...@yahoo.com wrote: Congrats Rohini! Full speed ahead now :) On Oct 26, 2012, at 4:37 PM, Daniel Dai da...@hortonworks.com wrote: Here is another Pig committer announcement today. Please welcome Rohini Palaniswamy to be a Pig committer! Thanks, Daniel -- Sent from Gmail Mobile -- ...:::Aniket:::... Quetzalco@tl -- Thanks and Regards Nagamallikarjuna