Re: Pig script to load C++ library
Hi Shashikant. Pig supports streaming with the help of the stream operator. This allows invoking any external executable (e.g., Perl, Python, C++, PHP) inside your Pig scripts. Thanks Naga On Mon, Dec 14, 2015 at 12:18 PM, Shashikant K < shashikant.kulkarn...@gmail.com> wrote: > Hi All, > > Here is what I am trying to do. > >- I have a C++ library which I load from a PHP script using PHP's >exec() function. It works perfectly: PHP sends the input parameter and > gets >the result in an output parameter. >- I want to make use of the same C++ library in my Pig script and >perform the same operation. Is there any way to do this? >- Can you point me to some documentation? I tried to find it. > > > Thanks in advance. > > Regards, > Shashikant > -- Thanks and Regards Nagamallikarjuna
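Under the hood, the stream operator launches the executable once per task and pipes tuples through its stdin/stdout as delimited text (typically tab-separated fields, newline-separated records). A rough sketch of that plumbing in plain Java, using cat as a stand-in for the real C++ binary (the command and method names here are illustrative, not Pig's API):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamSketch {
    // Feed records to an external program's stdin and collect its stdout,
    // the way Pig's STREAM operator shuttles tuples through an executable.
    public static String pipeThrough(String[] command, String input) {
        try {
            Process p = new ProcessBuilder(command).start();
            // Write all input records, then close stdin so the child can finish.
            try (OutputStream out = p.getOutputStream()) {
                out.write(input.getBytes(StandardCharsets.UTF_8));
            }
            // Read the transformed records back, line by line.
            StringBuilder sb = new StringBuilder();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            }
            p.waitFor();
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // "cat" echoes its input, standing in for a real transforming binary.
        System.out.print(pipeThrough(new String[]{"cat"}, "1\tfoo\n2\tbar\n"));
    }
}
```

In a Pig script the equivalent is declared with DEFINE and STREAM ... THROUGH, and the binary must read stdin and write stdout in the same record format.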
Re: create a pipeline
Hi, use the workflow manager Oozie to create the workflow (a DAG of jobs, i.e., Pig scripts). Thanks Nagamallikarjuna On Wed, Apr 15, 2015 at 1:46 PM, pth001 patcharee.thong...@uni.no wrote: Hi, How can I create a pipeline (containing a sequence of Pig scripts)? BR, Patcharee -- Thanks and Regards Nagamallikarjuna
Re: ClassNotFoundException while running pig in local mode
Hi, add all the required jars to the PIG_CLASSPATH variable; that will resolve the issue. Thanks Naga On Fri, Dec 26, 2014 at 3:06 PM, Venkat Ramakrishnan venkat.archit...@gmail.com wrote: Thanks Praveen. I am running Pig 0.14 on Windows 7. Can anyone confirm if Hadoop is really required for Pig local mode? If not, should I file an enhancement request? Thx, Venkat. On Fri, Dec 26, 2014 at 2:51 PM, Praveen R prav...@sigmoidanalytics.com wrote: I usually have Hadoop configured on the system even when using Pig in local mode, and I don't remember running Pig without Hadoop. It could be working on versions 0.13 or prior, since Pig used to ship all the Hadoop jars along with the release, but with 0.14 the Hadoop jars are no longer shipped (I believe this is to have lighter packaging). Regards, Praveen On Fri, Dec 26, 2014 at 2:36 PM, Venkat Ramakrishnan venkat.archit...@gmail.com wrote: Thanks Praveen. Is Hadoop required for running Pig local? I read in a couple of places on the web that Hadoop is not required for local mode... - Venkat. On Fri, Dec 26, 2014 at 1:44 PM, Praveen R prav...@sigmoidanalytics.com wrote: Looks like Pig isn't able to find the Hadoop jars. Could you try putting Hadoop on the system, i.e. have the hadoop command in the environment path? Regards, Praveen On Thu, Dec 25, 2014 at 4:17 PM, Venkat Ramakrishnan venkat.archit...@gmail.com wrote: Hi all, I am getting the following error while running Pig in local mode (pig -X local): The system cannot find the path specified. 
java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
    at org.apache.pig.Main.<clinit>(Main.java:106)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more
Exception in thread "Thread-0" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/LocalFileSystem
    at org.apache.pig.Main$1.run(Main.java:101)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.LocalFileSystem
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more
Exception in thread "main"
Can someone tell me how to resolve this? Thanks, Venkat. -- Thanks and Regards Nagamallikarjuna
Re: ToDate and GetMonth function help
Hi, write a UDF in Java to extract the month and any other values from your input string. Thanks Naga On Aug 18, 2014 8:49 PM, murali krishna p muralikrishna.par...@icloud.com wrote: I am trying to read a table column defined as datetime in my pig script as follows: load '/tmp.psv' using PIgStore() (open_dte : chararray); Later I wanted to use GetMonth in the pig script as follows: Temp_dt = ToDate(open_dte, '-MM-DD'); Month = GetMonth(temp_dt); I am getting an error asking to use an explicit cast. Any insights into this issue? Greatly appreciate your help!! Thanks, Murali
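For reference, the month extraction such a UDF would perform can be sketched with the JDK's java.time package; a Pig UDF would wrap this logic in an EvalFunc. The class and method names here are illustrative, not Pig APIs:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class MonthExtractor {
    // Parse a date string with the given pattern and return its
    // month number (1-12), the value GetMonth would produce.
    public static int monthOf(String dateStr, String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
        return LocalDate.parse(dateStr, fmt).getMonthValue();
    }

    public static void main(String[] args) {
        System.out.println(monthOf("2014-08-18", "yyyy-MM-dd")); // 8
    }
}
```

Note the pattern must match the stored string exactly; a mismatched format (e.g. day-first data parsed with a month-first pattern) is a common source of cast errors upstream.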
Re: Query on Pig
Hi, we call an external MapReduce program inside a Pig script to perform a specific task. Let's take the example of a crawling process.

-- Load all seed urls into the relation crawldata
crawldata = load 'baseurls' using PigStorage() as (pageid:chararray, pageurl:chararray);
normalizedata = foreach crawldata generate pageid, normalize(pageurl);

In the above url list we have good urls and bad/blocklisted urls, and we need to filter out the blocklisted ones. To do that we have a Java MapReduce program, blocklisturls.jar. So instead of writing Pig Latin statements to filter the blocklisted urls, we invoke the MapReduce program as below:

goodurls = mapreduce 'blocklisturls.jar'
    store normalizedata into '/path/input'
    load '/path/output' as (pageid:chararray, pageurl:chararray);

The above Pig Latin statement is a sequence of steps: 1. store writes normalizedata into HDFS under the path '/path/input'. 2. The blocklisturls MapReduce program is called on the input '/path/input'; it filters out the blocklisted urls and writes the output into HDFS under the path '/path/output'. 3. load loads the data from HDFS ('/path/output') into the goodurls relation. Thanks Nagamallikarjuna On Thu, Jul 10, 2014 at 4:42 PM, Nivetha K nivethak3...@gmail.com wrote: Hi, Thanks for replying. Can you please explain how the mapreduce operator works in Pig? On 5 July 2014 10:35, Darpan R darpa...@gmail.com wrote: Looks like a classpath problem: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$Map not found. Can you make sure your jar is in the class path? On 4 July 2014 11:19, Nivetha K nivethak3...@gmail.com wrote: Hi, I am currently working with Pig. I got stuck with the following script. 
A = load 'sample.txt';
B = MAPREDUCE '/home/training/simp.jar'
    Store A into 'inputDir'
    Load 'outputDir' as (word:chararray, count:int)
    `WordCount inputDir outputDir`;
dump B;

Error:
2014-07-04 11:17:57,811 [main] WARN org.apache.hadoop.mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2014-07-04 11:18:16,313 [main] INFO org.apache.hadoop.mapred.JobClient - Task Id : attempt_201407011531_0147_m_00_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$Map not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1774)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:191)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassNotFoundException: Class WordCount$Map not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1680)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1772)

Please help me solve the problem. regards, Nivetha. -- Thanks and Regards Nagamallikarjuna
Re: Adding days to Pig
Hi, write a UDF that takes a date and the number of days to add, and returns the new date. Thanks Naga On Dec 14, 2013 6:19 AM, Krishnan Narayanan krishnan.sm...@gmail.com wrote: Hi All, I am trying to do something like (get_date + 46 days); how do I achieve this in Pig? I am using Pig 0.10. Help much appreciated. Thanks Krishnan
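(Later Pig versions, 0.11+, ship an AddDuration built-in for this; on 0.10 a UDF is the way.) The core of such a UDF can be sketched with the JDK's java.time package; the class and method names are made up for illustration:

```java
import java.time.LocalDate;

public class AddDays {
    // Parse an ISO date, add N days, and format the result back
    // to a string - the heart of a "date + N days" UDF.
    public static String addDays(String isoDate, int days) {
        return LocalDate.parse(isoDate).plusDays(days).toString();
    }

    public static void main(String[] args) {
        System.out.println(addDays("2013-12-14", 46)); // 2014-01-29
    }
}
```

LocalDate.plusDays handles month and year rollover, so the UDF body stays a one-liner; negative values subtract days.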
Re: Simple word count in pig..
Hi, please go through the following code.

Input data (document name, comma-separated tokens):

cricket	sachin,sehwag,dravid,dhoni
movie	amir,salman,hruthik,ranveer
cricket	sachin,ganguly,rohit,dhoni
cricket	sehwag,sachin,dravid,kohli
movie	salman,amir,sharukh

Pig UDF:

package com.pig.udf;

import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

public class WordBag extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }
        DataBag myBag = (DataBag) input.get(0);
        String frequency = "";
        Iterator<Tuple> itr = myBag.iterator();
        Tuple tuple = null;
        Map<String, Integer> wordcount = new HashMap<String, Integer>();
        while (itr.hasNext()) {
            tuple = itr.next();
            DataBag tokens = (DataBag) tuple.get(0);
            Iterator<Tuple> it = tokens.iterator();
            while (it.hasNext()) {
                tuple = it.next();
                String token = (String) tuple.get(0);
                if (wordcount.containsKey(token)) {
                    int count = wordcount.get(token);
                    count++;
                    wordcount.put(token, count);
                } else {
                    wordcount.put(token, 1);
                }
            }
        }
        Set<String> keys = wordcount.keySet();
        for (String key : keys) {
            frequency = frequency + " " + key + ":" + wordcount.get(key);
        }
        return frequency;
    }
}

Build a jar for the above UDF and register it in the Pig script.

Pig script:

register /home/hadoopz/naga/bigdata/pig-0.10.0/pigscripts/wordbag.jar
news = load '/pig/news' using PigStorage() as (doc:chararray, content:chararray);
words = foreach news generate doc, TOKENIZE(content, ',') as mywords;
describe words;
grpwords = group words by doc;
wordcount = foreach grpwords generate group, com.pig.udf.WordBag(words.mywords);
dump wordcount;

Output (document name, tokens and their frequency):

(movie, sharukh:1 salman:2 ranveer:1 hruthik:1 amir:2)
(cricket, sehwag:2 kohli:1 rohit:1 ganguly:1 sachin:3 dhoni:2 dravid:2)

On Wed, Nov 20, 2013 at 5:15 AM, jamal sasha jamalsha...@gmail.com wrote: Hi, I have data already processed in the following form: (id, {bag of words}). So for example: (foobar, {(foo), (foo), (foobar), (bar)}) (foo, {(bar), (bar)}) and so on. describe processed gives me: processed: {id: chararray,tokens: {tuple_of_tokens: (token: chararray)}} Now what I want is to also count the number of times a word appears in this data and output it as: foobar,foo,2 foobar,foobar,1 foobar,bar,1 foo,bar,2 and so on... How do I do this in Pig? -- Thanks and Regards Nagamallikarjuna
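The counting loop at the heart of the WordBag UDF can be exercised on plain Java collections, independent of Pig's DataBag and Tuple types (the class name here is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class TokenCounter {
    // Tally how often each token appears - the same per-group
    // frequency count the WordBag UDF builds from a bag of tuples.
    public static Map<String, Integer> count(String[] tokens) {
        Map<String, Integer> freq = new HashMap<>();
        for (String t : tokens) {
            freq.put(t, freq.getOrDefault(t, 0) + 1);
        }
        return freq;
    }

    public static void main(String[] args) {
        // Mirrors the (foobar, {(foo), (foo), (foobar), (bar)}) example.
        Map<String, Integer> f = count(new String[]{"foo", "foo", "foobar", "bar"});
        System.out.println(f.get("foo")); // 2
    }
}
```

Within Pig, the group-by supplies one bag per id, so the UDF runs this tally once per group and emits the per-id frequencies.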
Re: UDF to calculate Average of whole dataset
Hi, use the fully qualified class name, like org.apache.udf.myudf.udfName, in the Pig script when invoking the UDF. Otherwise use only the UDF name in the script and pass the import list when running, like: pig -Dudf.import.list=org.apache.udf.myudf.evaluation.string scriptname.pig Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 2:54 AM, Preeti Gupta preetigupt...@gmail.com wrote: Nope, it does not work. 2013-03-05 13:22:28,768 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve myudf.CalculateAvg using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.] Details at logfile: /Users/PreetiGupta/Documents/CMPS290S/project/pig_1362518535200.log

Pig script:

REGISTER ./myudfs.jar;
dividends = load 'myfile' as (A);
dump dividends;
--grouped = filter dividends by A-1000.0;
--avg = foreach (filter dividends by A-1000.0) generate AVG(A);
avg = foreach (group dividends all) generate myudf.CalculateAvg(dividends);
dump avg;

My jar file contents:

0 Mon Mar 04 13:45:44 PST 2013 META-INF/
60 Mon Mar 04 13:45:44 PST 2013 META-INF/MANIFEST.MF
1190 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Final.class
1306 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Initial.class
1477 Mon Mar 04 13:45:16 PST 2013 CalculateAvg$Intermediate.class
4205 Mon Mar 04 13:45:16 PST 2013 CalculateAvg.class

On Mar 5, 2013, at 1:09 PM, pablomar pablo.daniel.marti...@gmail.com wrote: did you try {jarFileName}.{FunctionName}? example: myudfs.CalculateAvg? On Tue, Mar 5, 2013 at 4:04 PM, Preeti Gupta preetigupt...@gmail.com wrote: I kept the code in myudfs.jar and my pig script points to it using the register command, but the script is not able to find the CalculateAvg function. I don't have any packages defined in the java file and the jar is in my current directory. 
On Mar 5, 2013, at 3:17 AM, Jonathan Coveney jcove...@gmail.com wrote:

dividends = load 'try.txt';
a = foreach dividends generate FLATTEN(TOBAG(*));
b = foreach (group a all) generate CalculateAvg($1);

I think that should work. 2013/3/5 pablomar pablo.daniel.marti...@gmail.com: what is the error? Function not found or something like that? What about this? avg = generate myudfs.CalculateAvg(dividends); On Mon, Mar 4, 2013 at 4:56 PM, Preeti Gupta preetigupt...@soe.ucsc.edu wrote: Hello All, I have a dataset like:

0, 10.1, 20.1, 30, 40, 50, 60, 70, 80.1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 56, 6, 7, 8, 9, 9, 9, 9, 12, 1, 3, 14, 1, 5, 6, 7, 8, 8, 9, 12

So basically comma-separated values, but I want to treat this as one data column and calculate the average of the whole dataset. I believe I have to write a UDF to calculate the average. Pig is able to load this data:

( 0, 10.1, 20.1, 30, 40,)
( 50, 60, 70, 80.1, 1,)
( 2, 3, 4, 5, 6,)
( 7, 8, 9, 10, 11,)
( 12, 13, 14, 15, 16,)
( 1, 2, 3, 4, 5,)
( 56, 6, 7, 8, 9,)
( 9, 9, 9, 12, 1,)
( 3, 14, 1, 5, 6,)
( 7, 8, 8, 9, 12 )

How do I invoke that UDF in my pig script? Say I implement a CalculateAvg function:

REGISTER ./myudfs.jar
dividends = load 'try.txt';
dump dividends
--grouped = group dividends by symbol;
avg = generate CalculateAvg(dividends);
dump avg
--store avg into 'average_dividend';

It fails. -- Thanks and Regards Nagamallikarjuna
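For reference, the arithmetic such a CalculateAvg UDF performs after group ... all collapses the rows into a single bag is just a sum divided by a count. A standalone Java sketch of that step, with illustrative names (the real UDF would receive a Pig DataBag rather than an array):

```java
public class AvgSketch {
    // Sum every value in the (flattened) dataset and divide by the
    // count - the whole-dataset average the UDF computes per group.
    public static double average(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        System.out.println(average(new double[]{1, 2, 3, 4, 5})); // 3.0
    }
}
```

The Initial/Intermediate/Final classes in the jar listing suggest the author wrote this as an algebraic UDF, where partial sums and counts are combined across mappers before the final division.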
Re: Error during parsing
Hi, there is a small mistake in your script: you used the relation name data in the second line; use X instead of data. Sample script:

X = LOAD '/streamming/read' AS (line:chararray);
Y = foreach X generate STRSPLIT(line, ' ');
dump Y;

Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:19 AM, Mix Nin pig.mi...@gmail.com wrote: Hi, I executed the below Pig commands: X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray); Y = foreach data { generate STRSPLIT(line,',') ;}; And I get the below error. What is wrong in my script? I tried removing the curly braces and giving extra spaces, but nothing worked. 2013-03-05 15:38:57,124 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered PATH Y=foreach at line 2, column 1. Was expecting one of: EOF cat ... fs ... cd ... cp ... copyFromLocal ... copyToLocal ... dump ... describe ... aliases ... explain ... help ... kill ... ls ... mv ... mkdir ... pwd ... quit ... register ... rm ... rmf ... set ... illustrate ... run ... exec ... scriptDone ... ... EOL ... ; ... -- Thanks and Regards Nagamallikarjuna
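On the Java side, STRSPLIT(line, ',') corresponds closely to String.split: break a line into fields on a delimiter. A quick standalone check of that behavior (the class name is illustrative; both STRSPLIT and String.split treat the delimiter as a regular expression):

```java
public class SplitCheck {
    // Split one input line into fields on the given delimiter,
    // the way STRSPLIT turns a chararray into a tuple of fields.
    public static String[] fields(String line, String delimRegex) {
        return line.split(delimRegex);
    }

    public static void main(String[] args) {
        String[] f = fields("a,b,c", ",");
        System.out.println(f.length); // 3
        System.out.println(f[1]);     // b
    }
}
```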
Re: Error during parsing
Hi, please paste your Pig script here. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:39 AM, Mix Nin pig.mi...@gmail.com wrote: Thanks for the reply. Now I get the below error: ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve STRSPLIT using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin. On Tue, Mar 5, 2013 at 3:07 PM, inelu nagamallikarjuna malli3...@gmail.com wrote: Hi, there is a small mistake in your script: you used the relation name data in the second line; use X instead of data. Sample script:

X = LOAD '/streamming/read' AS (line:chararray);
Y = foreach X generate STRSPLIT(line, ' ');
dump Y;

Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:19 AM, Mix Nin pig.mi...@gmail.com wrote: Hi, I executed the below Pig commands: X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray); Y = foreach data { generate STRSPLIT(line,',') ;}; And I get the below error. What is wrong in my script? I tried removing the curly braces and giving extra spaces, but nothing worked. 2013-03-05 15:38:57,124 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered PATH Y=foreach at line 2, column 1. Was expecting one of: EOF cat ... fs ... cd ... cp ... copyFromLocal ... copyToLocal ... dump ... describe ... aliases ... explain ... help ... kill ... ls ... mv ... mkdir ... pwd ... quit ... register ... rm ... rmf ... set ... illustrate ... run ... exec ... scriptDone ... ... EOL ... ; ... -- Thanks and Regards Nagamallikarjuna -- Thanks and Regards Nagamallikarjuna
Re: Error during parsing
Hi, STRSPLIT is a built-in function, so the REGISTER command is not required; use the same script with the first line removed. I already tested the script against pig-0.10.0 and it works fine. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:46 AM, Mix Nin pig.mi...@gmail.com wrote: Below is my script:

REGISTER '/home/hadoop/lib/piggybank-0.7.0.jar';
X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray);
Y = foreach X generate STRSPLIT(line,',');

Thanks On Tue, Mar 5, 2013 at 3:14 PM, Harsha har...@defun.org wrote: Hi Mix, there is an additional ; in Y=foreach data { generate STRSPLIT(line,',') ;}; just before the closing }. -- Harsha On Tuesday, March 5, 2013 at 2:49 PM, Mix Nin wrote: Hi, I executed the below Pig commands: X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray); Y = foreach data { generate STRSPLIT(line,',') ;}; And I get the below error. What is wrong in my script? I tried removing the curly braces and giving extra spaces, but nothing worked. 2013-03-05 15:38:57,124 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered PATH Y=foreach at line 2, column 1. Was expecting one of: EOF cat ... fs ... cd ... cp ... copyFromLocal ... copyToLocal ... dump ... describe ... aliases ... explain ... help ... kill ... ls ... mv ... mkdir ... pwd ... quit ... register ... rm ... rmf ... set ... illustrate ... run ... exec ... scriptDone ... ... EOL ... ; ... -- Thanks and Regards Nagamallikarjuna
Re: Error during parsing
Hi, run pig -version in a Linux shell. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:56 AM, Mix Nin pig.mi...@gmail.com wrote: I checked by removing the REGISTER command, but I still get the error. How do I check the Pig version? On Tue, Mar 5, 2013 at 3:22 PM, inelu nagamallikarjuna malli3...@gmail.com wrote: Hi, STRSPLIT is a built-in function, so the REGISTER command is not required; use the same script with the first line removed. I already tested the script against pig-0.10.0 and it works fine. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:46 AM, Mix Nin pig.mi...@gmail.com wrote: Below is my script:

REGISTER '/home/hadoop/lib/piggybank-0.7.0.jar';
X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray);
Y = foreach X generate STRSPLIT(line,',');

Thanks On Tue, Mar 5, 2013 at 3:14 PM, Harsha har...@defun.org wrote: Hi Mix, there is an additional ; in Y=foreach data { generate STRSPLIT(line,',') ;}; just before the closing }. -- Harsha On Tuesday, March 5, 2013 at 2:49 PM, Mix Nin wrote: Hi, I executed the below Pig commands: X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray); Y = foreach data { generate STRSPLIT(line,',') ;}; And I get the below error. What is wrong in my script? I tried removing the curly braces and giving extra spaces, but nothing worked. 2013-03-05 15:38:57,124 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered PATH Y=foreach at line 2, column 1. Was expecting one of: EOF cat ... fs ... cd ... cp ... copyFromLocal ... copyToLocal ... dump ... describe ... aliases ... explain ... help ... kill ... ls ... mv ... mkdir ... pwd ... quit ... register ... rm ... rmf ... set ... illustrate ... run ... exec ... scriptDone ... ... EOL ... ; ... -- Thanks and Regards Nagamallikarjuna -- Thanks and Regards Nagamallikarjuna
Re: Error during parsing
Hi, the function STRSPLIT is not in the list of built-in functions in Pig 0.7.0. Please use any version from 0.8.0 onwards; there are lots of improvements from 0.7.0 to 0.10.0. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:58 AM, inelu nagamallikarjuna malli3...@gmail.com wrote: Hi, run pig -version in a Linux shell. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:56 AM, Mix Nin pig.mi...@gmail.com wrote: I checked by removing the REGISTER command, but I still get the error. How do I check the Pig version? On Tue, Mar 5, 2013 at 3:22 PM, inelu nagamallikarjuna malli3...@gmail.com wrote: Hi, STRSPLIT is a built-in function, so the REGISTER command is not required; use the same script with the first line removed. I already tested the script against pig-0.10.0 and it works fine. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:46 AM, Mix Nin pig.mi...@gmail.com wrote: Below is my script:

REGISTER '/home/hadoop/lib/piggybank-0.7.0.jar';
X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray);
Y = foreach X generate STRSPLIT(line,',');

Thanks On Tue, Mar 5, 2013 at 3:14 PM, Harsha har...@defun.org wrote: Hi Mix, there is an additional ; in Y=foreach data { generate STRSPLIT(line,',') ;}; just before the closing }. -- Harsha On Tuesday, March 5, 2013 at 2:49 PM, Mix Nin wrote: Hi, I executed the below Pig commands: X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray); Y = foreach data { generate STRSPLIT(line,',') ;}; And I get the below error. What is wrong in my script? I tried removing the curly braces and giving extra spaces, but nothing worked. 2013-03-05 15:38:57,124 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered PATH Y=foreach at line 2, column 1. Was expecting one of: EOF cat ... fs ... cd ... cp ... copyFromLocal ... copyToLocal ... dump ... describe ... aliases ... explain ... help ... kill ... ls ... mv ... mkdir ... pwd ... quit ... register ... rm ... rmf ... set ... illustrate ... run ... exec ... scriptDone ... ... EOL ... ; ... -- Thanks and Regards Nagamallikarjuna -- Thanks and Regards Nagamallikarjuna -- Thanks and Regards Nagamallikarjuna
Re: Error during parsing
Hi, I think it is better to download the latest stable version, or otherwise write your own UDF for the split functionality. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 5:04 AM, Mix Nin pig.mi...@gmail.com wrote: Below is my Pig version: Apache Pig version 0.7.1-wilma-3. How do I use a higher version? On Tue, Mar 5, 2013 at 3:32 PM, inelu nagamallikarjuna malli3...@gmail.com wrote: Hi, the function STRSPLIT is not in the list of built-in functions in Pig 0.7.0. Please use any version from 0.8.0 onwards; there are lots of improvements from 0.7.0 to 0.10.0. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:58 AM, inelu nagamallikarjuna malli3...@gmail.com wrote: Hi, run pig -version in a Linux shell. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:56 AM, Mix Nin pig.mi...@gmail.com wrote: I checked by removing the REGISTER command, but I still get the error. How do I check the Pig version? On Tue, Mar 5, 2013 at 3:22 PM, inelu nagamallikarjuna malli3...@gmail.com wrote: Hi, STRSPLIT is a built-in function, so the REGISTER command is not required; use the same script with the first line removed. I already tested the script against pig-0.10.0 and it works fine. Thanks Nagamallikarjuna On Wed, Mar 6, 2013 at 4:46 AM, Mix Nin pig.mi...@gmail.com wrote: Below is my script:

REGISTER '/home/hadoop/lib/piggybank-0.7.0.jar';
X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray);
Y = foreach X generate STRSPLIT(line,',');

Thanks On Tue, Mar 5, 2013 at 3:14 PM, Harsha har...@defun.org wrote: Hi Mix, there is an additional ; in Y=foreach data { generate STRSPLIT(line,',') ;}; just before the closing }. -- Harsha On Tuesday, March 5, 2013 at 2:49 PM, Mix Nin wrote: Hi, I executed the below Pig commands: X = LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray); Y = foreach data { generate STRSPLIT(line,',') ;}; And I get the below error. What is wrong in my script? I tried removing the curly braces and giving extra spaces, but nothing worked. 2013-03-05 15:38:57,124 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered PATH Y=foreach at line 2, column 1. Was expecting one of: EOF cat ... fs ... cd ... cp ... copyFromLocal ... copyToLocal ... dump ... describe ... aliases ... explain ... help ... kill ... ls ... mv ... mkdir ... pwd ... quit ... register ... rm ... rmf ... set ... illustrate ... run ... exec ... scriptDone ... ... EOL ... ; ... -- Thanks and Regards Nagamallikarjuna -- Thanks and Regards Nagamallikarjuna -- Thanks and Regards Nagamallikarjuna -- Thanks and Regards Nagamallikarjuna
Re: Is there a way to limit the number of maps produced by HBaseStorage ?
Hi Vincent, you can restrict the number of concurrent maps per node by setting the parameter mapred.tasktracker.map.tasks.maximum = 1 or 2. Thanks Nagamallikarjuna On Mon, Jan 21, 2013 at 7:13 PM, Mohammad Tariq donta...@gmail.com wrote: Hello Vincent, The number of map tasks for a job is primarily governed by the InputSplits and the InputFormat you are using, so setting it through a config parameter doesn't guarantee that your job will have the specified number of map tasks. However, you can give it a try by using set mapred.map.tasks=n in your Pig Latin job. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Mon, Jan 21, 2013 at 6:57 PM, Vincent Barat vincent.ba...@gmail.com wrote: Hi, We are using HBaseStorage intensively to load data from tables having more than 100 regions. HBaseStorage generates 1 map per region, and since our cluster has 50 map slots, our Pig scripts end up starting 50 maps reading data concurrently from HBase. The problem is that our HBase cluster has only 10 nodes, and thus the maps overload it (5 intensive readers per node is too much to bear). So the question: is there a way to tell Pig to limit the number of maps to a maximum (e.g. 10)? If not, how can I patch the code to do this? Thanks a lot for your help -- Thanks and Regards Nagamallikarjuna
Re: [ANNOUNCE] Welcome new Apache Pig Committers Rohini Palaniswamy
Congrats Rohini.. On Thu, Nov 1, 2012 at 10:13 AM, Aniket Mokashi aniket...@gmail.com wrote: Congrats Rohini... On Mon, Oct 29, 2012 at 11:31 AM, Julien Le Dem jul...@twitter.com wrote: Congrats Rohini ! On Sun, Oct 28, 2012 at 9:42 AM, Bill Graham billgra...@gmail.com wrote: Congrats Rohini! Great news indeed. On Saturday, October 27, 2012, Jon Coveney wrote: Wonderful news! On Oct 26, 2012, at 9:51 PM, Gianmarco De Francisci Morales g...@apache.org wrote: Congratulations Rohini! Welcome onboard :) -- Gianmarco On Fri, Oct 26, 2012 at 7:32 PM, Prasanth J buckeye.prasa...@gmail.com wrote: Congrats Rohini! Thanks -- Prasanth On Oct 26, 2012, at 10:21 PM, Santhosh Srinivasan santhosh_mut...@yahoo.com wrote: Congrats Rohini! Full speed ahead now :) On Oct 26, 2012, at 4:37 PM, Daniel Dai da...@hortonworks.com wrote: Here is another Pig committer announcement today. Please welcome Rohini Palaniswamy to be a Pig committer! Thanks, Daniel -- Sent from Gmail Mobile -- ...:::Aniket:::... Quetzalco@tl -- Thanks and Regards Nagamallikarjuna