Re: Assigning resources to individual MR jobs of a Pig script

2017-03-20 Thread Mohammad Tariq
Hi Koji,

This is exactly what I was looking for. Thank you so much for the pointer!


[image: --]

Tariq, Mohammad
[image: https://]about.me/mti
<https://about.me/mti?promo=email_sig_source=product_medium=email_sig_campaign=chrome_ext>




[image: http://]

Tariq, Mohammad
about.me/mti
[image: http://]
<http://about.me/mti>


On Mon, Mar 20, 2017 at 8:34 PM, Koji Noguchi <
knogu...@yahoo-inc.com.invalid> wrote:

>
> It's a valid ask and I'm afraid we currently don't have that feature (in
> either mapreduce or tez): https://issues.apache.org/jira/browse/PIG-4424
>
> This is blocked by https://issues.apache.org/jira/browse/PIG-2597  and
> assigned to me.
>
> I need to find time to get to this.  Sorry!
> Koji
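For reference until PIG-4424 lands, the closest available knob is a script-level SET, which applies one value to every MR job the script spawns; a minimal sketch (the aliases, paths, and memory figures are made up):

SET mapreduce.map.memory.mb 2048;     -- applies to ALL jobs from this script
SET mapreduce.reduce.memory.mb 4096;  -- likewise; no per-job override exists yet
big   = LOAD 'big_data'   AS (k:chararray, v:int);
small = LOAD 'small_data' AS (k:chararray, w:int);
J = JOIN big BY k, small BY k;   -- the JOIN job runs with the values set above
G = GROUP J BY big::k;           -- and the GROUP BY job is forced to use the same
STORE G INTO 'out';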


Re: Assigning resources to individual MR jobs of a Pig script

2017-03-19 Thread Mohammad Tariq
Yeah, I'm aware of that. I think my question wasn't clear. I intend to
assign resources to individual MR jobs which get created during the course
of execution of a pig script. Say, I have a script which performs a JOIN
followed by a GROUP BY. Now I want to provide resources to my script based
on the magnitude of computation required by these 2 separate MR jobs. For
example, separate values for memory for JOIN and GROUP BY.

As per its default behaviour, Pig will use the same resource values for all
the MR jobs spawned from a single script.

Hope I was able to clarify it a bit.




On Sun, Mar 19, 2017 at 11:38 AM, Jianfeng (Jeff) Zhang <
jzh...@hortonworks.com> wrote:

>
> In that case, you can use the PARALLEL keyword to control the number of
> reducer tasks of each MR job. The number of mapper tasks is determined by
> the InputFormat.
>
> See:
> http://pig.apache.org/docs/r0.16.0/basic.html
>
>
> Best Regard,
> Jeff Zhang
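A minimal sketch of the PARALLEL route for the JOIN-then-GROUP-BY scenario from this thread (relation and field names are assumptions):

users  = LOAD 'users'  AS (uid:chararray, name:chararray);
clicks = LOAD 'clicks' AS (uid:chararray, url:chararray);
J = JOIN clicks BY uid, users BY uid PARALLEL 40;  -- 40 reducers for the join job
G = GROUP J BY users::name PARALLEL 10;            -- 10 reducers for the group job
STORE G INTO 'out';

Note this only tunes the reducer count of each job, not its memory.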


Re: Assigning resources to individual MR jobs of a Pig script

2017-03-18 Thread Mohammad Tariq
Hi Jeff,

Thank you for the prompt response. However, I can't use Tez for certain
reasons.




On Fri, Mar 17, 2017 at 2:24 PM, Jianfeng (Jeff) Zhang <
jzh...@hortonworks.com> wrote:

>
> I would suggest you use Tez, which launches just one YARN app for one Pig
> script.
>
> http://pig.apache.org/docs/r0.16.0/perf.html#enable-tez
>
>
>
>
> Best Regard,
> Jeff Zhang
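For completeness, switching the engine is a launch-time flag; a sketch, assuming Pig 0.14+ with the Tez libraries installed on the cluster:

pig -x tez myscript.pig        # one YARN app for the whole script
pig -x mapreduce myscript.pig  # the default: one MR job per shuffle stage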


Assigning resources to individual MR jobs of a Pig script

2017-03-16 Thread Mohammad Tariq
Hi group,

In any real-world Pig script we end up with multiple MR jobs (well, most of
the time). I was wondering if it's possible to allocate resources to
individual MR jobs rather than assigning them at the script level itself.
I tried looking at multiple places. Would really appreciate some pointers
regarding the same.

Thank you so much for your valuable time!




Re: Error in Pig Execution

2016-06-22 Thread Mohammad Tariq
Hi Kiran,

Does hdfs://localhost:9000/user/hduser/pig/planet.osm exist on your HDFS?
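A quick way to verify from the shell; note the LOAD below points at localhost:50075 (a datanode web port), while the error reports paths resolved against the default FS at localhost:9000, so check against the latter:

hadoop fs -ls hdfs://localhost:9000/user/hduser/pig/planet.osm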






On Wed, Jun 22, 2016 at 1:08 PM, Kirandeep Kaur wrote:

> Dear Sir
>
> I am working on Pig with Hadoop as a researcher. I have a problem:
> following is the code, and it is not able to read the input file from HDFS
> in Hadoop.
>
> grunt> x_nodes = LOAD 'hdfs://localhost:50075/user/hduser/pig/planet.osm'
>     USING org.apache.pig.piggybank.storage.XMLLoader('node') AS (node:chararray);
> grunt> p_nodes = FOREACH x_nodes GENERATE OSMNode(node) as node;
> grunt> p_nodes = FILTER p_nodes BY ST_Contains(ST_MakeBox(75.48,30.61,76.30,31.14),
>     ST_MakePoint(node.lon, node.lat));
> grunt> STORE p_nodes INTO 'p_nodes';
>
> HadoopVersion  PigVersion  UserId  StartedAt            FinishedAt           Features
> 1.2.1          0.15.0      hduser  2016-06-22 12:38:13  2016-06-22 12:38:15  UNKNOWN
>
> Failed!
>
> Failed Jobs: JobId Alias Feature Message Outputs N/A
> macro_LoadOSMNodes_osm_nodes_3,macro_LoadOSMNodes_xml_nodes_3,p_nodes
> MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException:
> ERROR 2118: Input path does not exist:
> hdfs://localhost:9000/user/hduser/pig/planet.osm at
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
> at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
> at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071) at
> org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179) at
> org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983) at
> org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936) at
> java.security.AccessController.doPrivileged(Native Method) at
> javax.security.auth.Subject.doAs(Subject.java:422) at
>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910) at
> org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378) at
>
> org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498) at
>
> org.apache.pig.backend.hadoop20.PigJobControl.mainLoopAction(PigJobControl.java:157)
> at
> org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:134)
> at java.lang.Thread.run(Thread.java:745) at
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
> Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
> Input path does not exist: hdfs://localhost:9000/user/hduser/pig/planet.osm
> at
>
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
> at
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
> at
>
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
> at
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
> ... 20 more hdfs://localhost:9000/user/hduser/p_nodes-out,
>
> Input(s): Failed to read data from
> "hdfs://localhost:9000/user/hduser/pig/planet.osm"
>
> Output(s): Failed to produce result in
> "hdfs://localhost:9000/user/hduser/p_nodes-out"
>
> Counters: Total records written : 0 Total bytes written : 0 Spillable
> Memory Manager spill count : 0 Total bags proactively spilled: 0 Total
> records proactively spilled: 0
>
> Job DAG: null
>
> 2016-06-22 12:38:15,105 [main] INFO
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Failed!
> Please advise where the wrong declaration is being used.
>
> Thanks
>
> Kiran
>


Re: store to defined filename

2014-05-16 Thread Mohammad Tariq
Hi there,

You could do that with the help of the MultipleOutputFormat class
(http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputFormat.html).
It extends FileOutputFormat, and allows us to write the output data
to different output files.

Warm regards,
Mohammad Tariq
cloudfront.blogspot.com


On Fri, May 16, 2014 at 2:46 AM, Raviteja Chirala raviteja2...@gmail.com wrote:

 You can either do a Hadoop mv if it's a wrapper script, or

 do getmerge to merge and rename all part files to a single part file.

 On May 14, 2014, at 2:11 AM, Patcharee Thongtra patcharee.thong...@uni.no
 wrote:

  Hi,
 
  Is it possible to store results into a file with a determined filename,
 instead of part-r-0? How to do that?
 
  Patcharee
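A sketch of the merge/rename approach Raviteja describes (paths and filenames are examples only):

hadoop fs -getmerge /user/me/out result.txt                    # merge part files into one local file
hadoop fs -mv /user/me/out/part-r-00000 /user/me/result.txt    # or just rename a part file in place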




Re: Bulk load in hbase using pig

2014-02-26 Thread Mohammad Tariq
Could you please let us know how exactly you want to parse your logs?

Warm Regards,
Tariq
cloudfront.blogspot.com


On Wed, Feb 26, 2014 at 6:25 PM, David McNelis dmcne...@gmail.com wrote:

 The big question is how the log file needs to be parsed / formatted.  I'd
 be inclined to write a UDF that would take the line of text and return a
 tuple of the values you'd be storing in hbase.

 Then you could do other operations on the bag of tuples that get passed
 back.

 Alternatively, you could write a regex statement and use an internal pig
 function like REGEX_EXTRACT or REGEX_EXTRACT_ALL.

 I like the UDF approach in this case because then I can more easily write
 unit tests around my log parser and get that testing out of the way before
 actually spawning any jobs.
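A minimal sketch of the REGEX_EXTRACT_ALL route with a made-up three-field log layout (timestamp, level, message); the real pattern, table name, and columns depend on the actual logs:

raw = LOAD '/logs/app.log' AS (line:chararray);
parsed = FOREACH raw GENERATE FLATTEN(
             REGEX_EXTRACT_ALL(line, '^(\\S+)\\s+(\\S+)\\s+(.*)$'))
         AS (ts:chararray, level:chararray, msg:chararray);
-- the first field becomes the HBase row key; the rest map to the listed columns
STORE parsed INTO 'hbase://logs' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:level cf:msg');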


 On Wed, Feb 26, 2014 at 12:22 AM, Chhaya Vishwakarma 
 chhaya.vishwaka...@lntinfotech.com wrote:

  hi,
 
  I have a log file in HDFS which needs to be parsed and put in a Hbase
  table.
 
  I want to do this using Pig.
 
  How can I go about it? The Pig script should parse the logs and then put
  them into HBase.
 
 
  Regards,
  Chhaya Vishwakarma
 
 
  



Re: Reading Kafka directly from Pig?

2013-08-29 Thread Mohammad Tariq
Great job. +1

Warm Regards,
Tariq
cloudfront.blogspot.com


On Wed, Aug 7, 2013 at 8:27 PM, Russell Jurney russell.jur...@gmail.com wrote:

 Cool stuff, a Pig Kafka UDF.

 Russell Jurney http://datasyndrome.com

 Begin forwarded message:

 From: David Arthur mum...@gmail.com
 Date: August 7, 2013, 7:41:30 AM PDT
 To: us...@kafka.apache.org
 Subject: Reading Kafka directly from Pig?
 Reply-To: us...@kafka.apache.org

 I've thrown together a Pig LoadFunc to read data from Kafka, so you could
 load data like:

 QUERY_LOGS = load 'kafka://localhost:9092/logs.query#8' using
 com.mycompany.pig.KafkaAvroLoader('com.mycompany.Query');

 The path part of the uri is the Kafka topic, and the fragment is the number
 of partitions. In the implementation I have, it makes one input split per
 partition. Offsets are not really dealt with at this point - it's a rough
 prototype.

 Anyone have thoughts on whether or not this is a good idea? I know usually
 the pattern is: kafka -> hdfs -> mapreduce. If I'm only reading this
 data from Kafka once, is there any reason why I can't skip writing to HDFS?

 Thanks!
 -David



Re: union

2013-07-25 Thread Mohammad Tariq
Hello Keren,

There is nothing wrong in this. One dataset in Hadoop is usually one folder
and not one file. Pig is doing what it is supposed to do and performing a
union on both the files. You would have seen the content of both the files
together while doing dump C.

Since this is a map only job, and 2 mappers are getting generated, you are
getting 2 separate files. Which is actually one complete dataset. If you
want to have just one file, you need to force a reduce so that you get all
the results collectively in a single output file.

HTH

Warm Regards,
Tariq
cloudfront.blogspot.com


On Thu, Jul 25, 2013 at 11:31 AM, Keren Ouaknine ker...@gmail.com wrote:

 Hi,

 According to Pig's documentation on union, two relations which have the same
 schema (the same length, and types that can be implicitly cast) can be
 concatenated (see http://pig.apache.org/docs/r0.11.1/basic.html#union)

 However, when I try with:
 A = load '1.txt'  using PigStorage(' ')  as (x:int, y:chararray,
 z:chararray);
 B = load '1_ext.txt'  using PigStorage(' ')  as (a:int, b:chararray,
 c:chararray);
 C = union A, B;
 describe C;
 DUMP C;
 store C into '/home/kereno/Documents/pig-0.11.1/workspace/res';

 with:
 ~/Documents/pig-0.11.1/workspace 130$ more 1.txt 1_ext.txt
 ::
 1.txt
 ::
 1 a aleph
 2 b bet
 3 g gimel
 ::
 1_ext.txt
 ::
 0 a alpha
 0 b beta
 0 g gimel


 I get as result: ~/Documents/pig-0.11.1/workspace 0$ more res/part-m-*
 ::
 res/part-m-0
 ::
 0 a alpha
 0 b beta
 0 g gimel
  ::
 res/part-m-1
 ::
 1 a aleph
 2 b bet
 3 g gimel

 Whereas I was expecting something like
 0 a alpha
 0 b beta
 0 g gimel
 1 a aleph
 2 b bet
 3 g gimel

 [all together]

 I understand that two files for non-matching schemas would be generated but
 why for union with a matching schema?

 Thanks,
 Keren

 --
 Keren Ouaknine
 Web: www.kereno.com



Re: union

2013-07-25 Thread Mohammad Tariq
You could try something like this :

A = load '/1.txt' using PigStorage(' ') as (x:int, y:chararray,
z:chararray);

B = load '/1_ext.txt' using PigStorage(' ') as (a:int, b:chararray,
c:chararray);

C = union A, B;

D = group C by 1;                    -- constant key: every row lands in one group,
                                     -- forcing the plan into a single reduce

E = foreach D generate flatten(C);   -- unwrap the grouped bag back into rows

store E into '/dir';

Warm Regards,
Tariq
cloudfront.blogspot.com






Pig giving priority to non Apache Hadoop

2013-06-25 Thread Mohammad Tariq
Hello list,

 Today I started Pig on my personal machine after a few weeks to
give 0.11.1 a try. As soon as I issued bin/pig it threw this message on my
terminal:

apache@hadoop:/hadoop/projects/pig-0.11.1$ bin/pig
2013-06-26 06:05:45,121 [main] INFO  org.apache.pig.Main - Apache Pig
version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-06-26 06:05:45,122 [main] INFO  org.apache.pig.Main - Logging error
messages to: /hadoop/pig/logs/pig_1372206945120.log
2013-06-26 06:05:45,143 [main] INFO  org.apache.pig.impl.util.Utils -
Default bootup file /home/apache/.pigbootup not found
2013-06-26 06:05:45,256 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: maprfs:///
2013-06-26 06:05:45,326 [main] INFO
 org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2013-06-26 06:05:45,327 [main] INFO
 org.apache.hadoop.security.JniBasedUnixGroupsMapping - Using
JniBasedUnixGroupsMapping for Group resolution
2013-06-26 06:05:45,385 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to map-reduce job tracker at: maprfs:///

I was quite surprised at first look, but then realized that I was
checking out MapR's M3 on my machine a few days ago. Is this normal? I mean,
Apache Pig is going to maprfs instead of hdfs despite the fact that all
my Apache Hadoop daemons were running. Although I removed M3 immediately,
I'm still curious. I would really appreciate it if somebody could shed some
light.

Thank you so much for your time.
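For anyone hitting the same thing: Pig picks its filesystem from the Hadoop configuration on its classpath, so a leftover MapR client config is a plausible culprit. A quick check, assuming a Hadoop 1.x-style core-site.xml (paths are examples):

echo $HADOOP_CONF_DIR
grep -A1 'fs.default.name' $HADOOP_CONF_DIR/core-site.xml   # should say hdfs://..., not maprfs:///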

Warm Regards,
Tariq
cloudfront.blogspot.com


Re: BinCond

2013-04-27 Thread Mohammad Tariq
Hello Soniya,

   It's like the ternary (conditional) operator available in Java
and works just like that. Here is the example provided in the reference
manual:

Suppose we have relation A.

A = LOAD 'data' AS (f1:int, f2:int, B:bag{T:tuple(t1:int,t2:int)});

DUMP A;
(10,1,{(2,3),(4,6)})
(10,3,{(2,3),(4,6)})
(10,6,{(2,3),(4,6),(5,7)})

In this example the modulo operator is used with fields f1 and f2.

X = FOREACH A GENERATE f1, f2, f1%f2;

DUMP X;
(10,1,0)
(10,3,1)
(10,6,4)

In this example the bincond operator is used with fields f2 and B. The
condition is f2 equals 1; if the condition is true, return 1; if the
condition is false, return the count of the number of tuples in B.

X = FOREACH A GENERATE f2, (f2==1?1:COUNT(B));

DUMP X;
(1,1L)
(3,2L)
(6,3L)


It clearly shows that when f2==1, as in the first case, the expression
evaluates to true, hence 1 is returned; COUNT(B) is returned in the other
two cases, as the expression evaluates to false.

What were you trying to do and what exactly is the problem which you are
facing?

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Sat, Apr 27, 2013 at 8:23 PM, soniya B soniya.bigd...@gmail.com wrote:

 Hi,

 Can anyone explain the use of the BinCond operator with an example? I have
 been trying a lot but couldn't get it to work.

 Regards
 Soniya



Re: Coding standards of Pig

2013-04-21 Thread Mohammad Tariq
Hello Raj,

  You might find this link useful:
http://wiki.apache.org/pig/PigPerformance

And I don't have much idea as far as coding conventions are concerned, as I
haven't seen much on that. I did come across this small section in the
reference manual though. You can visit it here:
http://pig.apache.org/docs/r0.8.1/piglatin_ref2.html#Conventions

HTH

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Sun, Apr 21, 2013 at 7:23 PM, Raj hadoop raj.had...@gmail.com wrote:

 Hi,

 We are new to the Hadoop family (Pig, Hive). We have started a project on
 Pig, and we are set to define some coding standards as well as performance
 benchmarking activities, so kindly help us with any specific doc you have;
 it would help us a lot.



Re: Sequence File processing

2012-12-24 Thread Mohammad Tariq
+1

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/


On Tue, Dec 25, 2012 at 3:07 AM, Cheolsoo Park cheol...@cloudera.com wrote:

 Hi Srini,

 You can use STRSPLIT to split your value chararray and define schema in a
 FOREACH. For example, if the value consists of 3 integers (i.e. 1|2|3),

 A= LOAD 'part-m-' USING SequenceFileLoader() AS
 (key:long,value:chararray);
 B = FOREACH A GENERATE key, FLATTEN( STRSPLIT(value,'\\|') ) AS (i:int,
 j:int, k:int);
 DESCRIBE B;
 DUMP B;

 This will return:

 B: {key: chararray,i: int,j: int,k: int}
 (k,1,2,3)

 Thanks,
 Cheolsoo


 On Sun, Dec 23, 2012 at 9:24 PM, Srini piglearn...@gmail.com wrote:

  Hi ,
 
  I have used SequeceFileLoader for loading sequence file.
 
  A= load 'part-m-' using SequenceFileLoader() as
  (key:long,value:chararray)
 
  value is the chararray which consists of 10 fields separated by a
  delimiter (| here). How do I create a schema here so that I can do
  further analysis with these fields (such as FILTER, GROUP)?
 
  Any help is appreciated.
 
  Thanks,
  Srini
 



Re: Limit number of Streaming Programs

2012-12-24 Thread Mohammad Tariq
Folks on the list need some time mate. I have specified a couple of links
on the other thread of yours. Check it out and see if it helps.

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/


On Tue, Dec 25, 2012 at 11:09 AM, Kshiva Kps kshiva...@gmail.com wrote:

 Hi,

 Are there any Pig editors where we can write 100 to 150 Pig scripts? I
 believe it is not possible to do this in CLI mode. Something like an IDE
 for Java / TOAD for SQL. Please advise, many thanks.

 Thanks


 On Tue, Dec 25, 2012 at 3:45 AM, Cheolsoo Park cheol...@cloudera.com
 wrote:

  Hi Thomas,
 
   If I understand your question correctly, what you want is to reduce the
   number of mappers that spawn streaming processes. default_parallel controls
   the number of reducers, so it won't have any effect on the number of
  mappers. Although the number of mappers is auto-determined by the size of
  input data, you can try to set pig.maxCombinedSplitSize to combine
 input
  files into bigger ones. For more details, please refer to:
  http://pig.apache.org/docs/r0.10.0/perf.html#combine-files
 
  You can also read a discussion on a similar topic here:
 
 
  http://search-hadoop.com/m/J5hCw1UdxTa/How+can+I+set+the+mapper+number&subj=How+can+I+set+the+mapper+number+for+pig+script+
 
  Thanks,
  Cheolsoo
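A sketch of the split-combining suggestion; the 1 GB figure and the names are arbitrary examples:

SET pig.maxCombinedSplitSize 1073741824;    -- combine small files into ~1 GB splits
A = LOAD 'series/' AS (line:chararray);     -- hypothetical input
B = STREAM A THROUGH `compare.py`;          -- hypothetical streaming script
STORE B INTO 'samples';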
 
 
  On Tue, Dec 18, 2012 at 12:00 PM, Thomas Bach
  thb...@students.uni-mainz.dewrote:
 
   Hi,
  
   I have around 4 million time series. ~1000 of them had a special
   occurrence at some point. Now, I want to draw 10 samples for each
   special time-series based on a similarity comparison.
  
   What I have currently implemented is a script in Python which consumes
   time-series one-by-one and does a comparison with all 1000 special
   time-series. If the similarity is sufficient with one of them I pass
   it back to Pig and strike out the according special time-series,
   subsequent time-series will not be compared against this one.
  
   This routine runs, but it lasts around 6 hours.
  
   One of the problems I'm facing is that Pig starts 160 scripts
   although 10 would be sufficient. Is there some way to define the
   number of scripts Pig starts in a `STREAM THROUGH` step? I tried to
   set default_parallel to 10, but it doesn't seem to have any effect.
  
   I'm also open to any other ideas on how to accomplish the task.
  
   Regards,
   Thomas Bach.
  
 



Re: Is Programming Pig book outdated?

2012-11-16 Thread Mohammad Tariq
Agree with Mr. Jagat.

Regards,
Mohammad Tariq



On Fri, Nov 16, 2012 at 3:26 PM, Jagat Singh jagatsi...@gmail.com wrote:

 In the open source community no book can ever be the latest, so we have to
 live by this :)

 I would suggest you start with this book and read the latest
 documentation on the Pig website side by side to see the latest features.

 Good luck


 On Fri, Nov 16, 2012 at 8:41 PM, Majid Azimi majid.merk...@gmail.com
 wrote:

  Hi guys,
 
  this is not really a question. I want to know: is this book
  (Programming Pig)
  http://www.amazon.com/Programming-Pig-Alan-Gates/dp/1449302645/
  outdated? The book says it is based on 0.8 with some additions from 0.9
  (because at the time of releasing the book, 0.9 had not been released).
  Now Pig is in version 1.0. Are there any massive changes in these
  versions? Can this book be a good resource for learning Pig?
 



Re: accessing like array

2012-11-06 Thread Mohammad Tariq
Load the data into a relation and use GENERATE to take only the required
fields from this relation into a second relation; then store the second
relation into a file.
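A sketch of one way to do it, assuming (as in the sample) each id has exactly two rows and the wanted row is the one with the larger val2:

A = LOAD 'data' USING PigStorage(',') AS (id:int, val1:double, val2:double);
B = GROUP A BY id;
C = FOREACH B {
        srt = ORDER A BY val2 DESC;   -- larger val2 first
        top = LIMIT srt 1;
        GENERATE group AS id, FLATTEN(top.val2) AS val2;
};
DUMP C;   -- (1,0.4) (2,0.7)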

Regards,
Mohammad Tariq



On Tue, Nov 6, 2012 at 7:43 PM, jamal sasha jamalsha...@gmail.com wrote:

 Hi,
   I have data in form
 1,0.2,0.3
 1,0.3,0.4
 2,0.8,0.2
 2,0.9,0.7
 and so on..
 so id, val1, val2 format.

 This id is already sorted based on val2
 I want to select the 2nd element for each id with val2 (ignoring val1)
 for example, in the above dataset, what I want to return is
 1,0.4
 2,0.7

 How to go about this??
 Thanks



Unable to store data into HBase

2012-09-03 Thread Mohammad Tariq
Hello list,

   I have a file in my HDFS, and I am reading this file and trying to
store the data into an HBase table through the Pig shell. Here are the
commands I am using:

z = load '/mapin/testdata2.csv/part-m-0' using PigStorage(',') as
(rowkey:int, id:int, age:float, gender:chararray, height:int, size:int,
color:chararray);
store z into 'hbase://csvdata' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:id, cf:age, cf:gender,
cf:height, cf:size, cf:color');

Although, I can see the data when I dump the relation 'z', but I am not
able to store 'z' in HBase using the above specified command. I am getting
the following error :

HadoopVersion  PigVersion  UserId   StartedAt            FinishedAt           Features
1.0.3          0.10.0      cluster  2012-09-03 12:40:31  2012-09-03 12:41:04  UNKNOWN

Failed!

Failed Jobs:
JobId Alias Feature Message Outputs
job_201209031122_0009 z MAP_ONLY Message: Job failed! Error - JobCleanup
Task Failure, Task: task_201209031122_0009_m_01 csvdata,

Input(s):
Failed to read data from /mapin/testdata2.csv/part-m-0

Output(s):
Failed to produce result in csvdata

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201209031122_0009


2012-09-03 12:41:04,606 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2012-09-03 12:41:04,629 [main] INFO
 org.apache.pig.backend.hadoop.hbase.HBaseStorage - Adding
family:descriptor filters with values cf:id
2012-09-03 12:41:04,629 [main] INFO
 org.apache.pig.backend.hadoop.hbase.HBaseStorage - Adding
family:descriptor filters with values cf:age
2012-09-03 12:41:04,629 [main] INFO
 org.apache.pig.backend.hadoop.hbase.HBaseStorage - Adding
family:descriptor filters with values cf:gender
2012-09-03 12:41:04,629 [main] INFO
 org.apache.pig.backend.hadoop.hbase.HBaseStorage - Adding
family:descriptor filters with values cf:height
2012-09-03 12:41:04,629 [main] INFO
 org.apache.pig.backend.hadoop.hbase.HBaseStorage - Adding
family:descriptor filters with values cf:size
2012-09-03 12:41:04,629 [main] INFO
 org.apache.pig.backend.hadoop.hbase.HBaseStorage - Adding
family:descriptor filters with values cf:color

I don't get why it shows "Failed to read data from
/mapin/testdata2.csv/part-m-0", when I can already see the data in relation
'z'. Any help would be much appreciated. Many thanks.

Regards,
Mohammad Tariq


Re: Unable to store data into HBase

2012-09-03 Thread Mohammad Tariq
I don't think there is any problem with that as I am able to execute other
queries, like loading data from an HBase table and storing it into another
HBase table.

Regards,
Mohammad Tariq



On Mon, Sep 3, 2012 at 1:57 PM, shashwat shriparv dwivedishash...@gmail.com
 wrote:

 What I can conclude from the error is that Pig is not able to run in
 distributed mode, as it's not able to connect to Hadoop. Just check whether
 other MapReduce tasks in Pig are working fine. Or Pig is searching for a
 file which is not present; check whether the file is there where Pig is
 searching for it.

 Regards

 ∞
 Shashwat Shriparv






Re: Unable to store data into HBase

2012-09-03 Thread Mohammad Tariq
Thank you for the response. But even after removing the comma it's not
working. I have noticed 2 strange things here:
1- If I am reading data from HBase and putting it back in some HBase table
it works fine.
2- When I am trying the same thing using older versions, HBase(0.90.4) and
Pig(0.9.1), it is working perfectly fine.

It seems there is some compatibility issue between Pig(0.10.0) and
HBase(0.92.1). Any comments or suggestions?

Regards,
Mohammad Tariq



On Mon, Sep 3, 2012 at 8:07 PM, chethan chethan...@gmail.com wrote:

 STORE raw_data INTO 'hbase://sample_names' USING
 org.apache.pig.backend.hadoop.hbase.HBaseStorage(
 'info:fname info:lname');

  Above is an example of HBaseStorage.

  1. It takes the column family and qualifier, internally separated by
  spaces; as you have given commas for separation, this might be creating
  the problem:

 store z into 'hbase://csvdata' USING
  org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:id, cf:age,
 cf:gender, cf:height, cf:size, cf:color');




Re: FileAlreadyExistsException while running pig

2012-08-10 Thread Mohammad Tariq
Hello Haitao,

Each time we run a MapReduce job, the job expects the output path to be
non-existent. If the output path is already there, then a
FileAlreadyExistsException is thrown. And since each Pig
job is eventually a MapReduce job, it expects the same.
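A common workaround, assuming the output path is safe to clobber: remove it in grunt (or at the top of the script) before the STORE. rmf does not complain if the path is absent:

rmf /user/me/pig-out;                    -- hypothetical output path
STORE results INTO '/user/me/pig-out';   -- hypothetical alias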

Regards,
Mohammad Tariq


On Fri, Aug 10, 2012 at 11:18 PM, Alan Gates ga...@hortonworks.com wrote:
 Usually that means the the directory you are trying to store to already 
 exists.  Pig won't overwrite existing data.  You should either move or remove 
 the directory or change the directory name in your store function.

 Alan.

 On Aug 9, 2012, at 7:42 PM, Haitao Yao wrote:

 hi, all
   I got this while running pig script:

 997: Unable to recreate exception from backend error:
 org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory 
 hdfs://DC-hadoop01:9000/tmp/pig-temp/temp548500412/tmp-1456742965 already 
 exists
at 
 org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:207)
at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:188)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:893)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:856)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:830)
at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
at 
 org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
at 
 org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
at java.lang.Thread.run(Thread.java:722)


 But I checked the script; the directory
 hdfs://DC-hadoop01:9000/tmp/pig-temp/temp548500412/tmp-1456742965 is not
 used by the script explicitly, so I think it is used by Pig to store tmp
 results. But why does it already exist? Isn't it supposed to be unique?








 Haitao Yao
 yao.e...@gmail.com
 weibo: @haitao_yao
 Skype:  haitao.yao.final




Re: DATA not storing as comma-separted

2012-07-25 Thread Mohammad Tariq
Hi Yogesh,

 Is 'load' working fine with PigStorage()?? Try to load
something using PigStorage(',') and dump it to see if that is working.
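Something like this round-trip, against the path from this thread:

R = LOAD '/hadoop/pig/records' USING PigStorage(',')
    AS (name:chararray, roll:int, mssg:chararray);
DUMP R;   -- if the commas were really written, the fields come back split cleanly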

Regards,
Mohammad Tariq


On Wed, Jul 25, 2012 at 4:41 PM,  yogesh.kuma...@wipro.com wrote:
 Hello Dmitriy,

 I have also performed the cat command in hadoop.

 hadoop dfs -cat  /hadoop/pig/records/part-m-0

 but it still shows the same output, without commas.
 Please suggest.

 Thanks & regards
 Yogesh Kumar
 
 From: Dmitriy Ryaboy [dvrya...@gmail.com]
 Sent: Wednesday, July 25, 2012 4:33 PM
 To: user@pig.apache.org
 Subject: Re: DATA not storing as comma-separted

 Using the store expression you wrote should work. Dump is its own thing and 
 doesn't know anything about the format you store things in. To see files 
 created on hdfs, you can use cat.

 On Jul 25, 2012, at 3:48 AM, yogesh.kuma...@wipro.com wrote:

 Hi All,

 I am new to Pig, trying to store data in HDFS as comma-separated by using
 the command

 store RECORDS into 'hadoop/pig/records' using PigStorage(',');

 If I do

 dump RECORDS ;

 it shows

 (YogeshKumar 210 hello)
 (Mohitkumar 211 hi)
 (AAshichoudhary 212 hii)
 (renuchoudhary 213 namestey)

 I want it to store as

 (YogeshKumar, 210, hello)
 (Mohitkumar, 211,hi)
 (AAshichoudhary, 212, hii)
 (renuchoudhary, 213, namestey)


 Please suggest and Help

 Thanks & Regards
 Yogesh Kumar






Re: DATA not storing as comma-separted

2012-07-25 Thread Mohammad Tariq
I have worked with pig-0.7.0 once and it was working fine. Try to see
if there is anything interesting in the log files. Also, if possible,
share 2-3 lines of your file; I'll give it a try on my machine.

Regards,
Mohammad Tariq


On Wed, Jul 25, 2012 at 5:20 PM,  yogesh.kuma...@wipro.com wrote:
 Hi Mohammad,

 I have switched from pig 0.10.0 to 0.7.0 and it's a horrible experience.
 I do perform

 grunt> A = load '/hello/demotry.txt'
 as (name:chararray, roll:int, mssg:chararray);

 grunt> dump A;

 it shows this error:

 grunt> dump A;
 2012-07-25 17:20:34,081 [main] INFO  
 org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned 
 for A
 2012-07-25 17:20:34,081 [main] INFO  
 org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned 
 for A
 2012-07-25 17:20:34,102 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
 - Initializing JVM Metrics with processName=JobTracker, sessionId=
 2012-07-25 17:20:34,169 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: 
 Store(file:/tmp/temp61624047/tmp1087576502:org.apache.pig.builtin.BinStorage) 
 - 1-18 Operator Key: 1-18)
 2012-07-25 17:20:34,195 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size before optimization: 1
 2012-07-25 17:20:34,195 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size after optimization: 1
 2012-07-25 17:20:34,211 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
 - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - 
 already initialized
 2012-07-25 17:20:34,217 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
 - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - 
 already initialized
 2012-07-25 17:20:34,217 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
  - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
 2012-07-25 17:20:35,570 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
  - Setting up single store job
 2012-07-25 17:20:35,599 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
 - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - 
 already initialized
 2012-07-25 17:20:35,600 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 1 map-reduce job(s) waiting for submission.
 2012-07-25 17:20:35,606 [Thread-7] WARN  org.apache.hadoop.mapred.JobClient - 
 Use GenericOptionsParser for parsing the arguments. Applications should 
 implement Tool for the same.
 2012-07-25 17:20:35,750 [Thread-7] INFO  
 org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with 
 processName=JobTracker, sessionId= - already initialized
 2012-07-25 17:20:35,763 [Thread-7] INFO  
 org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with 
 processName=JobTracker, sessionId= - already initialized
 2012-07-25 17:20:36,101 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 0% complete
 2012-07-25 17:20:36,101 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 100% complete
 2012-07-25 17:20:36,101 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 1 map reduce job(s) failed!
 2012-07-25 17:20:36,107 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed to produce result in: file:/tmp/temp61624047/tmp1087576502
 2012-07-25 17:20:36,107 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed!
 2012-07-25 17:20:36,120 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
 - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - 
 already initialized
 2012-07-25 17:20:36,121 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2997: Unable to recreate exception from backend error: 
 org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to 
 create input splits for: file:///hello/demotry.txt
 Details at logfile: /users/mediaadmin/pig_1343217013235.log


 why is it happening so :-(

 Please help and Suggest

 Thanks & Regards
 yogesh Kumar



 

Re: DATA not storing as comma-separted

2012-07-25 Thread Mohammad Tariq
Also, it would help to go to the MapReduce web UI and have a look
at the details of the job corresponding to this query.

Regards,
Mohammad Tariq



Re: DATA not storing as comma-separted

2012-07-25 Thread Mohammad Tariq
Hello Yogesh,

   Also add these lines: export PIG_CLASSPATH=$HADOOP_HOME/conf and
export HADOOP_CONF_DIR=$HADOOP_HOME/conf, and see if it works for you.
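A sketch of the full environment setup being suggested, reusing the HADOOP_HOME from the message below:

export HADOOP_HOME=/HADOOP/hadoop-0.20.2
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export PIG_CLASSPATH=$HADOOP_CONF_DIR
pig    # should now log: Connecting to hadoop file system at: hdfs://...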

Regards,
Mohammad Tariq


On Wed, Jul 25, 2012 at 6:01 PM,  yogesh.kuma...@wipro.com wrote:
 Hi mohammad,

 when I try the command

 Pig

 it shows this error for the 0.7.0 version:

 mediaadmin$ pig
 12/07/25 17:54:15 INFO pig.Main: Logging error messages to: 
 /users/mediaadmin/pig_1343219055229.log
 2012-07-25 17:54:15,451 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: file:///

 and this  .log file doesn't exist /users/mediaadmin/

 Why is it so? I have set these properties in the pig-0.7.0/bin/pig file.

 -
 # The Pig command script
 #
 # Environment Variables
 #
 export JAVA_HOME=/Library/Java/Home
 #
 # PIG_CLASSPATH      Extra Java CLASSPATH entries.
 #
 export HADOOP_HOME=/HADOOP/hadoop-0.20.2

 export HADOOP_CONF_DIR=/HADOOP/hadoop-0.20.2/conf

 # PIG_HEAPSIZE       The maximum amount of heap to use, in MB.
 #                    Default is 1000.
 #
 # PIG_OPTS           Extra Java runtime options.
 #
 export PIG_CONF_DIR=/HADOOP/pig-0.7.0/conf
 #
 # PIG_ROOT_LOGGER    The root appender. Default is INFO,console
 #
 # PIG_HADOOP_VERSION Version of hadoop to run with. Default is 20 (0.20).

 




 
 From: Mohammad Tariq [donta...@gmail.com]
 Sent: Wednesday, July 25, 2012 5:34 PM
 To: user@pig.apache.org
 Subject: Re: DATA not storing as comma-separated

 Also, it would help to go to the MapReduce web UI and have a look
 at the details of the job corresponding to this query.

 Regards,
 Mohammad Tariq


 On Wed, Jul 25, 2012 at 5:31 PM, Mohammad Tariq donta...@gmail.com wrote:
 I have worked with pig-0.7.0 once and it was working fine. Try to see
 if there is anything interesting in the log files. Also, if possible,
 share 2-3 lines of your file..I'll give it a try on my machine.

 Regards,
 Mohammad Tariq


 On Wed, Jul 25, 2012 at 5:20 PM,  yogesh.kuma...@wipro.com wrote:
 Hi Mohammad,

 I have switched from pig 0.10.0 to 0.7.0 and it's a horrible experience.
 I do perform

 grunt> A = load '/hello/demotry.txt'
 >>  as (name:chararray, roll:int, mssg:chararray);

 grunt> dump A;

 it shows this error:

 grunt> dump A;
 2012-07-25 17:20:34,081 [main] INFO  
 org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned 
 for A
 2012-07-25 17:20:34,081 [main] INFO  
 org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys 
 pruned for A
 2012-07-25 17:20:34,102 [main] INFO  
 org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with 
 processName=JobTracker, sessionId=
 2012-07-25 17:20:34,169 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: 
 Store(file:/tmp/temp61624047/tmp1087576502:org.apache.pig.builtin.BinStorage)
  - 1-18 Operator Key: 1-18)
 2012-07-25 17:20:34,195 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size before optimization: 1
 2012-07-25 17:20:34,195 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
  - MR plan size after optimization: 1
 2012-07-25 17:20:34,211 [main] INFO  
 org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics 
 with processName=JobTracker, sessionId= - already initialized
 2012-07-25 17:20:34,217 [main] INFO  
 org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics 
 with processName=JobTracker, sessionId= - already initialized
 2012-07-25 17:20:34,217 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
  - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
 2012-07-25 17:20:35,570 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
  - Setting up single store job
 2012-07-25 17:20:35,599 [main] INFO  
 org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics 
 with processName=JobTracker, sessionId= - already initialized
 2012-07-25 17:20:35,600 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 1 map-reduce job(s) waiting for submission.
 2012-07-25 17:20:35,606 [Thread-7] WARN  org.apache.hadoop.mapred.JobClient 
 - Use GenericOptionsParser for parsing the arguments. Applications should 
 implement Tool for the same.
 2012-07-25 17:20:35,750 [Thread-7] INFO  
 org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics 
 with processName=JobTracker, sessionId= - already initialized
 2012-07-25 17:20:35,763 [Thread-7] INFO  
 org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics

Re: foreach in PIG is not working.

2012-07-25 Thread Mohammad Tariq
Hi Yogesh,

  As per the result of dump A, it is correct. Just see that
whatever is there in A is one complete chararray (Yogesh 12,).
Although you are trying to load the file as (name:chararray,
roll:int), it is going in as a single field and not as (name:chararray,
roll:int). Just try to load the file properly, and you are good to go.
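
For example, a load with a delimiter that matches the file (a sketch
assuming the fields are separated by a single space, as the dumps
suggest):

A = load '/HADOOP/Yogesh/demo.txt' using PigStorage(' ')
    as (name:chararray, roll:int);
B = foreach A generate name;
dump B;

B should then contain only the names: (Yogesh), (Aashi), (mohit).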

Regards,
Mohammad Tariq


On Wed, Jul 25, 2012 at 11:36 PM,  yogesh.kuma...@wipro.com wrote:
 Hi all,

 I loaded a file into pig from HDFS with the command:

 A = load '/HADOOP/Yogesh/demo.txt'
 as (name:chararray, roll:int);

 It gets loaded, and when I do

 dump A;

 it shows

 (Yogesh 12,)
 (Aashi 13,)
 (mohit 14,)

 Now I run another query,

 B = foreach A generate name;

 to get only the names from A,

 but dump B; shows the same result as dump A, i.e.

 (Yogesh 12,)
 (Aashi 13,)
 (mohit 14,)


 Please help and suggest.

 Thanks & Regards
 Yogesh Kumar



Re: foreach in PIG is not working.

2012-07-25 Thread Mohammad Tariq
try this:

A = load '/HADOOP/Yogesh/demo.txt' using PigStorage(' ')
    as (name:chararray, roll:int);

Regards,
Mohammad Tariq


On Wed, Jul 25, 2012 at 11:47 PM, pablomar
pablo.daniel.marti...@gmail.com wrote:
 are the commas in your file in the right places?
 your DUMP A shows
 (Yogesh 12,)
 (Aashi 13,)
 (mohit 14,)

 comma after the number. It makes me believe that it is taking both fields
 as name, so when you do the foreach, it keeps the whole thing

 I mean, is your file (wrong)
 Yogesh 12,
 Aashi 13,
 mohit 14,

 or (good, for this case)
 Yogesh, 12
 Aashi, 13
 mohit, 14


 On Wed, Jul 25, 2012 at 2:06 PM, yogesh.kuma...@wipro.com wrote:

 Hi all,

 I loaded a file to pig by command from HDFS.

 A=load '/HADOOP/Yogesh/demo.txt'
 as (name:chararray, roll:int);

 it gets loaded, and when I do

 dump A:

 it shows

 (Yogesh 12,)
 (Aashi 13,)
 (mohit 14,)

 Now i run another query

 B= foreach A generate name;

 to get result only names from A.

 but dump B; shows the same result as dump A, i.e.

 (Yogesh 12,)
 (Aashi 13,)
 (mohit 14,)


 Please help and suggest.

 Thanks & Regards
 Yogesh Kumar




Re: foreach in PIG is not working.

2012-07-25 Thread Mohammad Tariq
Don't use (',')... By default, LOAD looks for the data in a tab-delimited
file using the default load function, i.e. PigStorage. But the data in your
file is space-separated, not tab-separated, so you need to tell Pig how your
file should be loaded. See the command I showed in the previous mail:

A = load '/HADOOP/Yogesh/demo.txt' using PigStorage(' ')
    as (name:chararray, roll:int);

Pay attention to PigStorage(' '): I haven't used a comma there. Use it and
it should work.

Regards,
Mohammad Tariq


On Thu, Jul 26, 2012 at 12:01 AM,  yogesh.kuma...@wipro.com wrote:
 Thanks All :-)

 yes, the file I have uploaded was a text file having the format
 (Yogesh 12)
 (Aashi 13)
 (Mohit 14)


 I used command

  A = load '/Yogesh/demo.txt' using PigStorage(',')
 as (name:chararray, roll:int);

 and then Dump A;

 The result is

 (Yogesh 12,)
 (Aashi 13,)
 (Mohit 14,)

 it should be

 (Yogesh, 12)
 (Aashi , 13)
 (Mohit, 14)

 What am I missing here? :-(

 Regards
 Yogesh Kumar

 
 From: Mohammad Tariq [donta...@gmail.com]
 Sent: Wednesday, July 25, 2012 11:50 PM
 To: user@pig.apache.org
 Subject: Re: foreach in PIG is not working.

 try this:

 A = load '/HADOOP/Yogesh/demo.txt' using PigStorage(' ')
     as (name:chararray, roll:int);

 Regards,
 Mohammad Tariq


 On Wed, Jul 25, 2012 at 11:47 PM, pablomar
 pablo.daniel.marti...@gmail.com wrote:
 are the commas in your file in the right places?
 your DUMP A shows
 (Yogesh 12,)
 (Aashi 13,)
 (mohit 14,)

 comma after the number. It makes me believe that it is taking both fields
 as name, so when you do the foreach, it keeps the whole thing

 I mean, is your file (wrong)
 Yogesh 12,
 Aashi 13,
 mohit 14,

 or (good, for this case)
 Yogesh, 12
 Aashi, 13
 mohit, 14


 On Wed, Jul 25, 2012 at 2:06 PM, yogesh.kuma...@wipro.com wrote:

 Hi all,

 I loaded a file to pig by command from HDFS.

 A=load '/HADOOP/Yogesh/demo.txt'
 as (name:chararray, roll:int);

 it gets loaded, and when I do

 dump A:

 it shows

 (Yogesh 12,)
 (Aashi 13,)
 (mohit 14,)

 Now i run another query

 B= foreach A generate name;

 to get result only names from A.

 but dump B; shows the same result as dump A, i.e.

 (Yogesh 12,)
 (Aashi 13,)
 (mohit 14,)


 Please help and suggest.

 Thanks & Regards
 Yogesh Kumar





Re: foreach in PIG is not working.

2012-07-25 Thread Mohammad Tariq
complete query would be something like this (pay special attention to
PigStorage(' '); don't use a comma there):

grunt> a = load '/dir1/demo.txt' using PigStorage(' ')
>>          as (name:chararray, roll:int);
grunt> dump a;

grunt> b = foreach a generate name;

grunt> dump b;
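
With a space-separated input file, dump b should then print just the
names, i.e. (Yogesh), (Aashi), (mohit) for the sample data shown
earlier in this thread.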

Regards,
Mohammad Tariq


On Thu, Jul 26, 2012 at 12:06 AM, Mohammad Tariq donta...@gmail.com wrote:
 Don't use (',')... By default, LOAD looks for the data in a tab-delimited
 file using the default load function, i.e. PigStorage. But the data in your
 file is space-separated, not tab-separated, so you need to tell Pig how your
 file should be loaded. See the command I showed in the previous mail:

 A = load '/HADOOP/Yogesh/demo.txt' using PigStorage(' ')
     as (name:chararray, roll:int);

 Pay attention to PigStorage(' '): I haven't used a comma there. Use it and
 it should work.

 Regards,
 Mohammad Tariq


 On Thu, Jul 26, 2012 at 12:01 AM,  yogesh.kuma...@wipro.com wrote:
 Thanks All :-)

 yes, the file I have uploaded was a text file having the format
 (Yogesh 12)
 (Aashi 13)
 (Mohit 14)


 I used command

  A = load '/Yogesh/demo.txt' using PigStorage(',')
 as (name:chararray, roll:int);

 and then Dump A;

 The result is

 (Yogesh 12,)
 (Aashi 13,)
 (Mohit 14,)

 it should be

 (Yogesh, 12)
 (Aashi , 13)
 (Mohit, 14)

 What am I missing here? :-(

 Regards
 Yogesh Kumar

 
 From: Mohammad Tariq [donta...@gmail.com]
 Sent: Wednesday, July 25, 2012 11:50 PM
 To: user@pig.apache.org
 Subject: Re: foreach in PIG is not working.

 try this:

 A = load '/HADOOP/Yogesh/demo.txt' using PigStorage(' ')
     as (name:chararray, roll:int);

 Regards,
 Mohammad Tariq


 On Wed, Jul 25, 2012 at 11:47 PM, pablomar
 pablo.daniel.marti...@gmail.com wrote:
 are the commas in your file in the right places?
 your DUMP A shows
 (Yogesh 12,)
 (Aashi 13,)
 (mohit 14,)

 comma after the number. It makes me believe that it is taking both fields
 as name, so when you do the foreach, it keeps the whole thing

 I mean, is your file (wrong)
 Yogesh 12,
 Aashi 13,
 mohit 14,

 or (good, for this case)
 Yogesh, 12
 Aashi, 13
 mohit, 14


 On Wed, Jul 25, 2012 at 2:06 PM, yogesh.kuma...@wipro.com wrote:

 Hi all,

 I loaded a file to pig by command from HDFS.

 A=load '/HADOOP/Yogesh/demo.txt'
 as (name:chararray, roll:int);

 it gets loaded, and when I do

 dump A:

 it shows

 (Yogesh 12,)
 (Aashi 13,)
 (mohit 14,)

 Now i run another query

 B= foreach A generate name;

 to get result only names from A.

 but dump B; shows the same result as dump A, i.e.

 (Yogesh 12,)
 (Aashi 13,)
 (mohit 14,)


 Please help and suggest.

 Thanks & Regards
 Yogesh Kumar





Re: Is there a loader that loads a file as a line?

2012-06-21 Thread Mohammad Tariq
Hello Jonathan,
Have a look at Hadoop's WholeFileInputFormat. It might fit into
your requirements.
Regards,
    Mohammad Tariq


On Fri, Jun 22, 2012 at 3:39 AM, Prashant Kommireddi
prash1...@gmail.com wrote:
 I think you will need to implement a RecordReader/InputFormat of your own
 for this and use it with a LoadFunc. Not sure if Hadoop has a Reader that
 you could re-use for this.

 How do you handle the case when a file exceeds block size?

 On Thu, Jun 21, 2012 at 2:34 PM, Jonathan Coveney jcove...@gmail.com wrote:

 It can even be a bytearray. Basically I have a bunch of files, and I want
 one file - one row. Is there an easy way to do this? Or will I need to
 provide a special fileinputformat etc?



Re: How pig get hadoop and hbase configuration?

2012-06-13 Thread Mohammad Tariq
Hello,

   Copy the hadoop-core-*.jar from your hadoop folder to the hbase/lib
folder. Also copy commons-configuration-1.6.jar from the hadoop/lib folder
to the hbase/lib folder. Sometimes this may happen due to incompatible
jars; do it and see if it works for you.

Regards,
    Mohammad Tariq


On Wed, Jun 13, 2012 at 12:34 PM, lulynn_2008 lulynn_2...@163.com wrote:
  Hi everyone,
 Following is my test environment:
 node 1: namenode, secondarynamenode, jobtracker, hbase master
 node 2: datanode, tasktracker
 On node 1, I ran the following COMMANDS in the pig shell, but I found the map
 task failed on the tasktracker node with the error "HBase is able to connect
 to ZooKeeper but the connection closes immediately". This means the
 tasktracker did not get the current hbase configuration. But I can find the
 correct hbase configuration on the jobtracker node. It seems the tasktracker
 node did not get the configuration from the jobtracker node, but from the
 hadoop classpath on the tasktracker node.
 I think the tasktracker node should get the hbase configuration from the
 jobtracker node, not from the local hadoop classpath. Am I correct?

 On the tasktracker side, after I added hbase-site.xml to the hadoop
 classpath, the test case passed.
 My question is: how does the tasktracker node get the hbase configuration?
 From the jobtracker side (included in the *.jar file transferred by the
 jobtracker node) or from the local hadoop classpath?

 COMMANDS:
 REGISTER /home/pig/Rules.jar;
 REGISTER '/home/pig/zookeeper.jar';
 REGISTER '/home/pig/guava-r06.jar';
 REGISTER '/home/pig/hbase-0.90.5.jar';

 test = LOAD 'hbase://table' USING 
 org.apache.pig.backend.hadoop.hbase.HBaseStorage( 'd:sWords','-loadKey true') 
 AS (ID: bytearray  , Words:chararray );
 result = FOREACH test GENERATE ID, com.nice.rules.RunRules(Words);
 --result = FOREACH AA GENERATE com.nice.rules.RunRules(Words), ID;
 --dump result;

 store result into 'table' using 
 org.apache.pig.backend.hadoop.hbase.HBaseStorage('d:drools_cat');
 --store result into 'AA_10_categs' using 
 org.apache.pig.backend.hadoop.hbase.HBaseStorage('d:cat');



Re: Re: How pig get hadoop and hbase configuration?

2012-06-13 Thread Mohammad Tariq
"HBase is able to connect to ZooKeeper but the connection closes
immediately." - This error means that your HMaster is not able to talk
to your Namenode.

Regards,
    Mohammad Tariq


On Wed, Jun 13, 2012 at 1:12 PM, lulynn_2008 lulynn_2...@163.com wrote:
 Hello,
 hadoop-core-*.jar and commons-configuration-1.6.jar are already in the hbase
 lib directory. The jobtracker node can get the correct hbase configuration,
 but the tasktracker node cannot.




 At 2012-06-13 15:35:21,Mohammad Tariq donta...@gmail.com wrote:
Hello,

   Copy the hadoop-core-*.jar from your hadoop folder to the hbase/lib
folder.Also copy commons-configuration-1.6.jar from hadoop/lib folder
to hbase/lib folder...Some times due to incompatible jars this may
happen..do it and see if it works for you.

Regards,
    Mohammad Tariq


On Wed, Jun 13, 2012 at 12:34 PM, lulynn_2008 lulynn_2...@163.com wrote:
  Hi everyone,
 Following is mine test environment:
 node 1:namenode, secondarynamenode, jobtracker, hbase master
 node 2:datanode, tasktracker
 In node 1, I run following COMMANDS in pig shell, but I found map task 
 failed in tasktracker node with error HBase is able to connect to 
 ZooKeeper but the connection closes immediately.. This mean tasktracker 
 did not get current hbase configuration. But I can find the correct hbase 
 configuration in jobtracker node. Seems tasktracker node did not get 
 configuration from jobtracker node, but get configuration from hadoop 
 classpath in tasktracker node.
 I think tasktracker node should get hbase configuration from jobtracker 
 node, but not from local hadoop classpath. Am I correct?

 In tasktracker side, after I add hbase-site.xml to hadoop classpath, the 
 test case passed.
 My question is: how tasktracker node get hbase configuration from 
 tasktracker side? From jobtracker side(included in *.jar file transferred 
 by jobtracker node) or local hadoop classpath?

 COMMANDS:
 REGISTER /home/pig/Rules.jar;
 REGISTER '/home/pig/zookeeper.jar';
 REGISTER '/home/pig/guava-r06.jar';
 REGISTER '/home/pig/hbase-0.90.5.jar';

 test = LOAD 'hbase://table' USING 
 org.apache.pig.backend.hadoop.hbase.HBaseStorage( 'd:sWords','-loadKey 
 true') AS (ID: bytearray  , Words:chararray );
 result = FOREACH test GENERATE ID, com.nice.rules.RunRules(Words);
 --result = FOREACH AA GENERATE com.nice.rules.RunRules(Words), ID;
 --dump result;

 store result into 'table' using 
 org.apache.pig.backend.hadoop.hbase.HBaseStorage('d:drools_cat');
 --store result into 'AA_10_categs' using 
 org.apache.pig.backend.hadoop.hbase.HBaseStorage('d:cat');



Re: Re: How pig get hadoop and hbase configuration?

2012-06-13 Thread Mohammad Tariq
Could you send me your hadoop and hbase config files???

Regards,
    Mohammad Tariq


On Wed, Jun 13, 2012 at 1:18 PM, Mohammad Tariq donta...@gmail.com wrote:
 HBase is able to connect to ZooKeeper but the connection closes
 immediately. - This error means that your HMaster is not able to talk
 to your Namenode.

 Regards,
     Mohammad Tariq


 On Wed, Jun 13, 2012 at 1:12 PM, lulynn_2008 lulynn_2...@163.com wrote:
 Hello,
 hadoop-core-*.jar and commons-configuration-1.6.jar have been in hbase lib 
 directory. jobtracker node can get correct hbase configuration, but 
 tasktracker node can not.




 At 2012-06-13 15:35:21,Mohammad Tariq donta...@gmail.com wrote:
Hello,

   Copy the hadoop-core-*.jar from your hadoop folder to the hbase/lib
folder.Also copy commons-configuration-1.6.jar from hadoop/lib folder
to hbase/lib folder...Some times due to incompatible jars this may
happen..do it and see if it works for you.

Regards,
    Mohammad Tariq


On Wed, Jun 13, 2012 at 12:34 PM, lulynn_2008 lulynn_2...@163.com wrote:
  Hi everyone,
 Following is mine test environment:
 node 1:namenode, secondarynamenode, jobtracker, hbase master
 node 2:datanode, tasktracker
 In node 1, I run following COMMANDS in pig shell, but I found map task 
 failed in tasktracker node with error HBase is able to connect to 
 ZooKeeper but the connection closes immediately.. This mean tasktracker 
 did not get current hbase configuration. But I can find the correct hbase 
 configuration in jobtracker node. Seems tasktracker node did not get 
 configuration from jobtracker node, but get configuration from hadoop 
 classpath in tasktracker node.
 I think tasktracker node should get hbase configuration from jobtracker 
 node, but not from local hadoop classpath. Am I correct?

 In tasktracker side, after I add hbase-site.xml to hadoop classpath, the 
 test case passed.
 My question is: how tasktracker node get hbase configuration from 
 tasktracker side? From jobtracker side(included in *.jar file transferred 
 by jobtracker node) or local hadoop classpath?

 COMMANDS:
 REGISTER /home/pig/Rules.jar;
 REGISTER '/home/pig/zookeeper.jar';
 REGISTER '/home/pig/guava-r06.jar';
 REGISTER '/home/pig/hbase-0.90.5.jar';

 test = LOAD 'hbase://table' USING 
 org.apache.pig.backend.hadoop.hbase.HBaseStorage( 'd:sWords','-loadKey 
 true') AS (ID: bytearray  , Words:chararray );
 result = FOREACH test GENERATE ID, com.nice.rules.RunRules(Words);
 --result = FOREACH AA GENERATE com.nice.rules.RunRules(Words), ID;
 --dump result;

 store result into 'table' using 
 org.apache.pig.backend.hadoop.hbase.HBaseStorage('d:drools_cat');
 --store result into 'AA_10_categs' using 
 org.apache.pig.backend.hadoop.hbase.HBaseStorage('d:cat');



Re: How to use TOP?

2012-05-22 Thread Mohammad Tariq
Hi Abhinav,

   Thanks a lot for the valuable response. Actually I was thinking of
doing the same thing, but being new to Pig I thought of asking it on
the mailing list first. As far as the data is concerned, the second
column will always be in ascending order, but I don't think it will be
of any help. I think whatever you have suggested here would be the
appropriate solution. Although I would like to ask you one thing: is
it feasible to add that first column having the count in my pig
script, or do I have to change the data in my Hbase table itself? If
yes, then how can I achieve it in my script? Many thanks.

Regards,
    Mohammad Tariq


On Tue, May 22, 2012 at 1:16 AM, Abhinav Neelam abhinavroc...@gmail.com wrote:
 Hey Mohammad,

 You need to have sorting requirements when you say 'top 5' records. Because
 relations/bags in Pig are unordered, it's natural to ask: 'top 5 by what
 parameter?' I'm unfamiliar with HBase, but if your data in HBase has an
 implicit ordering with say an auto-increment primary key, or an explicit
 one, you could include that field in your input to Pig and then apply TOP
 on that field.

 Having said that, if I understand your problem correctly, you don't need
 TOP at all - you just want to process your input in groups of 5 tuples at a
 time. Again, I can't think of a way of doing this without modifying your
 input. For example, if your input included an extra field like this:
 1  18.98   2000    1.21   193.46  2.64  58.17
 1  52.49   2000.5  4.32   947.11  2.74  64.45
 1  115.24  2001    16.8   878.58  2.66  94.49
 1  55.55   2001.5  33.03  656.56  2.82  60.76
 1  156.14  2002    35.52  83.75   2.6   59.57
 2  138.77  2002.5  21.51  105.76  2.62  85.89
 2  71.89   2003    27.79  709.01  2.63  85.44
 2  59.84   2003.5  32.1   444.82  2.72  70.8
 2  103.18  2004    4.09   413.15  2.8   54.37

 you could do a group on that field and proceed. Even if you had a field
 like 'line number' or 'record number' in your input, you could still
 manipulate that field (say through integer division by 5) to use it for
 grouping. In any case, you need something to let Pig bring together your 5
 tuple groups.

 B = group A by $0;
 C = FOREACH B { ...do some processing on your 5-tuple bag A... }
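
 A fuller sketch of this grouping approach (the AVG over the second
 column is just a hypothetical stand-in for "do some processing"):

 B = group A by $0;                        -- $0 is the added group id
 C = foreach B generate group, AVG(A.$1);  -- one result per 5-tuple group
 dump C;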

 Thanks,
 Abhinav

 On 21 May 2012 23:03, Mohammad Tariq donta...@gmail.com wrote:

 Hi Ruslan,

    Thanks for the response.I think I have made a mistake.Actually I
 just want the top 5 records each time.I don't have any sorting
 requirements.

 Regards,
     Mohammad Tariq


 On Mon, May 21, 2012 at 9:31 PM, Ruslan Al-fakikh
 ruslan.al-fak...@jalent.ru wrote:
  Hey Mohammad,
 
  Here
  c = TOP(5,3,a);
  you say: take 5 records out of a that have the biggest values in the
 third
  column. Do you really need that sorting by the third column?
 
  -Original Message-
  From: Mohammad Tariq [mailto:donta...@gmail.com]
  Sent: Monday, May 21, 2012 3:54 PM
  To: user@pig.apache.org
  Subject: How to use TOP?
 
  Hello list,
 
   I have an Hdfs file that has 6 columns that contain some data stored in
 an
  Hbase table.the data looks like this -
 
  18.98   2000             1.21   193.46  2.64        58.17
  52.49   2000.5   4.32           947.11  2.74        64.45
  115.24  2001             16.8   878.58  2.66        94.49
  55.55   2001.5   33.03  656.56  2.82        60.76
  156.14  2002             35.52  83.75   2.6         59.57
  138.77  2002.5   21.51  105.76  2.62        85.89
  71.89   2003             27.79  709.01  2.63        85.44
  59.84   2003.5   32.1           444.82  2.72        70.8
  103.18  2004             4.09   413.15  2.8         54.37
 
  Now I have to take each record along with its next 4 records and do some
  processing(for example, in the first shot I have to take records 1-5, in
 the
  next shot I have to take 2-6 and so on)..I am trying to use TOP for this,
  but getting the following error -
 
  2012-05-21 17:04:30,328 [main] ERROR org.apache.pig.tools.grunt.Grunt
  - ERROR 1200: Pig script failed to parse:
  line 6, column 37 Invalid scalar projection: parameters : A column
 needs
  to be projected from a relation for it to be used as a scalar Details at
  logfile: /home/mohammad/pig-0.9.2/logs/pig_1337599211281.log
 
  I am using following commands -
 
  grunt a = load 'hbase://logdata'
  using org.apache.pig.backend.hadoop.hbase.HBaseStorage(
  'cf:DGR cf:HD cf:POR cf:RES cf:RHOB cf:SON', '-loadKey true') as (id,
  DGR, HD, POR, RES, RHOB, SON);
  grunt b = foreach a { c = TOP(5,3,a);
  generate flatten(c);
  }
 
  Could anyone tell me how to achieve thatMany thanks.
 
  Regards,
      Mohammad Tariq
 




 --
 Hacking is, and always has been, the Holy
 Grail of computer science.


Re: How to use TOP?

2012-05-22 Thread Mohammad Tariq
Yes, it would be better if I do it at the time of insertion. I just
have to add one more column. Thanks again.

Regards,
    Mohammad Tariq


On Tue, May 22, 2012 at 2:36 PM, Abhinav Neelam abhinavroc...@gmail.com wrote:
 Doing it in the pig script is not feasible because pig doesn't have any
 notion of sequentiality - to maintain it, you'd need to have access to
 state that's shared globally by all the mappers and reducers. One way I can
 think of doing this is to have a UDF that maintains state - perhaps it can
 maintain a file that's NFS-mounted or in HDFS so that it's available on all
 the task nodes; then any call to the UDF can update that file (atomically)
 and return a 'row number' that you could associate with your current tuple.
 Something like:
 B = FOREACH A GENERATE $0, $1, $2, $3, MyUDFs.GETROWNUM() as rownum;

 However, AFAIK, you'd be better off doing it in HBase - perhaps at the time
 of record insert, you could also add a 'row number' into the record?
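
 One way such a row number could be consumed downstream, sketched in Pig
 (the field names and the COUNT are hypothetical, standing in for the
 real processing):

 -- integer division buckets consecutive records together:
 -- rows 0-4 fall in bucket 0, rows 5-9 in bucket 1, and so on
 B = foreach A generate name, roll, (rownum / 5) as bucket;
 C = group B by bucket;
 D = foreach C generate group, COUNT(B);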

 On 22 May 2012 12:43, Mohammad Tariq donta...@gmail.com wrote:

 Hi Abhinav,

   Thanks a lot for the valuable response..Actually I was thinking of
 doing the same thing, but being new to Pig I thought of asking it on
 the mailing list first..As far as the data is concerned, second column
 will always be in ascending order.But I don't think it will be of any
 help..I think whatever you have suggested here would be the
 appropriate solution..Although I would like to ask you one thing..Is
 it feasible to add that first column having count in my pig script or
 do I have to change the data in my Hbase table itself???If yes then
 how can I achieve it in my script??Many thanks.

 Regards,
     Mohammad Tariq


 On Tue, May 22, 2012 at 1:16 AM, Abhinav Neelam abhinavroc...@gmail.com
 wrote:
  Hey Mohammad,
 
  You need to have sorting requirements when you say 'top 5' records.
 Because
  relations/bags in Pig are unordered, it's natural to ask: 'top 5 by what
  parameter?' I'm unfamiliar with HBase, but if your data in HBase has an
  implicit ordering with say an auto-increment primary key, or an explicit
  one, you could include that field in your input to Pig and then apply TOP
  on that field.
 
  Having said that, if I understand your problem correctly, you don't need
  TOP at all - you just want to process your input in groups of 5 tuples
 at a
  time. Again, I can't think of a way of doing this without modifying your
  input. For example, if your input included an extra field like this:
  1 18.98   2000             1.21   193.46  2.64        58.17
  1 52.49   2000.5   4.32           947.11  2.74        64.45
  1 115.24  2001             16.8   878.58  2.66        94.49
  1 55.55   2001.5   33.03  656.56  2.82        60.76
  1 156.14  2002             35.52  83.75   2.6         59.57
  2 138.77  2002.5   21.51  105.76  2.62        85.89
  2 71.89   2003             27.79  709.01  2.63        85.44
  2 59.84   2003.5   32.1           444.82  2.72        70.8
  2 103.18  2004             4.09   413.15  2.8         54.37
 
  you could do a group on that field and proceed. Even if you had a field
  like 'line number' or 'record number' in your input, you could still
  manipulate that field (say through integer division by 5) to use it for
  grouping. In any case, you need something to let Pig bring together your
 5
  tuple groups.
 
  B = group A by $0;
  C = FOREACH B { do some processing on your 5 tuple bag A ...
 
  Thanks,
  Abhinav
 
  On 21 May 2012 23:03, Mohammad Tariq donta...@gmail.com wrote:
 
  Hi Ruslan,
 
     Thanks for the response.I think I have made a mistake.Actually I
  just want the top 5 records each time.I don't have any sorting
  requirements.
 
  Regards,
      Mohammad Tariq
 
 
  On Mon, May 21, 2012 at 9:31 PM, Ruslan Al-fakikh
  ruslan.al-fak...@jalent.ru wrote:
   Hey Mohammad,
  
   Here
   c = TOP(5,3,a);
   you say: take 5 records out of a that have the biggest values in the
  third
   column. Do you really need that sorting by the third column?
  
   -Original Message-
   From: Mohammad Tariq [mailto:donta...@gmail.com]
   Sent: Monday, May 21, 2012 3:54 PM
   To: user@pig.apache.org
   Subject: How to use TOP?
  
   Hello list,
  
    I have an Hdfs file that has 6 columns that contain some data stored
 in
  an
   Hbase table.the data looks like this -
  
   18.98   2000             1.21   193.46  2.64        58.17
   52.49   2000.5   4.32           947.11  2.74        64.45
   115.24  2001             16.8   878.58  2.66        94.49
   55.55   2001.5   33.03  656.56  2.82        60.76
   156.14  2002             35.52  83.75   2.6         59.57
   138.77  2002.5   21.51  105.76  2.62        85.89
   71.89   2003             27.79  709.01  2.63        85.44
   59.84   2003.5   32.1           444.82  2.72        70.8
   103.18  2004             4.09   413.15  2.8         54.37
  
   Now I have to take each record along with its next 4 records and do
 some
   processing(for example, in the first shot I have

How to use TOP?

2012-05-21 Thread Mohammad Tariq
Hello list,

  I have an Hdfs file that has 6 columns containing some data stored
in an Hbase table. The data looks like this -

18.98   2000    1.21    193.46  2.64    58.17
52.49   2000.5  4.32    947.11  2.74    64.45
115.24  2001    16.8    878.58  2.66    94.49
55.55   2001.5  33.03   656.56  2.82    60.76
156.14  2002    35.52   83.75   2.6     59.57
138.77  2002.5  21.51   105.76  2.62    85.89
71.89   2003    27.79   709.01  2.63    85.44
59.84   2003.5  32.1    444.82  2.72    70.8
103.18  2004    4.09    413.15  2.8     54.37

Now I have to take each record along with its next 4 records and do
some processing (for example, in the first shot I have to take records
1-5, in the next shot records 2-6, and so on). I am trying to use TOP
for this, but I am getting the following error -

2012-05-21 17:04:30,328 [main] ERROR org.apache.pig.tools.grunt.Grunt
- ERROR 1200: Pig script failed to parse:
line 6, column 37 Invalid scalar projection: parameters : A column
needs to be projected from a relation for it to be used as a scalar
Details at logfile: /home/mohammad/pig-0.9.2/logs/pig_1337599211281.log

I am using the following commands -

grunt> a = load 'hbase://logdata'
>>      using org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>>      'cf:DGR cf:HD cf:POR cf:RES cf:RHOB cf:SON', '-loadKey true')
>>      as (id, DGR, HD, POR, RES, RHOB, SON);
grunt> b = foreach a { c = TOP(5,3,a);
>>      generate flatten(c);
>>      }

Could anyone tell me how to achieve that? Many thanks.

Regards,
    Mohammad Tariq
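
For reference, TOP is meant to be applied to a bag inside a nested
foreach, typically after a GROUP; a minimal sketch of valid usage (the
GROUP ALL here is added purely for illustration):

grunt> g = group a all;
grunt> b = foreach g {
>>      t = TOP(5, 3, a);
>>      generate flatten(t);
>> }

Note that this returns the 5 tuples with the largest values in column 3,
which is not the sliding window of 5 consecutive records asked about above.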


Re: How to use TOP?

2012-05-21 Thread Mohammad Tariq
Hi Ruslan,

Thanks for the response. I think I have made a mistake. Actually I
just want the top 5 records each time; I don't have any sorting
requirements.

Regards,
    Mohammad Tariq


On Mon, May 21, 2012 at 9:31 PM, Ruslan Al-fakikh
ruslan.al-fak...@jalent.ru wrote:
 Hey Mohammad,

 Here
 c = TOP(5,3,a);
 you say: take 5 records out of a that have the biggest values in the third
 column. Do you really need that sorting by the third column?

 -Original Message-
 From: Mohammad Tariq [mailto:donta...@gmail.com]
 Sent: Monday, May 21, 2012 3:54 PM
 To: user@pig.apache.org
 Subject: How to use TOP?

 Hello list,

  I have an Hdfs file that has 6 columns that contain some data stored in an
 Hbase table.the data looks like this -

 18.98   2000             1.21   193.46  2.64        58.17
 52.49   2000.5   4.32           947.11  2.74        64.45
 115.24  2001             16.8   878.58  2.66        94.49
 55.55   2001.5   33.03  656.56  2.82        60.76
 156.14  2002             35.52  83.75   2.6         59.57
 138.77  2002.5   21.51  105.76  2.62        85.89
 71.89   2003             27.79  709.01  2.63        85.44
 59.84   2003.5   32.1           444.82  2.72        70.8
 103.18  2004             4.09   413.15  2.8         54.37

 Now I have to take each record along with its next 4 records and do some
 processing(for example, in the first shot I have to take records 1-5, in the
 next shot I have to take 2-6 and so on)..I am trying to use TOP for this,
 but getting the following error -

 2012-05-21 17:04:30,328 [main] ERROR org.apache.pig.tools.grunt.Grunt
 - ERROR 1200: Pig script failed to parse:
 line 6, column 37 Invalid scalar projection: parameters : A column needs
 to be projected from a relation for it to be used as a scalar Details at
 logfile: /home/mohammad/pig-0.9.2/logs/pig_1337599211281.log

 I am using following commands -

 grunt a = load 'hbase://logdata'
 using org.apache.pig.backend.hadoop.hbase.HBaseStorage(
 'cf:DGR cf:HD cf:POR cf:RES cf:RHOB cf:SON', '-loadKey true') as (id,
 DGR, HD, POR, RES, RHOB, SON);
 grunt b = foreach a { c = TOP(5,3,a);
 generate flatten(c);
 }

 Could anyone tell me how to achieve thatMany thanks.

 Regards,
     Mohammad Tariq



how to achieve foreach 'n'

2012-05-17 Thread Mohammad Tariq
Hello list,

I have loaded data from an Hbase table and the relation looks like this -

(binary HBase row keys precede each row but do not render as text)
18.98   2000    1.21    193.46  2.64    58.17
52.49   2000.5  4.32    947.11  2.74    64.45
115.24  2001    16.8    878.58  2.66    94.49
55.55   2001.5  33.03   656.56  2.82    60.76
156.14  2002    35.52   83.75   2.6     59.57
138.77  2002.5  21.51   105.76  2.62    85.89
71.89   2003    27.79   709.01  2.63    85.44
Regards,
    Mohammad Tariq


Re: how to achieve foreach 'n'

2012-05-17 Thread Mohammad Tariq
Sorry for the previous mail. Actually I wanted to ask how I can take 5
records at a time from the relation and perform the desired operation.
Is it feasible to put each 5-tuple chunk in a bag and then apply
'foreach' on the bags? If yes, please let me know how I can achieve
this. Or is there any better way to do it? I am new to Pig, so I am
finding it a bit tricky. Many thanks.

Regards,
    Mohammad Tariq


On Thu, May 17, 2012 at 1:44 PM, Mohammad Tariq donta...@gmail.com wrote:
 Hello list,

    I have loaded data from an Hbase table and the relation looks like this -

 (binary HBase row keys precede each row but do not render as text)
 18.98   2000    1.21    193.46  2.64    58.17
 52.49   2000.5  4.32    947.11  2.74    64.45
 115.24  2001    16.8    878.58  2.66    94.49
 55.55   2001.5  33.03   656.56  2.82    60.76
 156.14  2002    35.52   83.75   2.6     59.57
 138.77  2002.5  21.51   105.76  2.62    85.89
 71.89   2003    27.79   709.01  2.63    85.44
 Regards,
     Mohammad Tariq