Re: New Apache Pig Committer: Nandor Kollar
Congratulations!

Best Regards
ZhangLiyun/Kelly Zhang

On 2018/9/7, 3:59 AM, "Koji Noguchi" wrote:

On behalf of the Apache Pig PMC, it is my pleasure to announce that Nandor Kollar has accepted the invitation to become an Apache Pig committer. We appreciate all the work Nandor has done and look forward to seeing continued involvement.

Please join me in congratulating Nandor!

Thanks,
Koji
RE: Review Request 59530: PIG-5157 Upgrade to Spark 2.0
No need to add null checks. Thanks for the explanation. Please update the patch against the latest trunk code; that makes it easy for me to test the code.

Best Regards
Kelly Zhang/Zhang,Liyun

-----Original Message-----
From: Nandor Kollar [mailto:nore...@reviews.apache.org] On Behalf Of Nandor Kollar
Sent: Tuesday, June 13, 2017 3:52 PM
To: Rohini Palaniswamy <rohini.adi...@gmail.com>; Zhang, Liyun <liyun.zh...@intel.com>; Adam Szita <sz...@cloudera.com>
Cc: Zhang, Liyun <liyun.zh...@intel.com>; pig <dev@pig.apache.org>; Nandor Kollar <nkol...@cloudera.com>
Subject: Re: Review Request 59530: PIG-5157 Upgrade to Spark 2.0

> On June 13, 2017, 7:33 a.m., kelly zhang wrote:
> > src/org/apache/pig/tools/pigstats/spark/SparkJobStats1.java
> > Lines 60-63 (patched)
> > <https://reviews.apache.org/r/59530/diff/4/?file=1749088#file1749088line60>
> >
> > Why are inputMetricExists, outputMetricExist, shuffleReadMetricExist and shuffleWriteMetricExist deleted in SparkJobStats2.java?

These metrics are no longer Optional, but have an initialized value. For example, this is the shuffleWriteMetrics:

val shuffleWriteMetrics: ShuffleWriteMetrics = new ShuffleWriteMetrics();

I don't think we need to check for the existence of these metrics, since they are initialized all the time, but if you think it is better to check for null before using them, I can add null checks for it.

- Nandor

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/59530/#review177699
---

On June 12, 2017, 9:20 p.m., Nandor Kollar wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/59530/
> ---
>
> (Updated June 12, 2017, 9:20 p.m.)
>
> Review request for pig, liyun zhang, Rohini Palaniswamy, and Adam Szita.
>
> Repository: pig-git
>
> Description
> ---
>
> Upgrade to Spark 2.1 API using shims.
> Diffs
> -----
>
> build.xml 4040fcec8f88d448ed7442461fbf0dea8cd1136e
> ivy.xml 971724380f086d214ce62c7ab7879b08b6926802
> ivy/libraries.properties a0eb00acd2df42324540df4a9d762c64c608a6d3
> src/org/apache/pig/backend/hadoop/executionengine/spark/FlatMapFunctionAdapter.java PRE-CREATION
> src/org/apache/pig/backend/hadoop/executionengine/spark/JobMetricsListener.java f81341233447203abc4800cc7b22a4f419e10262
> src/org/apache/pig/backend/hadoop/executionengine/spark/PairFlatMapFunctionAdapter.java PRE-CREATION
> src/org/apache/pig/backend/hadoop/executionengine/spark/SparkLauncher.java c6351e01a48f297ea2e432401ffd65c4f27f8078
> src/org/apache/pig/backend/hadoop/executionengine/spark/SparkShim.java PRE-CREATION
> src/org/apache/pig/backend/hadoop/executionengine/spark/SparkShim1.java PRE-CREATION
> src/org/apache/pig/backend/hadoop/executionengine/spark/SparkShim2.java PRE-CREATION
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/CollectedGroupConverter.java 83311dfa5bb25209a5366c2db7e8d483c31d94cd
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/FRJoinConverter.java 382258e7ff9105aa397c5a2888df0c11e9562ec9
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/ForEachConverter.java b58415e7e18ca4cf1331beef06e9214600a51424
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/GlobalRearrangeConverter.java f571b808839c2de9415a3e8e4b229a7f4b2eebd7
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/LimitConverter.java fe1b54c8f128661d7d19c276d3bb2de7874d3086
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeCogroupConverter.java adf78ecab0da10d3b1a7fdde8af2b42dd899810f
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/MergeJoinConverter.java d1c43b1e06adc4c9fe45a83b8110402e3756
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/PoissonSampleConverter.java e003bbd95763b2d189ff9ec540c89abe52592420
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SecondaryKeySortUtil.java 00d29b44848546ed16dde2baa8c61b36939971b2
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SkewedJoinConverter.java c55ba3145495a53d69db2dd56434dcc9b3bf8ed5
> src/org/apache/pig/backend/hadoop/executionengine/spark/converter/SortConverter.java baabfa090323e3bef087e259ce19df2e4c34dd63
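For context on the review discussion above, the Spark 1 vs Spark 2 metrics difference can be handled with a small guard; a minimal sketch (a hypothetical helper, not the patch's actual SparkJobStats/SparkShim code), assuming the Spark 2.x TaskMetrics API where the metric objects are always initialized:

import org.apache.spark.executor.ShuffleWriteMetrics;
import org.apache.spark.executor.TaskMetrics;

public class ShuffleWriteBytes {
    // In Spark 2.x shuffleWriteMetrics() is always initialized, so the
    // null guard is redundant there; it only documents the Spark 1.x
    // difference, where the metric was an Option and could be absent.
    public static long bytesWritten(TaskMetrics metrics) {
        ShuffleWriteMetrics swm = metrics.shuffleWriteMetrics();
        return swm == null ? 0L : swm.bytesWritten();
    }
}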
RE: [VOTE] Release Pig 0.17.0 (candidate 0)
Thanks Adam. Sorry for the late reply. I ran a simple Pig on Spark test using pig-0.17.0.tar.gz (http://home.apache.org/~szita/pig-0.17.0-rc0/pig-0.17.0.tar.gz) and found no problems.

Best Regards
Kelly Zhang/Zhang,Liyun

-----Original Message-----
From: Koji Noguchi [mailto:knogu...@yahoo-inc.com.INVALID]
Sent: Wednesday, June 7, 2017 5:42 AM
To: dev@pig.apache.org
Subject: Re: [VOTE] Release Pig 0.17.0 (candidate 0)

Thanks Adam. Is the .asc file missing for the tarball? (pig-0.17.0.tar.gz.asc)

Koji

On Friday, June 2, 2017, 5:00:44 PM EDT, Adam Szita <sz...@cloudera.com> wrote:

Hi,
I have created a candidate build for Pig 0.17.0. The most important changes since 0.16.0 are:
(1) Dropped support for Hadoop 1
(2) Included Spark as an execution engine

Keys used to sign the release are available at http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.

Please download, test, and try it out: http://home.apache.org/~szita/pig-0.17.0-rc0/

Release notes and the rat report are available from the same location.

Should we release this? Vote closes on Wednesday, June 7th.

Adam
RE: [ANNOUNCE] Welcome new Pig Committer - Adam Szita
Congratulations, Adam!

-----Original Message-----
From: Xuefu Zhang [mailto:xu...@uber.com]
Sent: Tuesday, May 23, 2017 2:09 AM
To: dev@pig.apache.org
Cc: u...@pig.apache.org
Subject: Re: [ANNOUNCE] Welcome new Pig Committer - Adam Szita

Congratulations, Adam!

On Mon, May 22, 2017 at 10:51 AM, Rohini Palaniswamy <rohini.adi...@gmail.com> wrote:
> Hi all,
>     It is my pleasure to announce that Adam Szita has been voted in as
> a committer to Apache Pig. Please join me in congratulating Adam. Adam
> has been actively contributing to core Pig and Pig on Spark. We
> appreciate all the work he has done and are looking forward to more
> contributions from him.
>
> Welcome aboard, Adam.
>
> Regards,
> Rohini
RE: Preparing for Pig 0.16.1 release
Hi Rohini:
Very glad to know that Pig 0.17 (including Pig on Spark) will be released in March or April. What confuses me now is that the code in the review board (https://reviews.apache.org/r/45667/) is not the latest code. Which version of the code is ready to merge to trunk, the review board or the branch? If the spark branch code is used, I will upload the newest patch to the review board so the community can review the new commits. I would appreciate your suggestion.

Best Regards
Kelly Zhang/Zhang,Liyun

-----Original Message-----
From: Rohini Palaniswamy [mailto:rohini.adi...@gmail.com]
Sent: Wednesday, February 1, 2017 5:21 AM
To: dev@pig.apache.org
Subject: Re: Preparing for Pig 0.16.1 release

Most patches are in and the 0.16.1 branch is close to ready for release candidate builds. The next release, Pig 0.17, will be a major release with big features like Pig on Spark and a lot of changes, and will arrive around the end of March or early April. Pig 0.16.1 will most likely be the last patch release before it. So if there are any issues that have to go into the Pig 0.16.1 patch release, please raise them now. If no issues are raised in the next couple of days, we will start the RC process at the end of this week.

Regards,
Rohini

On Thu, Dec 15, 2016 at 3:05 PM, Rohini Palaniswamy <rohini.adi...@gmail.com> wrote:
> Folks,
>     Many important fixes have gone into the 0.16 branch and it would be good to
> have a 0.16.1 release out with those fixes. If there are any jiras you
> think should go in, please mark them with 0.16.1. Since this is a
> patch release, the focus will be mainly on critical or important bug
> fixes. We will target mid Jan for the release.
>
> Regards,
> Rohini
RE: Why does Pig on Spark use the RDD API rather than the DataFrame API?
Hi Jeff:
Thanks for your interest. When this project was started (August 2014), the DataFrame API was not available, and that is why we don't use it in the project. An engineer at InMobi raised a similar idea before. In my view, if the DataFrame API is more suitable than the RDD API, we can consider it in later optimization work after the first release. For now, you can file a subtask on PIG-4856 (an umbrella jira for optimization work) and work on it if you are interested.

Best Regards
Kelly Zhang/Zhang,Liyun

-----Original Message-----
From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Sunday, January 8, 2017 10:13 AM
To: dev@pig.apache.org
Subject: Why does Pig on Spark use the RDD API rather than the DataFrame API?

Hi Folks,

I am very interested in the project of Pig on Spark. When I read the code, I found that the current implementation is based on the Spark RDD API. I don't know the original background (maybe when this project was started, the DataFrame API was not available), but now I feel the DataFrame API might be more suitable than the RDD API. Here are 2 advantages of the DataFrame API I can think of:

1. The DataFrame API is easier to use than the RDD API. Although it is less flexible than RDD, Pig's tuple data structure is very similar to that of DataFrame, so it should be possible to map each Pig operation to a DataFrame operation. If not, we can give feedback to the Spark community.

2. Spark's Catalyst provides lots of optimizations on DataFrame. If we used the DataFrame API, we could leverage those optimizations in Catalyst rather than reinvent the wheel in Pig.

What do you think?

Thanks
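To illustrate the second point, here is a minimal standalone sketch of a Pig-style group-and-aggregate expressed in the DataFrame API (a hypothetical example assuming the Spark 2.x Java bindings; the file name and schema are made up):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.min;

public class GroupByComparison {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("pig-groupby-sketch").getOrCreate();

        // DataFrame version: the logical plan goes through Catalyst, so
        // optimizations such as column pruning come for free.
        Dataset<Row> a = spark.read()
                .option("sep", "\t")
                .csv("input1.txt")
                .toDF("age", "gpa");
        Dataset<Row> c = a.groupBy("age").agg(min("gpa"));
        c.show();

        // The RDD equivalent (roughly what Pig-on-Spark generates today)
        // would be a hand-built mapToPair + combineByKey pipeline that
        // Pig itself has to optimize.
        spark.stop();
    }
}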
RE: [ANNOUNCE] Welcome new Pig Committer - Liyun Zhang
Thanks all, glad to get the recognition on this project. I appreciate all your interest in Pig on Spark and hope it will be released in the future.

From: Koji Noguchi [mailto:knogu...@yahoo-inc.com]
Sent: Wednesday, December 21, 2016 7:10 AM
To: dev@pig.apache.org; u...@pig.apache.org
Cc: Praveen R <prav...@sigmoidanalytics.com>; Zhang, Liyun <liyun.zh...@intel.com>; Mohit Sabharwal (mo...@cloudera.com) <mo...@cloudera.com>; xu...@uber.com
Subject: Re: [ANNOUNCE] Welcome new Pig Committer - Liyun Zhang

Congrats!!!

From: Rohini Palaniswamy <rohini.adi...@gmail.com>
To: "u...@pig.apache.org" <u...@pig.apache.org>
Cc: Praveen R <prav...@sigmoidanalytics.com>; "Zhang, Liyun" <liyun.zh...@intel.com>; "dev@pig.apache.org" <dev@pig.apache.org>; "Mohit Sabharwal (mo...@cloudera.com)" <mo...@cloudera.com>; "xu...@uber.com" <xu...@uber.com>
Sent: Tuesday, December 20, 2016 12:36 PM
Subject: Re: [ANNOUNCE] Welcome new Pig Committer - Liyun Zhang

Congratulations Liyun !!!

On Mon, Dec 19, 2016 at 10:25 PM, Jianfeng (Jeff) Zhang <jzh...@hortonworks.com> wrote:
> Congratulations Liyun!
>
> Best Regard,
> Jeff Zhang
>
> On 12/20/16, 11:29 AM, "Pallavi Rao" <pallavi@inmobi.com> wrote:
>
> >Congratulations Liyun!
> >
RE: [ANNOUNCE] Welcome new Pig Committer - Liyun Zhang
Thanks for all your help on this project!

Regards,
Zhang,Liyun

-----Original Message-----
From: Ke, Xianda [mailto:xianda...@intel.com]
Sent: Friday, December 16, 2016 9:51 AM
To: dev@pig.apache.org; u...@pig.apache.org
Subject: RE: [ANNOUNCE] Welcome new Pig Committer - Liyun Zhang

Congrats, Liyun!

Regards, Xianda

-----Original Message-----
From: Daniel Dai [mailto:da...@hortonworks.com]
Sent: Friday, December 16, 2016 5:55 AM
To: dev@pig.apache.org; u...@pig.apache.org
Subject: [ANNOUNCE] Welcome new Pig Committer - Liyun Zhang

It is my pleasure to announce that Liyun Zhang became the newest addition to the Pig Committers! Liyun has been actively contributing to Pig on Spark. Please join me in congratulating Liyun!
FW: File could only be replicated to 0 nodes, instead of 1
Hi:
You can google "File could only be replicated to 0 nodes, instead of 1"; there are several possible reasons for it. In most cases it is caused by a lack of disk space or by all datanodes being down.

Best Regards
Kelly Zhang/Zhang,Liyun

From: mingda li [mailto:limingda1...@gmail.com]
Sent: Wednesday, December 7, 2016 1:01 PM
To: dev@pig.apache.org; u...@pig.apache.org
Subject: File could only be replicated to 0 nodes, instead of 1

Hi,
I am running a multiple join of 100G TPC-DS data with a bad order on our cluster, and each time it returns a log file with the exception below. Has anyone met it? Is it caused by having more data than disk space?

org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/temp-1180529634/tmp-491747926/_temporary/_attempt_201607142217_0115_r_00_0/part-r-0 could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
    at sun.reflect.GeneratedMethodAccessor851.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
    at java.security.AccessController.doPrivileged(Native Method)
    ...

Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias limit_data

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias limit_data
    at org.apache.pig.PigServer.openIterator(PigServer.java:935)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:754)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:376)

More detailed information is in the attachment.
RE: How to test the efficiency of multiple join
Hi:
I think the query time of the multiple-join part is not related to the number passed to the LIMIT operator (in your case, 4). When the query is executed, limit_data is executed after Bad_OrderRes: only after the join (Bad_OrderRes) is finished does the limit (limit_data) start, so the full join runs either way. If I have missed something, please tell me.

Best Regards
Kelly Zhang/Zhang,Liyun

-----Original Message-----
From: mingda li [mailto:limingda1...@gmail.com]
Sent: Wednesday, December 7, 2016 8:18 AM
To: dev@pig.apache.org; u...@pig.apache.org
Subject: How to test the efficiency of multiple join

Dear all,
I want to test the efficiency of different multiple-join orders. However, since a Pig query is executed lazily, I need to use dump or store to make the query execute. Now I use the following query to test the efficiency:

Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number), catalog_returns BY (cr_item_sk, cr_order_number);
limit_data = LIMIT Bad_OrderRes 4;
Dump limit_data;

Do you think it is OK to just show 4 of the results? Could this query's execution time represent the efficiency of the multiple join? I am not sure whether it will just get 4 items and stop without processing the others.

Bests,
Mingda
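To time a script end to end from Java rather than watching the console, one option is the PigRunner entry point; a minimal sketch (the script file name is hypothetical):

import org.apache.pig.PigRunner;
import org.apache.pig.tools.pigstats.PigStats;

public class TimeJoinScript {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Run the join script; PigRunner blocks until all jobs finish,
        // so the elapsed time covers the full join, not just the LIMIT.
        PigStats stats = PigRunner.run(new String[] { "bad_order_join.pig" }, null);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("success=" + stats.isSuccessful() + ", elapsed ms=" + elapsed);
    }
}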
RE: [ANNOUNCE] Congratulations to our new PMC member Koji Noguchi
Congrats Koji!

-----Original Message-----
From: Daniel Dai [mailto:da...@hortonworks.com]
Sent: Saturday, August 06, 2016 7:28 AM
To: dev@pig.apache.org; u...@pig.apache.org
Subject: [ANNOUNCE] Congratulations to our new PMC member Koji Noguchi

Please welcome Koji Noguchi as our latest Pig PMC member. Congrats Koji!
RE: Can anyone who has experience with pigmix share configuration and expected results?
istory.JhCounters":{"name":"COUNTERS","groups":[
  {"name":"org.apache.hadoop.mapreduce.FileSystemCounter","displayName":"File System Counters","counts":[
    {"name":"FILE_BYTES_READ","displayName":"FILE: Number of bytes read","value":0},
    {"name":"FILE_BYTES_WRITTEN","displayName":"FILE: Number of bytes written","value":169316},
    {"name":"FILE_READ_OPS","displayName":"FILE: Number of read operations","value":0},
    {"name":"FILE_LARGE_READ_OPS","displayName":"FILE: Number of large read operations","value":0},
    {"name":"FILE_WRITE_OPS","displayName":"FILE: Number of write operations","value":0},
    {"name":"HDFS_BYTES_READ","displayName":"HDFS: Number of bytes read","value":0},
    {"name":"HDFS_BYTES_WRITTEN","displayName":"HDFS: Number of bytes written","value":0},
    {"name":"HDFS_READ_OPS","displayName":"HDFS: Number of read operations","value":0},
    {"name":"HDFS_LARGE_READ_OPS","displayName":"HDFS: Number of large read operations","value":0},
    {"name":"HDFS_WRITE_OPS","displayName":"HDFS: Number of write operations","value":0}]},
  {"name":"org.apache.hadoop.mapreduce.TaskCounter","displayName":"Map-Reduce Framework","counts":[
    {"name":"COMBINE_INPUT_RECORDS","displayName":"Combine input records","value":0},
    {"name":"COMBINE_OUTPUT_RECORDS","displayName":"Combine output records","value":0},
    {"name":"REDUCE_INPUT_GROUPS","displayName":"Reduce input groups","value":0},
    {"name":"REDUCE_SHUFFLE_BYTES","displayName":"Reduce shuffle bytes","value":21039704},
    {"name":"REDUCE_INPUT_RECORDS","displayName":"Reduce input records","value":0},
    {"name":"REDUCE_OUTPUT_RECORDS","displayName":"Reduce output records","value":0},
    {"name":"SPILLED_RECORDS","displayName":"Spilled Records","value":0},
    {"name":"SHUFFLED_MAPS","displayName":"Shuffled Maps ","value":6405},
    {"name":"FAILED_SHUFFLE","displayName":"Failed Shuffles","value":0},
    {"name":"MERGED_MAP_OUTPUTS","displayName":"Merged Map outputs","value":0},
    {"name":"GC_TIME_MILLIS","displayName":"GC time elapsed (ms)","value":3617},
    {"name":"CPU_MILLISECONDS","displayName":"CPU time spent (ms)","value":148570},
    {"name":"PHYSICAL_MEMORY_BYTES","displayName":"Physical memory (bytes) snapshot","value":346775552},
    {"name":"VIRTUAL_MEMORY_BYTES","displayName":"Virtual memory (bytes) snapshot","value":2975604736},
    {"name":"COMMITTED_HEAP_BYTES","displayName":"Total committed heap usage (bytes)","value":1490026496}]},
  {"name":"Shuffle Errors","displayName":"Shuffle Errors","counts":[
    {"name":"BAD_ID","displayName":"BAD_ID","value":0},
    {"name":"CONNECTION","displayName":"CONNECTION","value":0},
    {"name":"IO_ERROR","displayName":"IO_ERROR","value":0},
    {"name":"WRONG_LENGTH","displayName":"WRONG_LENGTH","value":0},
    {"name":"WRONG_MAP","displayName":"WRONG_MAP","value":0},
    {"name":"WRONG_REDUCE","displayName":"WRONG_REDUCE","value":0}]},
  {"name":"org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter","displayName":"File Output Format Counters ","counts":[
    {"name":"BYTES_WRITTEN","displayName":"Bytes Written","value":0}]}]}},
"clockSplits":[363810,597022,913686,4199950,340,339,340,340,339,340,340,340],
"cpuUsages":[14016,15693,7,96634,0,0,0,0,0,0,0,0],
"vMemKbytes
RE: Review Request 45667: Support Pig On Spark
Pallavi created this review board, so I don't have the privilege to upload a new patch to it. I have sent the new patch to Pallavi and she will upload it later.

-----Original Message-----
From: kelly zhang [mailto:nore...@reviews.apache.org] On Behalf Of kelly zhang
Sent: Tuesday, June 14, 2016 11:25 AM
To: Rohini Palaniswamy; Daniel Dai
Cc: Zhang, Liyun; Pallavi Rao; pig
Subject: Re: Review Request 45667: Support Pig On Spark

> On May 22, 2016, 9:57 p.m., Rohini Palaniswamy wrote:
> > test/org/apache/pig/test/TestBuiltin.java, line 3255
> > <https://reviews.apache.org/r/45667/diff/1/?file=1323870#file1323870line3255>
> >
> > This testcase is broken if you have 0-0 repeating twice. It is not UniqueID anymore.

0-0 repeating twice is because we use the TaskID in UniqueID#exec:

public String exec(Tuple input) throws IOException {
    String taskIndex = PigMapReduce.sJobConfInternal.get().get(PigConstants.TASK_INDEX);
    String sequenceId = taskIndex + "-" + Long.toString(sequence);
    sequence++;
    return sequenceId;
}

In MR, we initialize PigConstants.TASK_INDEX in org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce.Reduce#setup:

protected void setup(Context context) throws IOException, InterruptedException {
    ...
    context.getConfiguration().set(PigConstants.TASK_INDEX,
            Integer.toString(context.getTaskAttemptID().getTaskID().getId()));
    ...
}

But Spark does not provide a hook like PigGenericMapReduce.Reduce#setup to initialize PigConstants.TASK_INDEX when a job starts. I suggest filing a new jira ("Initialize PigConstants.TASK_INDEX when a Spark job starts") and skipping this unit test until that jira is resolved.

> On May 22, 2016, 9:57 p.m., Rohini Palaniswamy wrote:
> > src/org/apache/pig/data/SelfSpillBag.java, line 32
> > <https://reviews.apache.org/r/45667/diff/1/?file=1323848#file1323848line32>
> >
> > Why is bag even being serialized by Spark?

SelfSpillBag is used in TestHBaseStorage; if it is not marked transient, a NotSerializableException is thrown.

> On May 22, 2016, 9:57 p.m., Rohini Palaniswamy wrote:
> > test/org/apache/pig/newplan/logical/relational/TestLocationInPhysicalPlan.java, line 66
> > <https://reviews.apache.org/r/45667/diff/1/?file=1323865#file1323865line66>
> >
> > Why does A[3,4] repeat?
The pig script is like:

"A = LOAD '" + Util.encodeEscape(input.getAbsolutePath()) + "' using PigStorage();\n"
"B = GROUP A BY $0;\n"
"A = FOREACH B GENERATE COUNT(A);\n"
"STORE A INTO '" + Util.encodeEscape(output.getAbsolutePath()) + "';");

The spark plan is:

A: Store(/tmp/pig_junit_tmp1755582848/test6087259092054964214output:org.apache.pig.builtin.PigStorage) - scope-9
|
|---A: New For Each(false)[tuple] - scope-13
|   |
|   Project[bag][1] - scope-11
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - scope-12
|   |
|   |---Project[bag][1] - scope-28
|
|---Reduce By(false,false)[tuple] - scope-18
|   |
|   Project[bytearray][0] - scope-19
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - scope-20
|   |
|   |---Project[bag][1] - scope-21
|
|---B: Local Rearrange[tuple]{bytearray}(false) - scope-24
|   |
|   Project[bytearray][0] - scope-26
|
|---A: New For Each(false,false)[bag] - scope-14
|   |
|   Project[bytearray][0] - scope-15
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - scope-16
|   |
|   |---Project[bag][1] - scope-17
|
|---Pre Combiner Local Rearrange[tuple]{Unknown} - scope-27
|
|---A: Load(/tmp/pig_junit_tmp1755582848/test7108242581632795697input:PigStorage) - scope-0

There are two ForEach operators (scope-13 and scope-14) in the spark plan, so A[3,4] appears twice. Comparing with the MR plan:

#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-10
Map Plan
B: Local Rearrange[tuple]{bytearray}(false) - scope-22
|   |
|   Project[bytearray][0] - scope-24
|
|---A: New For Each(false,false)[bag] - scope-11
|   |
|   Project[bytearray][0] - scope-12
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - scope-13
|   |
|   |---Project[bag][1] - scope-14
|
|---Pre Combiner Local Rearrange[tuple]{Unknown} - scope-25
|
|---A: Load(/tmp/pig_junit_tmp910232853/test2548400580131197161inpu
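On the TASK_INDEX point above: Spark does expose a per-task partition index at runtime through TaskContext, so the suggested jira could initialize PigConstants.TASK_INDEX from it. A minimal sketch of the idea (hypothetical placement, not actual patch code):

import org.apache.spark.TaskContext;

public class TaskIndexSketch {
    // Hypothetical sketch: inside a function running on an executor,
    // derive a task index the way PigGenericMapReduce.Reduce#setup does in MR.
    public static String currentTaskIndex() {
        TaskContext ctx = TaskContext.get(); // non-null only on an executor thread
        return ctx == null ? "0" : Integer.toString(ctx.partitionId());
    }
}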
RE: Welcome our new Pig PMC chair Daniel Dai
Congrats Daniel!

Best Regards
Kelly Zhang/Zhang,Liyun

-----Original Message-----
From: Praveen R [mailto:prav...@sigmoidanalytics.com]
Sent: Sunday, March 27, 2016 3:47 AM
To: dev@pig.apache.org
Subject: Re: Welcome our new Pig PMC chair Daniel Dai

Congrats Daniel.

On Fri, 25 Mar 2016, 8:53 p.m. Lorand Bendig, <lben...@gmail.com> wrote:
> Congratulations, Daniel!
> -Lorand
>
> 2016-03-23 23:23 GMT+01:00 Rohini Palaniswamy <rohini.adi...@gmail.com>:
>
> > Hi folks,
> >     I am very happy to announce that we elected Daniel Dai as our
> > new Pig PMC Chair and it is official now. Please join me in
> > congratulating Daniel.
> >
> > Regards,
> > Rohini
RE: Welcome to our new Pig PMC member Xuefu Zhang
Congratulations Xuefu!

Kelly Zhang/Zhang,Liyun
Best Regards

-----Original Message-----
From: Jarek Jarcec Cecho [mailto:jar...@gmail.com] On Behalf Of Jarek Jarcec Cecho
Sent: Thursday, February 25, 2016 6:36 AM
To: dev@pig.apache.org
Cc: u...@pig.apache.org
Subject: Re: Welcome to our new Pig PMC member Xuefu Zhang

Congratulations Xuefu!

Jarcec

> On Feb 24, 2016, at 1:29 PM, Rohini Palaniswamy <rohini.adi...@gmail.com> wrote:
>
> It is my pleasure to announce that Xuefu Zhang is our newest addition
> to the Pig PMC. Xuefu is a long time committer of Pig and has been
> actively involved in driving the Pig on Spark effort for the past year.
>
> Please join me in congratulating Xuefu !!!
>
> Regards,
> Rohini
Should a new MR operator be generated when a POSplit is encountered?
:org.apache.pig.impl.io.InterStorage) - scope-59
|
|---C: Package(Packager)[tuple]{int} - scope-16
Global sort: false

MapReduce node scope-64
Map Plan
Union[tuple] - scope-65
|
|---H: Local Rearrange[tuple]{int}(false) - scope-50
|   |
|   Project[int][0] - scope-51
|   |
|   |---D: New For Each(false,false)[bag] - scope-32
|       |
|       Project[int][0] - scope-22
|       |
|       POUserFunc(myudfs.AccumulativeSumBag)[chararray] - scope-25
|       |
|       |---RelationToExpressionProject[bag][*] - scope-24
|           |
|           |---F: New For Each(false)[bag] - scope-31
|               |
|               Project[bytearray][1] - scope-29
|               |
|               |---E: POSort[bag]() - scope-28
|                   |
|                   Project[bytearray][1] - scope-27
|                   |
|                   |---Project[bag][1] - scope-26
|   |
|   |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp523129898/tmp344582360:org.apache.pig.impl.io.InterStorage) - scope-60
|
|---H: Local Rearrange[tuple]{int}(false) - scope-52
|   |
|   Project[int][0] - scope-53
|   |
|   |---G: New For Each(false,false)[bag] - scope-45
|       |
|       Project[int][0] - scope-35
|       |
|       POUserFunc(myudfs.AccumulativeSumBag)[chararray] - scope-38
|       |
|       |---RelationToExpressionProject[bag][*] - scope-37
|           |
|           |---F: New For Each(false)[bag] - scope-44
|               |
|               Project[bytearray][1] - scope-42
|               |
|               |---E: POSort[bag]() - scope-41
|                   |
|                   Project[bytearray][1] - scope-40
|                   |
|                   |---Project[bag][1] - scope-39
|   |
|   |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp523129898/tmp344582360:org.apache.pig.impl.io.InterStorage) - scope-62
Reduce Plan
H: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-57
|
|---H: Package(JoinPackager(true,true))[tuple]{int} - scope-49
Global sort: false

Kelly Zhang/Zhang,Liyun
Best Regards
I cannot upload a patch from https://issues.apache.org/jira/xxx now. Does anyone know why?
Hi all:
I met a problem when uploading a patch: when I choose my patch, upload it, and press the Attach button, it shows "Please indicate the file you wish to upload" and the upload fails. Does anyone know why?

Kelly Zhang/Zhang,Liyun
Best Regards
a question about TestSkewedJoin#testSkewedJoinKeyPartition
Hi all:
I want to ask a question about TestSkewedJoin#testSkewedJoinKeyPartition:

@Test
public void testSkewedJoinKeyPartition() throws IOException {
    String outputDir = "testSkewedJoinKeyPartition";
    try {
        Util.deleteFile(cluster, outputDir);
    } catch (Exception e) {
        // it is ok if the directory does not exist
    }

    pigServer.registerQuery("A = LOAD '" + INPUT_FILE1 + "' as (id, name, n);");
    pigServer.registerQuery("B = LOAD '" + INPUT_FILE2 + "' as (id, name);");
    pigServer.registerQuery("E = join A by id, B by id using 'skewed' parallel 7;");
    pigServer.store("E", outputDir);

    int[][] lineCount = new int[3][7];

    FileStatus[] outputFiles = fs.listStatus(new Path(outputDir), Util.getSuccessMarkerPathFilter());
    // check how many times a key appears in each part-* file
    for (int i = 0; i < 7; i++) {
        String filename = outputFiles[i].getPath().toString();
        Util.copyFromClusterToLocal(cluster, filename, OUTPUT_DIR + "/" + i);
        BufferedReader reader = new BufferedReader(new FileReader(OUTPUT_DIR + "/" + i));
        String line = null;
        while ((line = reader.readLine()) != null) {
            String[] cols = line.split("\t");
            int key = Integer.parseInt(cols[0]) / 100 - 1;
            lineCount[key][i]++;
        }
        reader.close();
    }

    int fc = 0;
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 7; j++) {
            if (lineCount[i][j] > 0) {
                fc++;
            }
        }
    }

    // at least one key should be a skewed key:
    // check that at least one key appears in more than 1 part-* file
    assertTrue(fc > 3);
}

When I run this unit test, I find the results in OUTPUT_DIR/0 ~ OUTPUT_DIR/6 (because the parallel number is 7), and one key appears in more than 1 part file. But when I run the script from the command line, the results are only in OUTPUT_DIR/part-2, OUTPUT_DIR/part-4 and OUTPUT_DIR/part-6; the other part-* files are empty, and one key appears in only 1 part file.

A = LOAD './SkewedJoinInput1.txt' as (id, name, n);
B = LOAD './SkewedJoinInput2.txt' as (id, name);
E = join A by id, B by id using 'skewed' parallel 7;
store E into './testSkewedJoin.out';

I don't understand why the results differ between the unit test environment and running the script directly from the command line. I would appreciate it if anyone could give me some suggestions.

Kelly Zhang/Zhang,Liyun
Best Regards
a question about MRCompiler#visitSort
Hi all,
I want to ask a question about the following script:

testlimit.pig
a = load './testlimit.txt' as (x:int, y:chararray);
b = order a by x;
c = limit b 1;
store c into './testlimit.out';

In MR it will generate 4 MapReduce nodes (scope-11, scope-14, scope-29, scope-40):

scope-11: loads the input data and stores it to a tmp file.
scope-14: sample-loads the tmp file and generates the quantile file hdfs://zly1.sh.intel.com:8020/tmp/temp2146669591/tmp300898425. I think the quantile file contains the instance of WeightedRangePartitioner, which shows how the keys are distributed.
scope-29: uses the quantile file to sort.

My question: the WeightedRangePartitioner only shows how the keys are distributed and makes every reducer receive a roughly equal amount of data from the map. But how can this guarantee the sort?

#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-11
Map Plan
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp2146669591/tmp694083214:org.apache.pig.impl.io.InterStorage) - scope-12
|
|---a: New For Each(false,false)[bag] - scope-7
|   |
|   Cast[int] - scope-2
|   |
|   |---Project[bytearray][0] - scope-1
|   |
|   Cast[chararray] - scope-5
|   |
|   |---Project[bytearray][1] - scope-4
|
|---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/testlimit.txt:org.apache.pig.builtin.PigStorage) - scope-0
Global sort: false

MapReduce node scope-14
Map Plan
b: Local Rearrange[tuple]{tuple}(false) - scope-18
|   |
|   Constant(all) - scope-17
|
|---New For Each(false)[tuple] - scope-16
|   |
|   Project[int][0] - scope-15
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2146669591/tmp694083214:org.apache.pig.impl.builtin.RandomSampleLoader('org.apache.pig.impl.io.InterStorage','100')) - scope-13
Reduce Plan
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp2146669591/tmp300898425:org.apache.pig.impl.io.InterStorage) - scope-27
|
|---New For Each(false)[tuple] - scope-26
|   |
|   POUserFunc(org.apache.pig.impl.builtin.FindQuantiles)[tuple] - scope-25
|   |
|   |---Project[tuple][*] - scope-24
|
|---New For Each(false,false)[tuple] - scope-23
|   |
|   Constant(2) - scope-22
|   |
|   Project[bag][1] - scope-20
|
|---Package(Packager)[tuple]{chararray} - scope-19
Global sort: false
Secondary sort: true

MapReduce node scope-29
Map Plan
b: Local Rearrange[tuple]{int}(false) - scope-30
|   |
|   Project[int][0] - scope-8
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2146669591/tmp694083214:org.apache.pig.impl.io.InterStorage) - scope-28
Combine Plan
Local Rearrange[tuple]{int}(false) - scope-35
|   |
|   Project[int][0] - scope-8
|
|---Limit - scope-34
|
|---New For Each(true)[tuple] - scope-33
|   |
|   Project[bag][1] - scope-32
|
|---Package(LitePackager)[tuple]{int} - scope-31
Reduce Plan
c: Store(hdfs://zly1.sh.intel.com:8020/tmp/temp2146669591/tmp538566422:org.apache.pig.impl.io.InterStorage) - scope-10
|
|---Limit - scope-39
|
|---New For Each(true)[tuple] - scope-38
|   |
|   Project[bag][1] - scope-37
|
|---Package(LitePackager)[tuple]{int} - scope-36
Global sort: true
Quantile file: hdfs://zly1.sh.intel.com:8020/tmp/temp2146669591/tmp300898425

MapReduce node scope-40
Map Plan
b: Local Rearrange[tuple]{int}(false) - scope-42
|   |
|   Project[int][0] - scope-43
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2146669591/tmp538566422:org.apache.pig.impl.io.InterStorage) - scope-41
Reduce Plan
c: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-49
|
|---Limit - scope-48
|
|---New For Each(true)[bag] - scope-47
|   |
|   Project[tuple][1] - scope-46
|
|---Package(LitePackager)[tuple]{int} - scope-45
Global sort: false

Kelly Zhang/Zhang,Liyun
Best Regards
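For what it's worth, the global sort comes from combining range partitioning with the framework's per-reducer sort: the quantile file defines disjoint, ordered key ranges, each reducer sorts its own range, and concatenating the part files in reducer order therefore yields a total order. A simplified sketch of the range-partitioning step (hypothetical types; the real logic lives in WeightedRangePartitioner):

import java.util.Arrays;

public class RangePartitionSketch {
    // quantiles[i] is (roughly) the largest key routed to reducer i,
    // computed from a sample. Ranges are disjoint and ordered.
    private final int[] quantiles;

    public RangePartitionSketch(int[] quantiles) {
        this.quantiles = quantiles;
    }

    public int getPartition(int key) {
        int idx = Arrays.binarySearch(quantiles, key);
        int p = idx >= 0 ? idx : -(idx + 1); // first range whose upper bound >= key
        return Math.min(p, quantiles.length - 1); // tail keys go to the last reducer
    }
}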
RE: Welcome our new Pig PMC chair Rohini Palaniswamy
Congratulations, Rohini!

Best regards
Kelly Zhang/Zhang,Liyun

-----Original Message-----
From: Xuefu Zhang [mailto:xzh...@cloudera.com]
Sent: Thursday, March 19, 2015 10:29 AM
To: dev@pig.apache.org
Cc: u...@pig.apache.org
Subject: Re: Welcome our new Pig PMC chair Rohini Palaniswamy

Congratulations, Rohini!

--Xuefu

On Wed, Mar 18, 2015 at 6:48 PM, Cheolsoo Park <cheol...@apache.org> wrote:
> Hi all,
> Now it's official that Rohini Palaniswamy is our new Pig PMC chair.
> Please join me in congratulating Rohini for her new role. Congrats!
>
> Thanks!
> Cheolsoo
RE: a question about TestAccumulator#testAccumBasic
Hi Daniel:
Thanks for your comment. I have figured out why org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator#isAccumulative() is false in the second case.

Best regards
Zhang,Liyun

-----Original Message-----
From: Daniel Dai [mailto:da...@hortonworks.com]
Sent: Wednesday, March 04, 2015 2:39 AM
To: dev@pig.apache.org
Subject: Re: a question about TestAccumulator#testAccumBasic

In the first case the Accumulator gets used and in the second case it does not (I believe due to the second UDF, BagCount, in the plan). There are certain conditions for whether the Accumulator can be used in a plan or not; the logic is in AccumulatorOptimizer.

Daniel

On 3/3/15, 12:40 AM, "Zhang, Liyun" <liyun.zh...@intel.com> wrote:

Hi Daniel:
Thanks for your reply! According to your comment:
"The first case will use Accumulator, so accumulate -> cleanup will be called, but no exec. The second case will not use Accumulator, exec will be called instead of accumulate -> cleanup."

I guess what you mean is that if org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator#isAccumulative() is true, it will execute ((Accumulator)func).accumulate((Tuple)result.result); while if it is false, it will execute func.exec((Tuple)result.result). But I think the second case also uses AccumulatorBagCount. The difference between the first and the second case is that the second case uses org.apache.pig.test.utils.BagCount while the first one does not.

The first pig script: (TestAccumulator line 151~154)
A = load '" + INPUT_FILE1 + "' as (id:int, fruit);
B = group A by id;
C = foreach B generate group, org.apache.pig.test.utils.AccumulatorBagCount(A);

The second script: (TestAccumulator line 169~171)
A = load '" + INPUT_FILE1 + "' as (id:int, fruit);
B = group A by id;
C = foreach B generate group, org.apache.pig.test.utils.AccumulatorBagCount(A), org.apache.pig.test.utils.BagCount(A);

Best regards
Zhang,Liyun

-----Original Message-----
From: Daniel Dai [mailto:da...@hortonworks.com]
Sent: Tuesday, March 03, 2015 3:45 AM
To: dev@pig.apache.org
Subject: Re: a question about TestAccumulator#testAccumBasic

The first case will use Accumulator, so accumulate -> cleanup will be called, but no exec. The second case will not use Accumulator, exec will be called instead of accumulate -> cleanup.

Daniel

On 3/1/15, 7:21 PM, "Zhang, Liyun" <liyun.zh...@intel.com> wrote:

Hi all:
I have a question about TestAccumulator#testAccumBasic.

The first pig script: (TestAccumulator line 151~154)
A = load '" + INPUT_FILE1 + "' as (id:int, fruit);
B = group A by id;
C = foreach B generate group, org.apache.pig.test.utils.AccumulatorBagCount(A);

It uses org.apache.pig.test.utils.AccumulatorBagCount; in org.apache.pig.test.utils.AccumulatorBagCount#exec:

public Integer exec(Tuple tuple) throws IOException {
    throw new IOException("exec() should not be called.");
}

My question: it should throw an exception when the script is executed, but why is no exception thrown?

The second script: (TestAccumulator line 169~171)
A = load '" + INPUT_FILE1 + "' as (id:int, fruit);
B = group A by id;
C = foreach B generate group, org.apache.pig.test.utils.AccumulatorBagCount(A), org.apache.pig.test.utils.BagCount(A);

It uses org.apache.pig.test.utils.AccumulatorBagCount and org.apache.pig.test.utils.BagCount. The code checks whether an exception is thrown; if no exception is thrown, the unit test fails.
TestAccumulator#testAccumBasic

@Test
public void testAccumBasic() throws IOException {
151    // test group by
152    pigServer.registerQuery("A = load '" + INPUT_FILE1 + "' as (id:int, fruit);");
153    pigServer.registerQuery("B = group A by id;");
154    pigServer.registerQuery("C = foreach B generate group, org.apache.pig.test.utils.AccumulatorBagCount(A);");

    HashMap<Integer, Integer> expected = new HashMap<Integer, Integer>();
    expected.put(100, 2);
    expected.put(200, 1);
    expected.put(300, 3);
    expected.put(400, 1);

    Iterator<Tuple> iter = pigServer.openIterator("C");
    while (iter.hasNext()) {
        Tuple t = iter.next();
        assertEquals(expected.get((Integer) t.get(0)), (Integer) t.get(1));
    }

169    pigServer.registerQuery("B = group A by id;");
170    pigServer.registerQuery("C = foreach B generate group, " +
           "org.apache.pig.test.utils.AccumulatorBagCount(A), org.apache.pig.test.utils.BagCount(A);");

    try {
        iter = pigServer.openIterator("C");
        while (iter.hasNext()) {
            Tuple t = iter.next();
            assertEquals(expected.get((Integer) t.get(0)), (Integer) t.get(1));
        }
        fail("accumulator should not be called.");
    } catch (IOException e) {
        // should throw exception from AccumulatorBagCount
RE: a question about TestAccumulator#testAccumBasic
Hi Daniel:
Thanks for your reply! According to your comment:
"The first case will use Accumulator, so accumulate -> cleanup will be called, but no exec. The second case will not use Accumulator, exec will be called instead of accumulate -> cleanup."

I guess what you mean is that if org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator#isAccumulative() is true, it will execute ((Accumulator)func).accumulate((Tuple)result.result); while if org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator#isAccumulative() is false, it will execute func.exec((Tuple)result.result). But I think the second case also uses AccumulatorBagCount. The difference between the first and the second case is that the second case uses org.apache.pig.test.utils.BagCount while the first one does not.

The first pig script: (TestAccumulator line 151~154)
A = load '" + INPUT_FILE1 + "' as (id:int, fruit);
B = group A by id;
C = foreach B generate group, org.apache.pig.test.utils.AccumulatorBagCount(A);

The second script: (TestAccumulator line 169~171)
A = load '" + INPUT_FILE1 + "' as (id:int, fruit);
B = group A by id;
C = foreach B generate group, org.apache.pig.test.utils.AccumulatorBagCount(A), org.apache.pig.test.utils.BagCount(A);

Best regards
Zhang,Liyun

-----Original Message-----
From: Daniel Dai [mailto:da...@hortonworks.com]
Sent: Tuesday, March 03, 2015 3:45 AM
To: dev@pig.apache.org
Subject: Re: a question about TestAccumulator#testAccumBasic

The first case will use Accumulator, so accumulate -> cleanup will be called, but no exec. The second case will not use Accumulator, exec will be called instead of accumulate -> cleanup.

Daniel

On 3/1/15, 7:21 PM, "Zhang, Liyun" <liyun.zh...@intel.com> wrote:

Hi all:
I have a question about TestAccumulator#testAccumBasic.

The first pig script: (TestAccumulator line 151~154)
A = load '" + INPUT_FILE1 + "' as (id:int, fruit);
B = group A by id;
C = foreach B generate group, org.apache.pig.test.utils.AccumulatorBagCount(A);

It uses org.apache.pig.test.utils.AccumulatorBagCount; in org.apache.pig.test.utils.AccumulatorBagCount#exec:

public Integer exec(Tuple tuple) throws IOException {
    throw new IOException("exec() should not be called.");
}

My question: it should throw an exception when the script is executed, but why is no exception thrown?

The second script: (TestAccumulator line 169~171)
A = load '" + INPUT_FILE1 + "' as (id:int, fruit);
B = group A by id;
C = foreach B generate group, org.apache.pig.test.utils.AccumulatorBagCount(A), org.apache.pig.test.utils.BagCount(A);

It uses org.apache.pig.test.utils.AccumulatorBagCount and org.apache.pig.test.utils.BagCount. The code checks whether an exception is thrown; if no exception is thrown, the unit test fails.
TestAccumulator#testAccumBasic

@Test
public void testAccumBasic() throws IOException {
151    // test group by
152    pigServer.registerQuery("A = load '" + INPUT_FILE1 + "' as (id:int, fruit);");
153    pigServer.registerQuery("B = group A by id;");
154    pigServer.registerQuery("C = foreach B generate group, org.apache.pig.test.utils.AccumulatorBagCount(A);");

    HashMap<Integer, Integer> expected = new HashMap<Integer, Integer>();
    expected.put(100, 2);
    expected.put(200, 1);
    expected.put(300, 3);
    expected.put(400, 1);

    Iterator<Tuple> iter = pigServer.openIterator("C");
    while (iter.hasNext()) {
        Tuple t = iter.next();
        assertEquals(expected.get((Integer) t.get(0)), (Integer) t.get(1));
    }

169    pigServer.registerQuery("B = group A by id;");
170    pigServer.registerQuery("C = foreach B generate group, " +
           "org.apache.pig.test.utils.AccumulatorBagCount(A), org.apache.pig.test.utils.BagCount(A);");

    try {
        iter = pigServer.openIterator("C");
        while (iter.hasNext()) {
            Tuple t = iter.next();
            assertEquals(expected.get((Integer) t.get(0)), (Integer) t.get(1));
        }
        fail("accumulator should not be called.");
    } catch (IOException e) {
        // should throw exception from AccumulatorBagCount.
    }

    // test cogroup
    pigServer.registerQuery("A = load '" + INPUT_FILE1 + "' as (id:int, fruit);");
    pigServer.registerQuery("B = load '" + INPUT_FILE1 + "' as (id:int, fruit);");
    pigServer.registerQuery("C = cogroup A by id, B by id;");
    pigServer.registerQuery("D = foreach C generate group, " +
           "org.apache.pig.test.utils.AccumulatorBagCount(A), org.apache.pig.test.utils.AccumulatorBagCount(B);");

    HashMap<Integer, String> expected2 = new HashMap<Integer, String>();
    expected2.put(100, "2,2");
    expected2.put(200, "1,1");
    expected2.put(300, "3,3");
    expected2.put(400, "1,1");

    iter = pigServer.openIterator("D");
    while (iter.hasNext
a question about TestAccumulator#testAccumBasic
Hi all:
I have a question about TestAccumulator#testAccumBasic.

The first pig script: (TestAccumulator line 151~154)
A = load '" + INPUT_FILE1 + "' as (id:int, fruit);
B = group A by id;
C = foreach B generate group, org.apache.pig.test.utils.AccumulatorBagCount(A);

It uses org.apache.pig.test.utils.AccumulatorBagCount; in org.apache.pig.test.utils.AccumulatorBagCount#exec:

public Integer exec(Tuple tuple) throws IOException {
    throw new IOException("exec() should not be called.");
}

My question: it should throw an exception when the script is executed, but why is no exception thrown?

The second script: (TestAccumulator line 169~171)
A = load '" + INPUT_FILE1 + "' as (id:int, fruit);
B = group A by id;
C = foreach B generate group, org.apache.pig.test.utils.AccumulatorBagCount(A), org.apache.pig.test.utils.BagCount(A);

It uses org.apache.pig.test.utils.AccumulatorBagCount and org.apache.pig.test.utils.BagCount. The code checks whether an exception is thrown; if no exception is thrown, the unit test fails.

TestAccumulator#testAccumBasic

@Test
public void testAccumBasic() throws IOException {
151    // test group by
152    pigServer.registerQuery("A = load '" + INPUT_FILE1 + "' as (id:int, fruit);");
153    pigServer.registerQuery("B = group A by id;");
154    pigServer.registerQuery("C = foreach B generate group, org.apache.pig.test.utils.AccumulatorBagCount(A);");

    HashMap<Integer, Integer> expected = new HashMap<Integer, Integer>();
    expected.put(100, 2);
    expected.put(200, 1);
    expected.put(300, 3);
    expected.put(400, 1);

    Iterator<Tuple> iter = pigServer.openIterator("C");
    while (iter.hasNext()) {
        Tuple t = iter.next();
        assertEquals(expected.get((Integer) t.get(0)), (Integer) t.get(1));
    }

169    pigServer.registerQuery("B = group A by id;");
170    pigServer.registerQuery("C = foreach B generate group, " +
           "org.apache.pig.test.utils.AccumulatorBagCount(A), org.apache.pig.test.utils.BagCount(A);");

    try {
        iter = pigServer.openIterator("C");
        while (iter.hasNext()) {
            Tuple t = iter.next();
            assertEquals(expected.get((Integer) t.get(0)), (Integer) t.get(1));
        }
        fail("accumulator should not be called.");
    } catch (IOException e) {
        // should throw exception from AccumulatorBagCount.
    }

    // test cogroup
    pigServer.registerQuery("A = load '" + INPUT_FILE1 + "' as (id:int, fruit);");
    pigServer.registerQuery("B = load '" + INPUT_FILE1 + "' as (id:int, fruit);");
    pigServer.registerQuery("C = cogroup A by id, B by id;");
    pigServer.registerQuery("D = foreach C generate group, " +
           "org.apache.pig.test.utils.AccumulatorBagCount(A), org.apache.pig.test.utils.AccumulatorBagCount(B);");

    HashMap<Integer, String> expected2 = new HashMap<Integer, String>();
    expected2.put(100, "2,2");
    expected2.put(200, "1,1");
    expected2.put(300, "3,3");
    expected2.put(400, "1,1");

    iter = pigServer.openIterator("D");
    while (iter.hasNext()) {
        Tuple t = iter.next();
        assertEquals(expected2.get((Integer) t.get(0)), t.get(1).toString() + "," + t.get(2).toString());
    }
}

Can anyone help me solve my question?

Best regards
Zhang,Liyun
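For readers unfamiliar with the interface under discussion: a UDF opts into this behavior by implementing org.apache.pig.Accumulator in addition to extending EvalFunc, and when the AccumulatorOptimizer applies, the runtime calls accumulate()/getValue()/cleanup() instead of exec(). A minimal hypothetical sketch (not the actual AccumulatorBagCount source):

import java.io.IOException;
import org.apache.pig.Accumulator;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

// Hypothetical counting UDF: in accumulative mode the bag is fed to
// accumulate() in chunks and exec() is never invoked.
public class CountingAccumulator extends EvalFunc<Long> implements Accumulator<Long> {
    private long count = 0;

    @Override
    public Long exec(Tuple input) throws IOException {
        throw new IOException("exec() should not be called in accumulative mode.");
    }

    @Override
    public void accumulate(Tuple input) throws IOException {
        DataBag bag = (DataBag) input.get(0); // a partial bag for the current group
        count += bag.size();
    }

    @Override
    public Long getValue() {
        return count;
    }

    @Override
    public void cleanup() {
        count = 0; // reset state between groups
    }
}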
RE: how to test NativeMapReduceOper?
Thanks for your suggestions!

From: Rohini Palaniswamy [rohini.adi...@gmail.com]
Sent: Tuesday, February 17, 2015 12:10 PM
To: dev@pig.apache.org
Cc: pig-...@hadoop.apache.org
Subject: Re: how to test NativeMapReduceOper?

nightly.conf - Native group tests.

On Sat, Feb 14, 2015 at 10:02 PM, Zhang, Liyun <liyun.zh...@intel.com> wrote:

Hi all,
I want to ask a question about how to write a pig script to test NativeMapReduceOper. I would appreciate it if someone could provide a pig script for me.

Best regards
Zhang,Liyun
how to test NativeMapReduceOper?
Hi all,
I want to ask a question about how to write a pig script to test NativeMapReduceOper. I would appreciate it if someone could provide a pig script for me.

Best regards
Zhang,Liyun
How to deal with visitLocalRearrange in spark mode
Hi all:
Now I'm working on PIG-4374 (https://issues.apache.org/jira/browse/PIG-4374, Add SparkPlan in spark package). I met a problem with the following script in spark mode.

Join.pig
A = load '/SkewedJoinInput1.txt' as (id,name,n);
B = load '/SkewedJoinInput2.txt' as (id,name);
C = group A by id;
D = group B by id;
E = join C by group, D by group;
store E into '/skewedjoin.out';
explain E;

The physical plan will be changed to an MR plan which contains 3 mapreduce nodes (see the attached mr_join.txt):
logroup converts to poLocalRearrange, poGlobalRearrange, poPackage.
lojoin converts to poLocalRearrange, poGlobalRearrange, poPackage, poPackage.

In mapreduce mode, a MapReduceOper contains a mapPlan, a reducePlan and a combinePlan:

org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitLocalRearrange
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.addToMap

private void addToMap(PhysicalOperator op) throws PlanException, IOException {
    if (compiledInputs.length == 1) {
        // For speed
        MapReduceOper mro = compiledInputs[0];
        if (!mro.isMapDone()) {
            mro.mapPlan.addAsLeaf(op);
        } else if (mro.isMapDone() && !mro.isReduceDone()) {
            FileSpec fSpec = getTempFileSpec();
            POStore st = getStore();
            // MyComment: it first adds a POStore to mro.reducePlan and stores the mro result in a tmp file,
            // then creates a new MROper containing a POLoad which loads the previous tmp file.
            st.setSFile(fSpec);
            mro.reducePlan.addAsLeaf(st);
            mro.setReduceDone(true);
            mro = startNew(fSpec, mro);
            mro.mapPlan.addAsLeaf(op);
            compiledInputs[0] = mro;
        } else {
            int errCode = 2022;
            String msg = "Both map and reduce phases have been done. This is unexpected while compiling.";
            throw new PlanException(msg, errCode, PigException.BUG);
        }
        curMROp = mro;
    }
    ...
}

In the SparkOper I created, there is only one plan. How can I deal with the situation mentioned above? Currently I handle it the following way:

org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkCompiler.visitLocalRearrange
org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkCompiler.addToMap

private void addToMap(POLocalRearrange op) throws PlanException, IOException {
    if (compiledInputs.length == 1) {
        SparkOper sparkOp = compiledInputs[0];
        List<PhysicalOperator> preds = plan.getPredecessors(op);
        // MyComment: first look at the predecessor of the POLocalRearrange
        if (preds != null && preds.size() > 0 && preds.size() == 1) {
            if (!(preds.get(0) instanceof POLoad)) {
                // If the predecessor is not a POLoad (usually the predecessor of a
                // POLocalRearrange is a POLoad when using group/join), add a POStore
                // to sparkOp.plan to store the sparkOp result in a tmp file, then
                // create a new SparkOper containing a POLoad which loads that tmp file.
                FileSpec fSpec = getTempFileSpec();
                POStore st = getStore();
                st.setSFile(fSpec);
                sparkOp.plan.addAsLeaf(st);
                sparkOp = startNew(fSpec, sparkOp);
                compiledInputs[0] = sparkOp;
            }
        }
        sparkOp.plan.addAsLeaf(op);
        curSparkOp = sparkOp;
    } else {
        ...
    }
    ...
}

Can anyone tell me how Tez deals with this situation? I want to reference something from the other execution modes like mapreduce and tez.
Best regards
Zhang,Liyun

A = load '/SkewedJoinInput1.txt' as (id,name,n);
B = load '/SkewedJoinInput2.txt' as (id,name);
C = group A by id;
D = group B by id;
E = join C by group, D by group;
store E into '/skewedjoin.out';
explain E;

#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
E: (Name: LOStore Schema: C::group#10:bytearray,C::A#23:bag{#29:tuple(id#10:bytearray,name#11:bytearray,n#12:bytearray)},D::group#15:bytearray,D::B#25:bag{#30:tuple(id#15:bytearray,name#16:bytearray)})
|
|---E: (Name: LOJoin(HASH) Schema: C::group#10:bytearray,C::A#23:bag{#29:tuple(id#10:bytearray,name#11:bytearray,n#12:bytearray)},D::group#15:bytearray,D::B#25:bag{#30:tuple(id#15:bytearray,name#16:bytearray)})
    |
    | group:(Name: Project Type: bytearray Uid: 10 Input: 0 Column: 0)
    |
    | group:(Name: Project Type: bytearray Uid: 15 Input: 1 Column: 0)
    |
    |---C: (Name: LOCogroup Schema: group#10:bytearray,A#23:bag{#29:tuple(id#10:bytearray
Can anyone give me a sample script where a load has predecessors?
Hi all,
I found that org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler#compile, lines 384~433, contains code for dealing with the situation where a load has predecessors. Can anyone give me a sample script where a load has predecessors?

} else if (predecessors != null && predecessors.size() > 0) {
    // When processing an entire script (multiquery), we can
    // get into a situation where a load has
    // predecessors. This means that it depends on some store
    // earlier in the plan. We need to take that dependency
    // and connect the respective MR operators, while at the
    // same time removing the connection between the Physical
    // operators. That way the jobs will run in the right
    // order.
    if (op instanceof POLoad) {
        if (predecessors.size() != 1) {
            int errCode = 2125;
            String msg = "Expected at most one predecessor of load. Got " + predecessors.size();
            throw new PlanException(msg, errCode, PigException.BUG);
        }
        PhysicalOperator p = predecessors.get(0);
        MapReduceOper oper = null;
        if (p instanceof POStore || p instanceof PONative) {
            oper = phyToMROpMap.get(p);
        } else {
            int errCode = 2126;
            String msg = "Predecessor of load should be a store or mapreduce operator. Got " + p.getClass();
            throw new PlanException(msg, errCode, PigException.BUG);
        }
        // Need new operator
        curMROp = getMROp();
        curMROp.mapPlan.add(op);
        MRPlan.add(curMROp);
        plan.disconnect(op, p);
        MRPlan.connect(oper, curMROp);
        phyToMROpMap.put(op, curMROp);
        return;
    }
    Collections.sort(predecessors);
    compiledInputs = new MapReduceOper[predecessors.size()];
    int i = -1;
    for (PhysicalOperator pred : predecessors) {
        if (pred instanceof POSplit && splitsSeen.containsKey(pred.getOperatorKey())) {
            compiledInputs[++i] = startNew(((POSplit) pred).getSplitStore(), splitsSeen.get(pred.getOperatorKey()));
            continue;
        }
        compile(pred);
        compiledInputs[++i] = curMROp;
    }
} else {

Best Regards
Zhang,Liyun
RE: Re:Re: Is there a way to run one test of a specific unit test?
Hi:
If you want to run one unit test, you can use the following command:

ant -Dtestcase=$TestFileName -Dexectype=$mode -DdebugPort=$debugPort test

$mode can be: MR, tez, spark.

Best regards
Zhang,Liyun

-----Original Message-----
From: lulynn_2008 [mailto:lulynn_2...@163.com]
Sent: Monday, January 19, 2015 11:57 AM
To: u...@pig.apache.org
Cc: dev@pig.apache.org
Subject: Re:Re: Is there a way to run one test of a specific unit test?

Thanks for the reply. But Pig is using ant.

At 2015-01-16 15:06:51, "Pradeep Gollakota" <pradeep...@gmail.com> wrote:
> If you're using Maven AND using surefire plugin 2.7.3+ AND using JUnit 4, then you can do this by specifying -Dtest=TestClass#methodName
> ref: http://maven.apache.org/surefire/maven-surefire-plugin/examples/single-test.html
>
> On Thu, Jan 15, 2015 at 8:02 PM, Cheolsoo Park <piaozhe...@gmail.com> wrote:
>> I don't think you can disable test cases on the fly in JUnit. You will need to add the @Ignore annotation and recompile the test file. Correct me if I am wrong.
>>
>> On Thu, Jan 15, 2015 at 6:55 PM, lulynn_2008 <lulynn_2...@163.com> wrote:
>>> Hi All,
>>> There are multiple tests in one Test* file. Is there a way to run only one specific test?
>>> Thanks
RE: add data into hive table in pig
Hi:
In http://zh.hortonworks.com/community/forums/topic/problems-using-hcatatalog-with-pig-java-lang-nosuchmethoderror-org-apache-hadoo/, someone had a similar problem to yours. Maybe you can find a solution there.

Best regards
Zhang,Liyun

-----Original Message-----
From: 李运田 [mailto:cumt...@163.com]
Sent: Thursday, January 15, 2015 10:59 AM
To: dev@pig.apache.org; user
Subject: add data into hive table in pig

pigServer.registerQuery("a = load '" + inputTable + "' using org.apache.hcatalog.pig.HCatLoader();");
pigServer.registerQuery("store a into '' using org.apache.hcatalog.pig.HCatStorer();");

but I always get an error like this:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.plan.TableDesc.<init>(Ljava/lang/Class;Ljava/lang/Class;Ljava/lang/Class;Ljava/util/Properties;)V
    at org.apache.hcatalog.common.HCatUtil.configureOutputStorageHandler(HCatUtil.java:481)
    at org.apache.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:181)
    at org.apache.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:65)
    at org.apache.hcatalog.pig.HCatStorer.setStoreLocation(HCatStorer.java:125)

Please give me some advice, thanks.
about how to implement ship in other mode like spark
Hi all,
I want to ask a question about ship in Pig. With ship in streaming, Pig sends the streaming binary and supporting files, if any, from the client node to the compute nodes.

I found that the implementation of ship in MapReduce mode is in /home/zly/prj/oss/pig/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java, line 721:

setupDistributedCache(pigContext, conf, pigContext.getProperties(), "pig.streaming.ship.files", true);

This function gets all pig.streaming.ship.files from the properties, then copies the ship files to Hadoop using fs.copyFromLocalFile; at the same time, the symlink feature is turned on using DistributedCache.createSymlink(conf). For example, if the ship file /tmp/teststreaming.pl is copied from local to Hadoop, the Hadoop file will be hdfs://:8020/tmp/temp/tmp-xxx#teststreaming.pl. /tmp/hadoop-root/mapred/local/1419842279890/tmp-1268857767 is a cache for hdfs://:8020/tmp/temp/tmp-xxx#teststreaming.pl, and teststreaming.pl is generated as a link to /tmp/hadoop-root/mapred/local/1419842279890/tmp-1268857767 in the current execution path.

If I want to implement ship in another mode like Spark, is copying the shipped files from the ship path to the current execution path the only thing I need to do?

Best regards
Zhang,Liyun
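For comparison, Spark ships files to executors through a built-in mechanism that plays the same role as the DistributedCache path described above; a minimal sketch assuming the standard SparkContext.addFile/SparkFiles API (the file path is just an example):

import org.apache.spark.SparkConf;
import org.apache.spark.SparkFiles;
import org.apache.spark.api.java.JavaSparkContext;

public class ShipFileSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("ship-sketch"));

        // Driver side: ship the streaming script to every executor,
        // analogous to setupDistributedCache(...) in JobControlCompiler.
        sc.addFile("/tmp/teststreaming.pl");

        // Executor side (inside a task): resolve the local copy by name,
        // analogous to the DistributedCache symlink in the MR working dir:
        // String localPath = SparkFiles.get("teststreaming.pl");

        sc.stop();
    }
}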
RE: use pig in eclipse
Hi 李运田:
The following error is in your mail:

java.lang.NoSuchFieldException: runnerState
    at java.lang.Class.getDeclaredField(Class.java:1948)

Are you running on Hadoop 2 while Pig was compiled against Hadoop 1? To compile Pig with Hadoop 2:

ant -Dhadoopversion=23 jar

-----Original Message-----
From: 李运田 [mailto:cumt...@163.com]
Sent: Tuesday, December 30, 2014 5:39 PM
To: dev@pig.apache.org; user
Subject: use pig in eclipse

My Eclipse and Pig are on the same Linux machine. This is my Pig configuration in Eclipse:

props.setProperty("fs.defaultFS", "hdfs://10.210.90.101:8020");
props.setProperty("hadoop.job.user", "hadoop");
props.setProperty("mapreduce.framework.name", "yarn");
props.setProperty("yarn.resourcemanager.hostname", "10.210.90.101");
props.setProperty("yarn.resourcemanager.admin.address", "10.210.90.101:8141");
props.setProperty("yarn.resourcemanager.address", "10.210.90.101:8050");
props.setProperty("yarn.resourcemanager.resource-tracker.address", "10.210.90.101:8025");
props.setProperty("yarn.resourcemanager.scheduler.address", "10.210.90.101:8030");

I have added core-site.xml, yarn-site.xml, etc. into the Eclipse project. I can run pig scripts in grunt, but when I run

pigServer = new PigServer(ExecType.MAPREDUCE, props);
pigServer.registerQuery("tmp = LOAD '/user/hadoop/aa.txt';");
pigServer.registerQuery("tmp_table_limit = order tmp by $0;");
pigServer.store("tmp_table_limit", "/user/hadoop/shi.txt");

I always get the error:

14/12/30 17:28:33 WARN hadoop20.PigJobControl: falling back to default JobControl (not using hadoop 0.20 ?)
java.lang.NoSuchFieldException: runnerState
    at java.lang.Class.getDeclaredField(Class.java:1948)
    at org.apache.pig.backend.hadoop20.PigJobControl.<clinit>(PigJobControl.java:51)

Help!!
some question about streaming_local.conf
Hi all: I have two questions about pig/test/e2e/pig/tests/streaming_local.conf:

1. The config looks like this:

$cfg = {
    'driver' => 'Pig',
    'nummachines' => 5,
    'groups' => [
        {
            # This group is for local mode testing
            'name' => 'StreamingLocal',
            'sortBenchmark' => 1,
            'sortResults' => 1,
            'floatpostprocess' => 1,
            'delimiter' => ' ',
            'tests' => [
                {
                    # Section 1.1: perl script, no parameters
                    'num' => 1,
                    'execonly' => 'local',   # <-- this line
                    'pig' => q# ...

All of these e2e test cases are only executed in local mode now. Can these e2e tests run in other modes, like mapreduce, tez, or spark? When I replace 'execonly' => 'local' with 'execonly' => 'spark', all cases pass once POStream is implemented in Spark mode. I think we can remove 'execonly' => 'local' and test these e2e tests in other modes.

2. When using ship with streaming, it sends the streaming binary and supporting files, if any, from the client node to the compute nodes. I found we use perl ./libexec/GroupBy.pl in StreamingLocal_3.pig, and this path is relative to the current execution path. Can we use perl GroupBy.pl instead, because I think the file ./libexec/GroupBy.pl has already been shipped to the compute nodes?

StreamingLocal_3.pig (in test/e2e/pig):
define CMD `perl ./libexec/GroupBy.pl '\t' 0` ship('./libexec/GroupBy.pl');
A = load './data/singlefile/studenttab10k';
B = group A by $0;
C = foreach B generate flatten(A);
D = stream C through CMD;
store D into './testout/root-1419582821-streaming_local.conf/StreamingLocal_3.out';

Best regards Zhang,Liyun
RE: Is there any way to guarantee the sequence of group field as the input when using group operator in pig
Hi Remi: Thanks for your reply. I agree that group makes no guarantee by contract, and the sequence of the result is not the same as the input. So we need to make some changes in org.apache.pig.test.TestForEachNestedPlan.testInnerDistinct() and org.apache.pig.test.TestForEachNestedPlan.testInnerOrderByAliasReuse(), because those two functions judge the result of group according to the input sequence. I have submitted PIG-4282_1.patch. Can anyone help review? Many thanks.

TestForEachNestedPlan.testInnerDistinct(), line 219:

List<Tuple> expectedResults = Util.getTuplesFromConstantTupleStrings(
        new String[] {"(10,68)", "(20,78)"});
int counter = 0;
while (iter.hasNext()) {
    // judges the result of group according to the input sequence
    assertEquals(expectedResults.get(counter++).toString(), iter.next().toString());
}
assertEquals(expectedResults.size(), counter);

Best Regards Zhang,Liyun

-Original Message- From: remi.catheri...@orange.com [mailto:remi.catheri...@orange.com] Sent: Thursday, December 18, 2014 10:56 PM To: dev@pig.apache.org Subject: RE: Is there any way to guarantee the sequence of group field as the input when using group operator in pig

Hi all, If you need any kind of ordering in the output, you must use the sort operator; it was designed for such needs. The fact that different engines produce differently ordered groups is due to each engine's specific optimizations. If you ask Pig to re-order the groups, you just remove any benefit of those optimizations. I would rather keep group the way it is, because I know I can rely on sort if I need it and pay its price, or have the best speed if I don't need any specific ordering. My conclusion is: group makes no guarantee by contract, so this is neither a problem nor a bug. It is a misuse of group compared to sort.

Regards, Remi

-Original Message- From: Zhang, Liyun [mailto:liyun.zh...@intel.com] Sent: Thursday, December 18, 2014 07:38 To: pig-...@hadoop.apache.org Subject: Is there any way to guarantee the sequence of group field as the input when using group operator in pig

Hi all, I met a problem: the group operator has different results in different engines like Spark and MapReduce (PIG-4282, https://issues.apache.org/jira/browse/PIG-4282).

groupdistinct.pig:
A = load 'input1.txt' as (age:int,gpa:int);
B = group A by age;
C = foreach B { D = A.gpa; E = distinct D; generate group, MIN(E); };
dump C;

input1.txt is:
10 89
20 78
10 68
10 89
20 92

The mapreduce output is: (10,68),(20,78)
The spark output is: (20,78),(10,68)

These two results are different because the sequence of the field 'group' is not the same. Is there any way to guarantee the sequence of the group field as the input when using the group operator in Pig? Best regards Zhang,Liyun
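If the intent of those tests is to verify the bag contents rather than their order, one option is to compare the outputs as sorted lists. A minimal sketch, assuming a JUnit context like the existing tests; the helper name here is hypothetical, not Pig's actual test utility:

import static org.junit.Assert.assertEquals;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import org.apache.pig.data.Tuple;

public class OrderIndependentAssert {
    // Compare expected and actual tuples after sorting their string forms,
    // so the assertion holds regardless of which order the engine emits groups.
    public static void assertSameTuples(List<Tuple> expected, Iterator<Tuple> iter) {
        List<String> expectedStrings = new ArrayList<>();
        for (Tuple t : expected) {
            expectedStrings.add(t.toString());
        }
        List<String> actualStrings = new ArrayList<>();
        while (iter.hasNext()) {
            actualStrings.add(iter.next().toString());
        }
        Collections.sort(expectedStrings);
        Collections.sort(actualStrings);
        assertEquals(expectedStrings, actualStrings);
    }
}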
Is there any way to guarantee the sequence of “group” field as the input when using “group” operator in pig
Hi all, I met a problem: the "group" operator has different results in different engines like Spark and MapReduce (PIG-4282, https://issues.apache.org/jira/browse/PIG-4282).

groupdistinct.pig:
A = load 'input1.txt' as (age:int,gpa:int);
B = group A by age;
C = foreach B { D = A.gpa; E = distinct D; generate group, MIN(E); };
dump C;

input1.txt is:
10 89
20 78
10 68
10 89
20 92

The mapreduce output is: (10,68),(20,78)
The spark output is: (20,78),(10,68)

These two results are different because the sequence of the field "group" is not the same. Is there any way to guarantee the sequence of the "group" field as the input when using the "group" operator in Pig? Best regards Zhang,Liyun
Can anyone help review PIG-4265
Hi all, Can anyone help review PIG-4265, "AlgebraicDoubleMathBase has Java double precision problems"? The Java double precision problem causes invalid results from the SUM eval function. The detailed steps to reproduce the bug are in PIG-4265 (https://issues.apache.org/jira/browse/PIG-4265). For background on the Java double precision problem, see https://community.oracle.com/thread/2448849?tstart=0. Best Regards Zhang,Liyun
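For readers unfamiliar with the underlying issue, here is a small self-contained demonstration of double-precision drift and one common workaround; this illustrates the general problem only and is not the actual AlgebraicDoubleMathBase fix:

import java.math.BigDecimal;

public class DoubleSumDemo {
    public static void main(String[] args) {
        // Summing 0.1 ten times in double arithmetic does not give exactly 1.0,
        // because 0.1 has no exact binary representation.
        double doubleSum = 0.0;
        for (int i = 0; i < 10; i++) {
            doubleSum += 0.1;
        }
        System.out.println(doubleSum);   // prints 0.9999999999999999

        // Accumulating in BigDecimal keeps the sum exact.
        BigDecimal exactSum = BigDecimal.ZERO;
        for (int i = 0; i < 10; i++) {
            exactSum = exactSum.add(new BigDecimal("0.1"));
        }
        System.out.println(exactSum);    // prints 1.0
    }
}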
some problem i found in UDFContext
Hi all: Now I'm fixing PIG-4232 and want to ask a question about UDFContext. I found that when UDFContext#getUDFProperties is called, it puts a new entry, whose key is a UDFContextKey (built from class c) and whose value is an empty Properties, into udfConfs whenever the specified properties are not found (p == null):

public Properties getUDFProperties(Class c) {
    UDFContextKey k = generateKey(c, null);
    Properties p = udfConfs.get(k);
    if (p == null) {
        p = new Properties();
        udfConfs.put(k, p);
    }
    return p;
}

UDFContext#setupUDFContext, before deserializing udfc, first checks udfc.isUDFConfEmpty(). I think this check has a problem when UDFContext#getUDFProperties(Class c) is executed first in the current thread and UDFContext#setupUDFContext(Configuration job) is executed afterwards. In this situation, UDFContext#udfConfs only has an entry whose value is an empty Properties, but udfc.isUDFConfEmpty() returns false:

public static void setupUDFContext(Configuration job) throws IOException {
    UDFContext udfc = UDFContext.getUDFContext();
    udfc.addJobConf(job);
    // don't deserialize in front-end
    if (udfc.isUDFConfEmpty()) {
        udfc.deserialize();
    }
}

public boolean isUDFConfEmpty() {
    return udfConfs.isEmpty();
}

I think the following change can handle the situation mentioned above:

public boolean isUDFConfEmpty() {
    if (udfConfs.isEmpty()) {
        return true;
    }
    for (UDFContextKey udfContextKey : udfConfs.keySet()) {
        if (!udfConfs.get(udfContextKey).isEmpty()) {
            return false;
        }
    }
    return true;
}

Can anyone tell me whether there is any other way to solve the situation I mentioned? Many thanks. Best Regards Zhang,Liyun
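A condensed reproduction of the sequence described above, using only the UDFContext methods quoted in the mail (String.class merely stands in for a real UDF class):

import org.apache.pig.impl.util.UDFContext;

public class UdfContextRepro {
    public static void main(String[] args) {
        UDFContext udfc = UDFContext.getUDFContext();

        // Step 1: a UDF asks for its properties before any job conf has been
        // deserialized; getUDFProperties inserts an empty Properties entry.
        udfc.getUDFProperties(String.class);

        // Step 2: udfConfs now holds one empty entry, so isUDFConfEmpty()
        // returns false and setupUDFContext would skip deserialize(),
        // even though no real UDF configuration has been loaded yet.
        System.out.println(udfc.isUDFConfEmpty());   // prints false
    }
}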
RE: need use ant jar -Dhadoopversion=23 to build, then e2e tests works?
Have filed JIRA PIG-4205 and made a patch for it. Please review. Best Regards Zhang,Liyun

-Original Message- From: Daniel Dai [mailto:da...@hortonworks.com] Sent: Saturday, September 27, 2014 2:50 AM To: dev@pig.apache.org Subject: Re: need use ant jar -Dhadoopversion=23 to build, then e2e tests works?

Most probably your HADOOP_HOME is not defined correctly. Do you have -Dharness.hadoop.home in the command line?

On Fri, Sep 26, 2014 at 8:21 AM, Rohini Palaniswamy rohini.adi...@gmail.com wrote: No. Seems like a bug introduced after breaking down the fat jar. Please file a jira.

On Thu, Sep 25, 2014 at 10:56 PM, Zhang, Liyun liyun.zh...@intel.com wrote: Hi all, I'm using the e2e tests of Pig (https://cwiki.apache.org/confluence/display/PIG/HowToTest#HowToTest-HowtoRune2eTests). My Hadoop env is hadoop1, so I use 'ant jar' to build. When I execute the following command: ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir -Dharness.cluster.bin=hadoop_script test-e2e-deploy the following error appears: 67 Going to run /home/zly/prj/oss/pig/test/e2e/pig/../../../bin/pig -e mkdir /user/pig/out/root-1411632015-nightly.conf/ 168 Cannot locate pig-core-h2.jar. do 'ant -Dhadoopversion=23 jar', and try again It seems that I need to build with 'ant -Dhadoopversion=23 jar' for test-e2e-deploy to succeed. Can anyone tell me whether my understanding is right? Best Regards Zhang,Liyun
need use ant jar -Dhadoopversion=23 to build, then e2e tests works?
Hi all, I'm using the e2e tests of Pig (https://cwiki.apache.org/confluence/display/PIG/HowToTest#HowToTest-HowtoRune2eTests). My Hadoop env is hadoop1, so I use 'ant jar' to build. When I execute the following command:

ant -Dharness.old.pig=old_pig -Dharness.cluster.conf=hadoop_conf_dir -Dharness.cluster.bin=hadoop_script test-e2e-deploy

the following error appears:

67 Going to run /home/zly/prj/oss/pig/test/e2e/pig/../../../bin/pig -e mkdir /user/pig/out/root-1411632015-nightly.conf/
168 Cannot locate pig-core-h2.jar. do 'ant -Dhadoopversion=23 jar', and try again

It seems that I need to build with 'ant -Dhadoopversion=23 jar' for test-e2e-deploy to succeed. Can anyone tell me whether my understanding is right? Best Regards Zhang,Liyun
RE: [VOTE] Drop support for Hadoop 0.20 from Pig 0.14
+1 Best Regards Zhang,Liyun -Original Message- From: Rohini Palaniswamy [mailto:rohini.adi...@gmail.com] Sent: Wednesday, September 17, 2014 12:38 PM To: dev@pig.apache.org Subject: [VOTE] Drop support for Hadoop 0.20 from Pig 0.14 Hi, Hadoop has matured far beyond Hadoop 0.20, has had two major releases since, and there has been no development on branch-0.20 (http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/) for 3 years now. It is high time we drop support for Hadoop 0.20 and only support the Hadoop 1.x and 2.x lines going forward. This will reduce the maintenance effort and also enable us to write more efficient code and cut down on reflection. Vote closes on Tuesday, Sep 23 2014. Thanks, Rohini
RE: [VOTE] Drop support for JDK 6 from Pig 0.14
+1 -Original Message- From: Julien Le Dem [mailto:jul...@twitter.com.INVALID] Sent: Wednesday, September 17, 2014 1:09 PM To: dev@pig.apache.org Subject: Re: [VOTE] Drop support for JDK 6 from Pig 0.14 +1 On Tuesday, September 16, 2014, Rohini Palaniswamy rohini.adi...@gmail.com wrote: Hi, Hadoop is dropping support for JDK6 from hadoop-2.7 this year, as mentioned in the mail below. Pig should also move to JDK7 to be able to compile against future Hadoop 2.x releases and start making releases with jars (binaries, maven repo) compiled with JDK 7. This would also open it up for developers to code with JDK7-specific APIs. Vote closes on Tuesday, Sep 23 2014. Thanks, Rohini -- Forwarded message -- From: Arun C Murthy a...@hortonworks.com Date: Tue, Aug 19, 2014 at 10:52 AM Subject: Dropping support for JDK6 in Apache Hadoop To: d...@hbase.apache.org, d...@hive.apache.org, dev@pig.apache.org, d...@oozie.apache.org Cc: common-...@hadoop.apache.org [Apologies for the wide distribution.] Dear HBase/Hive/Pig/Oozie communities, We over at Hadoop are considering dropping support for JDK6 this year. As you may be aware, we just released hadoop-2.5.0 and are now considering making the next release, i.e. hadoop-2.6.0, the *last* release of Apache Hadoop which supports JDK6. This means that from hadoop-2.7.0 onwards we will not support JDK6 anymore and we *may* start relying on JDK7-specific APIs. Now, the above is a proposal, and we do not want to pull the trigger without talking to projects downstream - hence the request for your feedback. Please feel free to forward this to other communities you might deem to be at risk from this too. thanks, Arun
RE: Creating a branch for Pig on Spark (PIG-4059)
Add me, I work on Pig on Spark. -Original Message- From: Cheolsoo Park [mailto:piaozhe...@gmail.com] Sent: Tuesday, August 26, 2014 10:57 AM To: Jarek Jarcec Cecho Cc: dev@pig.apache.org; Praveen R Subject: Re: Creating a branch for Pig on Spark (PIG-4059) Hi guys, I asked about branch committership on the infra mailing list, and here is the reply: "Many projects have what they consider 'partial committers', that is, folks who have access to specific parts of a project's svn tree. Some projects do this for GSoC participants, others as a mechanism for moving to 'full committership' within the project. Do note though that in the eyes of the ASF, someone with an ICLA and an account with any permissions to commit code anywhere in the public svn tree is a committer. IOW, you would vote, have ICLAs filed, and request account creation as per normal, and then merely adjust the karma in asf-authorization-template (and/or LDAP)." Looks like we need to vote and follow the normal process just like for any other new committer. @Praveen, Jarcec, I think Mayur and Praveen from Sigmoid Analytics need branch committership. Will anyone else work on Pig-on-Spark? Please reply. Once I have a full list of people, I will open a vote for the Pig PMC. Thanks, Cheolsoo On Mon, Aug 25, 2014 at 11:51 AM, Cheolsoo Park piaozhe...@gmail.com wrote: Additionally, I will give branch-specific commit permission to people who will work on Pig on Spark (assuming it is possible). Please let me know if you have any objection to this too. On Mon, Aug 25, 2014 at 10:25 AM, Jarek Jarcec Cecho jar...@apache.org wrote: No objections from my side. Thank you for creating the branch, Cheolsoo, and kudos to the Sigmoid Analytics team for the great work! Jarcec On Aug 25, 2014, at 7:14 PM, Cheolsoo Park piaozhe...@gmail.com wrote: Hi devs, Sigmoid Analytics has been working on Pig-on-Spark (PIG-4059), and they want to merge their work into Apache. I am going to create a Spark branch for them. Please let me know if you have any concerns. Thanks, Cheolsoo