[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (13 issues)
Subscriber: pigdaily

Key         Summary
PIG-3902    PigServer creates cycle
            https://issues.apache.org/jira/browse/PIG-3902
PIG-3901    Organize the Pig properties file and document all properties
            https://issues.apache.org/jira/browse/PIG-3901
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3874    FileLocalizer temp path can sometimes be non-unique
            https://issues.apache.org/jira/browse/PIG-3874
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3867    Added hadoop home to build classpath for build pig with unit test on windows
            https://issues.apache.org/jira/browse/PIG-3867
PIG-3866    Create ThreadLocal classloader per PigContext
            https://issues.apache.org/jira/browse/PIG-3866
PIG-3861    duplicate jars get added to distributed cache
            https://issues.apache.org/jira/browse/PIG-3861
PIG-3825    Stats collection needs to be changed for hadoop2 (with auto local mode)
            https://issues.apache.org/jira/browse/PIG-3825
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3635    Fix e2e tests for Hadoop 2.X on Windows
            https://issues.apache.org/jira/browse/PIG-3635
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-3441    Allow Pig to use default resources from Configuration objects
            https://issues.apache.org/jira/browse/PIG-3441

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Resolved] (PIG-3880) After compiling trunk, I am seeing ClassLoaderObjectInputStream ClassNotFoundException.
[ https://issues.apache.org/jira/browse/PIG-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Medinets resolved PIG-3880.
--------------------------------
Resolution: Fixed

I feel the consensus is: don't run old versions of Hadoop. I can live with that.

> After compiling trunk, I am seeing ClassLoaderObjectInputStream ClassNotFoundException.
> ---------------------------------------------------------------------------------------
>
> Key: PIG-3880
> URL: https://issues.apache.org/jira/browse/PIG-3880
> Project: Pig
> Issue Type: Bug
> Components: grunt
> Affects Versions: 0.13.0
> Reporter: David Medinets
>
> I pulled trunk from subversion using the following commands:
>     mkdir pig
>     cd pig
>     svn co http://svn.apache.org/repos/asf/pig/trunk
>     cd trunk
>     ant
>     export PATH=$PATH:$HOME/pig/trunk/bin
>     export ACCUMULO_HOME=/opt/accumulo
>     export HADOOP_HOME=/opt/hadoop
>     export PIG_HOME=$HOME/pig/trunk
>     export PIG_CLASSPATH="$HOME/pig/trunk/build/ivy/lib/Pig/*"
>     export PIG_CLASSPATH="$ACCUMULO_HOME/lib/*:$PIG_CLASSPATH"
>     cd ~
>     pig
> Then I ran into this error:
>     java.lang.NoClassDefFoundError: org/apache/commons/io/input/ClassLoaderObjectInputStream
>         at org.apache.pig.Main.run(Main.java:399)
> When I changed PIG_JAR to use the fat jar, I was able to run the pig command
> without getting the exception.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (PIG-3373) XMLLoader returns non-matching nodes when a tag name spans through the block boundary
[ https://issues.apache.org/jira/browse/PIG-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988142#comment-13988142 ]

Cheolsoo Park commented on PIG-3373:
------------------------------------

Isn't this addressed as part of PIG-3865 (rewrite of XMLLoader)? If so, can we close this jira?

> XMLLoader returns non-matching nodes when a tag name spans through the block boundary
> -------------------------------------------------------------------------------------
>
> Key: PIG-3373
> URL: https://issues.apache.org/jira/browse/PIG-3373
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: site
> Reporter: Ahmed Eldawy
> Assignee: Ahmed Eldawy
> Labels: patch
> Attachments: PIG3373.patch, PIG3373_1.patch, PIG3373_2.patch, PIG3373_3.patch, bad-file.xml.bz2, test-file-2.xml.bz2
>
>
> When a node's start tag spans two blocks, the tag is returned even if it is
> not of the requested type.
> Example: For the following input file
>
> BLOCK BOUNDARY
> entually id="dfasd">
> XMLLoader with tag type 'event' should return only the first one, but it
> actually returns both of them.
[jira] [Updated] (PIG-3373) XMLLoader returns non-matching nodes when a tag name spans through the block boundary
[ https://issues.apache.org/jira/browse/PIG-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3373:
---------------------------
Status: Open (was: Patch Available)

Sorry, but the patch no longer applies and I couldn't figure out how to apply it manually.

> XMLLoader returns non-matching nodes when a tag name spans through the block boundary
> -------------------------------------------------------------------------------------
>
> Key: PIG-3373
> URL: https://issues.apache.org/jira/browse/PIG-3373
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: site
> Reporter: Ahmed Eldawy
> Assignee: Ahmed Eldawy
> Labels: patch
> Attachments: PIG3373.patch, PIG3373_1.patch, PIG3373_2.patch, PIG3373_3.patch, bad-file.xml.bz2, test-file-2.xml.bz2
>
>
> When a node's start tag spans two blocks, the tag is returned even if it is
> not of the requested type.
> Example: For the following input file
>
> BLOCK BOUNDARY
> entually id="dfasd">
> XMLLoader with tag type 'event' should return only the first one, but it
> actually returns both of them.
[jira] [Updated] (PIG-3916) isEmpty should not be early terminating
[ https://issues.apache.org/jira/browse/PIG-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3916:
-----------------------------------
Resolution: Fixed
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Committed to branch-0.12 and trunk. Thanks Daniel for the review.

> isEmpty should not be early terminating
> ---------------------------------------
>
> Key: PIG-3916
> URL: https://issues.apache.org/jira/browse/PIG-3916
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Priority: Critical
> Fix For: 0.13.0, 0.12.2
>
> Attachments: PIG-3916-1.patch
>
>
> PIG-2066 makes isEmpty early terminating which is very wrong. When there is a
> binary condition with isEmpty() followed by something like SUM, it skips
> records leading to wrong results.
[jira] [Created] (PIG-3918) Implement PPNL for Tez mode (Pig side changes)
Cheolsoo Park created PIG-3918:
-------------------------------

Summary: Implement PPNL for Tez mode (Pig side changes)
Key: PIG-3918
URL: https://issues.apache.org/jira/browse/PIG-3918
Project: Pig
Issue Type: Sub-task
Components: tez
Affects Versions: tez-branch
Reporter: Cheolsoo Park
Fix For: tez-branch

Currently, no event notifications are sent from Pig to PPNL in Tez mode. For example,
{code}
emitInitialPlanNotification
emitLaunchStartedNotification
emitJobsSubmittedNotification
etc.
{code}
In addition, Pig should expose progress status information to PPNL.
[jira] [Commented] (PIG-3916) isEmpty should not be early terminating
[ https://issues.apache.org/jira/browse/PIG-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987909#comment-13987909 ]

Daniel Dai commented on PIG-3916:
---------------------------------

+1 for the patch. As I mentioned before, let's revert isEmpty first.

> isEmpty should not be early terminating
> ---------------------------------------
>
> Key: PIG-3916
> URL: https://issues.apache.org/jira/browse/PIG-3916
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Priority: Critical
> Fix For: 0.13.0, 0.12.2
>
> Attachments: PIG-3916-1.patch
>
>
> PIG-2066 makes isEmpty early terminating which is very wrong. When there is a
> binary condition with isEmpty() followed by something like SUM, it skips
> records leading to wrong results.
[jira] [Updated] (PIG-3902) PigServer creates cycle
[ https://issues.apache.org/jira/browse/PIG-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacob Perkins updated PIG-3902:
------------------------------
Fix Version/s: 0.11.1
Assignee: Cheolsoo Park
Affects Version/s: 0.11.1
Status: Patch Available (was: Open)

This goes deep. I updated the LogicalPlanBuilder such that, when building a load op, it checks the plan that's been generated "so far" for stores that match the location the load references. If it finds any, it links them.

Some of the multiquery compilation code exploited the bug that plan.getSinks() always returned the stores. You could get away with that before the "postProcess" method on PigServer got called, because loads were not yet linked to the stores they depended on. Hence, all the stores were leaves of the plan. With this patch, that bug exploitation will no longer work. In order to get the stores you'll have to get the operators of the plan explicitly and create a list of stores. Presumably this happens elsewhere too, but I only ran the multiquery tests with my patch.

> PigServer creates cycle
> -----------------------
>
> Key: PIG-3902
> URL: https://issues.apache.org/jira/browse/PIG-3902
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11.1
> Reporter: Jacob Perkins
> Assignee: Cheolsoo Park
> Fix For: 0.11.1
>
> Attachments: broken-plan.png, cycle.diff, multiquery-cycle.diff
>
>
> Under certain conditions PigServer creates a cycle in the logical plan.
> Consider the following pseudocode:
> {code}
> A = load from 'A' using F1;
> ...process...
> B = store X into 'B' using F2;
> C = load from 'B' using F3;
> ...process...
> D = store Y into 'A' using F1;
> {code}
> PigServer will, in ordinary cases, notice that an output path is equal to an
> input path, and, if there's no path from the input to the output, make the
> input a dependency of the output. However, PigServer orders the loads and
> stores arbitrarily during that logic. Sometimes, in the code above, C is
> correctly wired as a dependency of B and, since that creates a path from A to
> D, A won't be made a dependency of D and we're good. On occasion though, the
> ordering being arbitrary, A is wired as a dependency of D. That's no good. To
> be fair, it's not actually a cycle, since when A is wired to D, there's a
> path between C and B so the cycle won't actually get created. But it's still
> a broken plan.
> The offending PigServer code:
> https://github.com/apache/pig/blob/branch-0.11/src/org/apache/pig/PigServer.java#L1678-L1693
> And here's some actual pig code that should reproduce the broken plan. Notice
> I had to use a store function that wouldn't check the output. If you're just
> using PigStorage this won't be reproducible since you can't write to the same
> location you read from in that case.
> {code}
> A = load '$A' as (line:chararray);
> A = foreach A generate flatten(TOKENIZE(LOWER(line))) as token;
> store A into '$B';
> B = load '$B' as (token:chararray);
> B = filter B by SIZE(token) > 3;
> store B into '$A' using org.apache.pig.piggybank.storage.DBStorage('com.mysql.jdbc.Driver', 'jdbc:mysql://localhost/test', 'INSERT INTO foobar (token) VALUES(?)');
> {code}
> As far as a fix goes... I'd love some input. I've got some workarounds in
> mind for the specific use case that brought this up, but the general problem
> is more difficult.
> As an aside, there are other issues with the PigServer code referenced above.
> For example, it should almost certainly be using the full path (after
> LoadFunc/StoreFunc.relativeToAbsolutePath), no? Try storing to a relative path
> then loading from the absolute representation of that path in the same
> script... Also, why isn't it checking the FuncSpec as well as the location?
> Just trying to open up the discussion.
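
The remark about plan.getSinks() in the comment above can be illustrated with a toy graph. This is only a sketch in Python, not Pig code: the Plan class, its methods, and the operator names are hypothetical stand-ins for the logical plan. It assumes that "linking" means adding an edge from a store to the load that reads the same location, and shows that a linked store stops being a leaf, so code that wants every store has to scan all operators.

```python
from collections import defaultdict


class Plan:
    """Toy DAG standing in for a logical plan (hypothetical, not Pig's API)."""

    def __init__(self):
        self.ops = {}                  # operator name -> kind
        self.succ = defaultdict(set)   # operator name -> successors

    def add(self, name, kind):
        self.ops[name] = kind

    def connect(self, src, dst):
        self.succ[src].add(dst)

    def sinks(self):
        # Leaves of the plan: operators with no successors.
        return sorted(n for n in self.ops if not self.succ[n])

    def stores(self):
        # Scan every operator instead of relying on leaves.
        return sorted(n for n, kind in self.ops.items() if kind == "store")


plan = Plan()
for name, kind in [("load_A", "load"), ("store_B", "store"),
                   ("load_B", "load"), ("store_A", "store")]:
    plan.add(name, kind)
plan.connect("load_A", "store_B")
plan.connect("load_B", "store_A")

sinks_before = plan.sinks()        # both stores are still leaves here

# Link the store of 'B' to the load of 'B' while the plan is being built.
plan.connect("store_B", "load_B")

sinks_after = plan.sinks()         # store_B is no longer a leaf
all_stores = plan.stores()         # the explicit scan still finds both stores
```

In this model, sinks() stops being a reliable way to enumerate stores as soon as the link is added, which is the behaviour change the comment warns about.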
[jira] [Updated] (PIG-3902) PigServer creates cycle
[ https://issues.apache.org/jira/browse/PIG-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacob Perkins updated PIG-3902:
------------------------------
Attachment: multiquery-cycle.diff

> PigServer creates cycle
> -----------------------
>
> Key: PIG-3902
> URL: https://issues.apache.org/jira/browse/PIG-3902
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11.1
> Reporter: Jacob Perkins
> Assignee: Cheolsoo Park
> Fix For: 0.11.1
>
> Attachments: broken-plan.png, cycle.diff, multiquery-cycle.diff
>
>
> Under certain conditions PigServer creates a cycle in the logical plan.
> Consider the following pseudocode:
> {code}
> A = load from 'A' using F1;
> ...process...
> B = store X into 'B' using F2;
> C = load from 'B' using F3;
> ...process...
> D = store Y into 'A' using F1;
> {code}
> PigServer will, in ordinary cases, notice that an output path is equal to an
> input path, and, if there's no path from the input to the output, make the
> input a dependency of the output. However, PigServer orders the loads and
> stores arbitrarily during that logic. Sometimes, in the code above, C is
> correctly wired as a dependency of B and, since that creates a path from A to
> D, A won't be made a dependency of D and we're good. On occasion though, the
> ordering being arbitrary, A is wired as a dependency of D. That's no good. To
> be fair, it's not actually a cycle, since when A is wired to D, there's a
> path between C and B so the cycle won't actually get created. But it's still
> a broken plan.
> The offending PigServer code:
> https://github.com/apache/pig/blob/branch-0.11/src/org/apache/pig/PigServer.java#L1678-L1693
> And here's some actual pig code that should reproduce the broken plan. Notice
> I had to use a store function that wouldn't check the output. If you're just
> using PigStorage this won't be reproducible since you can't write to the same
> location you read from in that case.
> {code}
> A = load '$A' as (line:chararray);
> A = foreach A generate flatten(TOKENIZE(LOWER(line))) as token;
> store A into '$B';
> B = load '$B' as (token:chararray);
> B = filter B by SIZE(token) > 3;
> store B into '$A' using org.apache.pig.piggybank.storage.DBStorage('com.mysql.jdbc.Driver', 'jdbc:mysql://localhost/test', 'INSERT INTO foobar (token) VALUES(?)');
> {code}
> As far as a fix goes... I'd love some input. I've got some workarounds in
> mind for the specific use case that brought this up, but the general problem
> is more difficult.
> As an aside, there are other issues with the PigServer code referenced above.
> For example, it should almost certainly be using the full path (after
> LoadFunc/StoreFunc.relativeToAbsolutePath), no? Try storing to a relative path
> then loading from the absolute representation of that path in the same
> script... Also, why isn't it checking the FuncSpec as well as the location?
> Just trying to open up the discussion.
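
The ordering problem in the description can be reproduced with a small model. The following is a sketch in Python under stated assumptions, not the PigServer code: has_path and wire are hypothetical helpers, the four nodes are the A, B, C, D of the pseudocode, and the rule is taken from the description (for each store/load pair on the same location, add an edge from the store to the load unless a path from the load to the store already exists).

```python
def has_path(edges, src, dst):
    """Depth-first search over a dict mapping node -> set of successors."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return False


def wire(pairs):
    """Apply the linking rule to (store, load) pairs in the given order."""
    # Data-flow edges from the pseudocode: load A feeds store B,
    # load C feeds store D.
    edges = {"A": {"B"}, "C": {"D"}}
    for store, load in pairs:
        if not has_path(edges, load, store):
            edges.setdefault(store, set()).add(load)
    return edges


# B stores to 'B' and C loads from 'B'; D stores to 'A' and A loads from 'A'.
good = wire([("B", "C"), ("D", "A")])   # B -> C is added, D -> A is skipped
bad = wire([("D", "A"), ("B", "C")])    # D -> A is added, B -> C is skipped
```

With the second ordering the model ends up with C -> D -> A -> B: there is no cycle, but the load of 'B' now runs before the store that writes 'B', which matches the "broken plan" the description reports.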
[jira] [Commented] (PIG-3916) isEmpty should not be early terminating
[ https://issues.apache.org/jira/browse/PIG-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987788#comment-13987788 ]

Rohini Palaniswamy commented on PIG-3916:
-----------------------------------------

Tried to see if only terminating isEmpty() was possible, but it is a lot more work. For now reverting isEmpty() as it causes data loss. Here is a more detailed description of the impact so users can assess the damage.

Impact:
The isEmpty() function in a FOREACH clause causes wrong results when used after a GROUP or COGROUP clause when accumulative mode is used (i.e., there are no other non-Accumulator UDFs in the FOREACH clause). I also saw isEmpty() evaluate to true when the bag was not empty.
For example:
- If the FOREACH had only isEmpty() and SUM() functions in a binary condition, which are both Accumulator UDFs, then wrong results are produced for the SUM(): it only sums up the first batch and terminates after that. SUM, COUNT, MIN, MAX are all accumulator UDFs.
- If the FOREACH had any other non-accumulator UDF (for example, STRSPLIT()), then it is not affected, as accumulative mode would not be used.

> isEmpty should not be early terminating
> ---------------------------------------
>
> Key: PIG-3916
> URL: https://issues.apache.org/jira/browse/PIG-3916
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Priority: Critical
> Fix For: 0.13.0, 0.12.2
>
> Attachments: PIG-3916-1.patch
>
>
> PIG-2066 makes isEmpty early terminating which is very wrong. When there is a
> binary condition with isEmpty() followed by something like SUM, it skips
> records leading to wrong results.
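
The first impact case above can be modelled in a few lines. This is a toy simulation in Python, not Pig's Accumulator API: the Sum and IsEmpty classes, the run driver, and the stop_when_any_finished flag are all hypothetical, and the assumption that the driver stops feeding batches as soon as any UDF reports it is finished is inferred from the comment rather than taken from the Pig source.

```python
class Sum:
    """Accumulates a running total; never asks to stop early."""

    def __init__(self):
        self.total = 0

    def accumulate(self, batch):
        self.total += sum(batch)

    def is_finished(self):
        return False


class IsEmpty:
    """Knows its answer as soon as it has seen one record."""

    def __init__(self):
        self.empty = True

    def accumulate(self, batch):
        if batch:
            self.empty = False

    def is_finished(self):
        return not self.empty


def run(batches, udfs, stop_when_any_finished):
    """Feed each batch of a group's bag to every UDF in turn."""
    for batch in batches:
        for udf in udfs:
            udf.accumulate(batch)
        # Assumed shape of the bug: one early-terminating UDF ends the
        # loop for every UDF in the same FOREACH.
        if stop_when_any_finished and any(u.is_finished() for u in udfs):
            break
    return udfs


batches = [[1, 2], [3, 4], [5]]
buggy = run(batches, [IsEmpty(), Sum()], stop_when_any_finished=True)
fixed = run(batches, [IsEmpty(), Sum()], stop_when_any_finished=False)
```

In this model the buggy run leaves Sum with only the first batch (a total of 3), while the run without early termination sees all three batches (a total of 15).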
[jira] [Created] (PIG-3917) Task process exit with nonzero status of -1. how to solve this?
Devendran s created PIG-3917:
-----------------------------

Summary: Task process exit with nonzero status of -1. how to solve this?
Key: PIG-3917
URL: https://issues.apache.org/jira/browse/PIG-3917
Project: Pig
Issue Type: Bug
Components: build, grunt
Affects Versions: 0.12.0
Environment: windows 8
Reporter: Devendran s
Fix For: 0.12.0

I tried to run Pig scripts on a Windows 8 system using cygwin. When I run Pig scripts in local mode it works, but in mapreduce mode it shows the following error.

2014-05-02 14:47:35,345 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 6
2014-05-02 14:47:35,591 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201405021138_0010_m_01_2 task's state:UNASSIGNED
2014-05-02 14:47:35,591 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201405021138_0010_m_01_2 which needs 1 slots
2014-05-02 14:47:35,591 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 6 and trying to launch attempt_201405021138_0010_m_01_2 which needs 1 slots
2014-05-02 14:47:35,679 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201405021138_0010_m_949588628
2014-05-02 14:47:35,680 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201405021138_0010_m_949588628 spawned.
2014-05-02 14:47:35,685 INFO org.apache.hadoop.mapred.JvmManager: JVM Not killed jvm_201405021138_0010_m_949588628 but just removed
2014-05-02 14:47:35,685 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201405021138_0010_m_949588628 exited with exit code -1. Number of tasks it ran: 0
2014-05-02 14:47:35,685 WARN org.apache.hadoop.mapred.TaskRunner: attempt_201405021138_0010_m_01_2 : Child Error
java.io.IOException: Task process exit with nonzero status of -1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258).

Please help me find a solution for the above.
[jira] [Commented] (PIG-3558) ORC support for Pig
[ https://issues.apache.org/jira/browse/PIG-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987469#comment-13987469 ]

Daniel Dai commented on PIG-3558:
---------------------------------

[~chitnis] Not sure if I understand correctly. Hive BINARY has a clear meaning, but Pig bytearray does not. It means an unknown datatype, where the user does not explicitly declare the datatype. The real data can be anything, not just DataByteArray. So it should be safe to convert Hive BINARY to Pig bytearray but not vice versa.

> ORC support for Pig
> -------------------
>
> Key: PIG-3558
> URL: https://issues.apache.org/jira/browse/PIG-3558
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Labels: porc
> Fix For: 0.13.0
>
> Attachments: PIG-3558-1.patch, PIG-3558-2.patch, PIG-3558-3.patch, PIG-3558-4.patch, PIG-3558-5.patch
>
>
> Adding LoadFunc and StoreFunc for ORC.
[jira] [Commented] (PIG-3916) isEmpty should not be early terminating
[ https://issues.apache.org/jira/browse/PIG-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987461#comment-13987461 ]

Daniel Dai commented on PIG-3916:
---------------------------------

Seems isEmpty is the main use case for PIG-2066. I am fine with committing the patch as an immediate remedy, but we also need to either roll back PIG-2066 completely (other early-terminating UDFs will share the same issue), or redo PIG-2066 the right way (only terminate IsEmpty but not SUM).

> isEmpty should not be early terminating
> ---------------------------------------
>
> Key: PIG-3916
> URL: https://issues.apache.org/jira/browse/PIG-3916
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Priority: Critical
> Fix For: 0.13.0, 0.12.2
>
> Attachments: PIG-3916-1.patch
>
>
> PIG-2066 makes isEmpty early terminating which is very wrong. When there is a
> binary condition with isEmpty() followed by something like SUM, it skips
> records leading to wrong results.