[jira] [Created] (PIG-3078) Make a UDF that, given a string, returns just the columns prefixed by that string
Jonathan Coveney created PIG-3078:
----------------------------------

Summary: Make a UDF that, given a string, returns just the columns prefixed by that string
Key: PIG-3078
URL: https://issues.apache.org/jira/browse/PIG-3078
Project: Pig
Issue Type: Bug
Reporter: Jonathan Coveney
Fix For: 0.12

This comes up fairly often, usually as the result of a join. Given that each column name in the resulting schema has its relation name prepended, a UDF in the following form could give just the columns from the desired relation:

Pluck('relation_name', *)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
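As a rough illustration of the selection the ticket asks for, here is a minimal Python sketch. It is not a Pig UDF and does not use the Pig API: the function name `pluck`, the list-of-aliases schema, and the sample data are all assumptions made up for this example. It only shows the idea of keeping the fields whose alias starts with `relation_name::`.

```python
def pluck(prefix, schema, row):
    """Keep only the fields whose alias starts with '<prefix>::'.

    `schema` is a list of field aliases and `row` the matching values.
    Both shapes are assumptions for this sketch, not Pig's own types.
    """
    marker = prefix + "::"
    kept = [i for i, name in enumerate(schema) if name.startswith(marker)]
    return [schema[i] for i in kept], [row[i] for i in kept]


# Hypothetical schema of a join between relations `a` and `b`.
schema = ["a::id", "a::name", "b::id", "b::score"]
row = (1, "x", 1, 0.5)
names, values = pluck("a", schema, row)
```

Matching on the prefix plus `::` rather than on the bare prefix avoids picking up a relation named `ab` when `a` is requested.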
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (33 issues)
Subscriber: pigdaily

Key         Summary
PIG-3075    Allow AvroStorage STORE Operations To Use Schema Specified By URI
            https://issues.apache.org/jira/browse/PIG-3075
PIG-3073    POUserFunc creating log spam for large scripts
            https://issues.apache.org/jira/browse/PIG-3073
PIG-3069    Native Windows Compatibility for Pig E2E Tests and Harness
            https://issues.apache.org/jira/browse/PIG-3069
PIG-3067    HBaseStorage should be split up to become more managable
            https://issues.apache.org/jira/browse/PIG-3067
PIG-3066    Fix TestPigRunner in trunk
            https://issues.apache.org/jira/browse/PIG-3066
PIG-3057    make readField protected to be able to override it if we extend PigStorage
            https://issues.apache.org/jira/browse/PIG-3057
PIG-3051    java.lang.IndexOutOfBoundsException failure with LimitOptimizer + ColumnPruning
            https://issues.apache.org/jira/browse/PIG-3051
PIG-3033    test-patch failed with javadoc warnings
            https://issues.apache.org/jira/browse/PIG-3033
PIG-3029    TestTypeCheckingValidatorNewLP has some path reference issues for cross-platform execution
            https://issues.apache.org/jira/browse/PIG-3029
PIG-3028    testGrunt dev test needs some command filters to run correctly without cygwin
            https://issues.apache.org/jira/browse/PIG-3028
PIG-3027    pigTest unit test needs a newline filter for comparisons of golden multi-line
            https://issues.apache.org/jira/browse/PIG-3027
PIG-3026    Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
            https://issues.apache.org/jira/browse/PIG-3026
PIG-3025    TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
            https://issues.apache.org/jira/browse/PIG-3025
PIG-3024    TestEmptyInputDir unit test - hadoop version detection logic is brittle
            https://issues.apache.org/jira/browse/PIG-3024
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-3010    Allow UDF's to flatten themselves
            https://issues.apache.org/jira/browse/PIG-3010
PIG-2959    Add a pig.cmd for Pig to run under Windows
            https://issues.apache.org/jira/browse/PIG-2959
PIG-2957    TetsScriptUDF fail due to volume prefix in jar
            https://issues.apache.org/jira/browse/PIG-2957
PIG-2956    Invalid cache specification for some streaming statement
            https://issues.apache.org/jira/browse/PIG-2956
PIG-2955    Fix bunch of Pig e2e tests on Windows
            https://issues.apache.org/jira/browse/PIG-2955
PIG-2873    Converting bin/pig shell script to python
            https://issues.apache.org/jira/browse/PIG-2873
PIG-2834    MultiStorage requires unused constructor argument
            https://issues.apache.org/jira/browse/PIG-2834
PIG-2824    Pushing checking number of fields into LoadFunc
            https://issues.apache.org/jira/browse/PIG-2824
PIG-2661    Pig uses an extra job for loading data in Pigmix L9
            https://issues.apache.org/jira/browse/PIG-2661
PIG-2645    PigSplit does not handle the case where SerializationFactory returns null
            https://issues.apache.org/jira/browse/PIG-2645
PIG-2614    AvroStorage crashes on LOADING a single bad error
            https://issues.apache.org/jira/browse/PIG-2614
PIG-2507    Semicolon in paramenters for UDF results in parsing error
            https://issues.apache.org/jira/browse/PIG-2507
PIG-2433    Jython import module not working if module path is in classpath
            https://issues.apache.org/jira/browse/PIG-2433
PIG-2417    Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
            https://issues.apache.org/jira/browse/PIG-2417
PIG-2362    Rework Ant build.xml to use macrodef instead of antcall
            https://issues.apache.org/jira/browse/PIG-2362
PIG-2312    NPE when relation and column share the same name and used in Nested Foreach
            https://issues.apache.org/jira/browse/PIG-2312
PIG-1942    script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects
            https://issues.apache.org/jira/browse/PIG-1942
PIG-1237    Piggybank MutliStorage - specify field to write in output
            https://issues.apache.org/jira/browse/PIG-1237

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Created] (PIG-3077) TestMultiQueryLocal should not write in /tmp
Julien Le Dem created PIG-3077:
-------------------------------

Summary: TestMultiQueryLocal should not write in /tmp
Key: PIG-3077
URL: https://issues.apache.org/jira/browse/PIG-3077
Project: Pig
Issue Type: Test
Reporter: Julien Le Dem

Temporary files from tests should be under build/test so that they are cleaned by "ant clean".
Currently, two test suites running on the same machine step on each other and produce flaky test results.
[jira] [Updated] (PIG-3072) Pig job reporting negative progress
[ https://issues.apache.org/jira/browse/PIG-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3072:
-----------------------------------

Release Note: (was: Committed to trunk. Thanks Koji.)

Committed to trunk. Thanks Koji.

> Pig job reporting negative progress
> -----------------------------------
>
> Key: PIG-3072
> URL: https://issues.apache.org/jira/browse/PIG-3072
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.10.0
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Minor
> Fix For: 0.12
>
> Attachments: pig-3072-v01.txt, pig-3072-v02.txt, pig-3072-v03.txt, pig-3072-v04.txt
>
>
> Our users pointed out that their jobs are reporting negative progress.
> 2012-11-02 21:43:11,538 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - -795% complete
> ...
> (due to TFileRecordReader)
[jira] [Updated] (PIG-3072) Pig job reporting negative progress
[ https://issues.apache.org/jira/browse/PIG-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3072:
-----------------------------------

Resolution: Fixed
Release Note: Committed to trunk. Thanks Koji.
Status: Resolved (was: Patch Available)

> Pig job reporting negative progress
> -----------------------------------
>
> Key: PIG-3072
> URL: https://issues.apache.org/jira/browse/PIG-3072
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.10.0
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Minor
> Fix For: 0.12
>
> Attachments: pig-3072-v01.txt, pig-3072-v02.txt, pig-3072-v03.txt, pig-3072-v04.txt
>
>
> Our users pointed out that their jobs are reporting negative progress.
> 2012-11-02 21:43:11,538 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - -795% complete
> ...
> (due to TFileRecordReader)
[jira] [Updated] (PIG-3076) make TestScalarAliases more reliable
[ https://issues.apache.org/jira/browse/PIG-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Le Dem updated PIG-3076:
------------------------------

Attachment: PIG-3076.patch

PIG-3076.patch modifies the test so that input/output are written to the build folder (and are cleaned up by "ant clean"), and so that data is deleted up front, which keeps the test from failing when a previous run failed earlier.

> make TestScalarAliases more reliable
> ------------------------------------
>
> Key: PIG-3076
> URL: https://issues.apache.org/jira/browse/PIG-3076
> Project: Pig
> Issue Type: Test
> Reporter: Julien Le Dem
> Assignee: Julien Le Dem
> Fix For: 0.11, 0.12
>
> Attachments: PIG-3076.patch
>
>
> Currently, this test writes in the root directory, so its output is not deleted by ant clean.
> It also deletes its output at the end instead of at the beginning.
> The consequence is that if the test fails once, it will keep failing until the directory is manually cleaned up (not good for CI).
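The "delete up front, write under the build folder" pattern described above can be sketched as follows. This is an illustrative Python sketch only, not the content of PIG-3076.patch (the actual test is Java); the helper name `fresh_test_dir` and the directory layout are assumptions.

```python
import shutil
from pathlib import Path


def fresh_test_dir(build_root, test_name):
    """Return an empty per-test directory under <build_root>/test.

    Leftovers from an earlier (possibly failed) run are removed before the
    test starts, so a past failure cannot poison the next run.
    """
    test_dir = Path(build_root) / "test" / test_name
    shutil.rmtree(test_dir, ignore_errors=True)  # clean up front, not at the end
    test_dir.mkdir(parents=True)
    return test_dir
```

Because the directory lives under the build root, whatever removes the build output (here, "ant clean") also removes the test data, and cleanup at the end of the test becomes optional.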
[jira] [Commented] (PIG-2684) :: in field name causes AvroStorage to fail
[ https://issues.apache.org/jira/browse/PIG-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510045#comment-13510045 ]

Will Oberman commented on PIG-2684:
-----------------------------------

I was just bitten by this same bug. For me it was because I'm changing from running Hadoop directly against Cassandra to doing Cassandra -> Amazon EMR -> Cassandra (using Pig as my Hadoop language of choice, and S3 as the data interchange layer). And my Cassandra-compatible output schema seems to have autogenerated ::'s.

> :: in field name causes AvroStorage to fail
> -------------------------------------------
>
> Key: PIG-2684
> URL: https://issues.apache.org/jira/browse/PIG-2684
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Reporter: Fabian Alenius
>
> There appears to be a bug in AvroStorage which causes it to fail when there are field names that contain ::
> For example, the following will fail:
> data = load 'test.txt' as (one, two);
> grp = GROUP data by (one, two);
> result = foreach grp generate FLATTEN(group);
> store result into 'test.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
> ERROR 2999: Unexpected internal error. Illegal character in: group::one
> While the following will succeed:
> data = load 'test.txt' as (one, two);
> grp = GROUP data by (one, two);
> result = foreach grp generate FLATTEN(group) as (one,two);
> store result into 'test.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
> Here is a minimal test case:
> data = load 'test.txt' as (one::two, three);
> store data into 'test.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
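For context on why `group::one` is rejected: the Avro specification requires names to start with a letter or underscore and to contain only letters, digits, and underscores, so a colon is not allowed. The Python sketch below checks that rule and shows one possible renaming, mirroring the `as (one,two)` workaround from the ticket. Both function names are hypothetical, and nothing here is taken from AvroStorage itself.

```python
import re

# Name rule from the Avro specification: [A-Za-z_][A-Za-z0-9_]*
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name):
    """True if `name` satisfies the Avro naming rule."""
    return AVRO_NAME.fullmatch(name) is not None


def strip_disambiguation(alias):
    """Hypothetical helper: keep only the part after the last '::'.

    This loses the relation prefix, so two aliases such as 'a::id' and
    'b::id' would collide; a real fix would have to handle that case.
    """
    return alias.rsplit("::", 1)[-1]
```

A loader or storer could also replace `::` with another legal character such as `_`, which keeps names unique at the cost of changing them; the ticket does not say which approach is preferred.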
[jira] [Commented] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510033#comment-13510033 ]

Cheolsoo Park commented on PIG-3015:
------------------------------------

Yes, it does. Thank you, sir!

> Rewrite of AvroStorage
> ----------------------
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
> Issue Type: Improvement
> Components: piggybank
> Reporter: Joseph Adler
> Assignee: Joseph Adler
> Attachments: PIG-3015.patch
>
>
> The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni.
> I'm opening this ticket to facilitate discussion while I figure out the best way to contribute the changes back to Apache.
Re: Our release process
I am ok with tests running nightly and reverting patches that cause failures. We used to have that. Does anybody know what happened? Is anybody volunteering to make it work again?

I would like to see specific criteria for what goes into the branch being published (rather than case-by-case). This way each team can decide if the criteria are stringent enough or if they need to run a private branch.

Olga

From: Santhosh M S
To: Julien Le Dem ; "dev@pig.apache.org"
Cc: "billgra...@gmail.com"
Sent: Friday, November 30, 2012 11:46 PM
Subject: Re: Our release process

Hi Julien,

You are making most of the points that I did on this thread (CI for e2e, not burdening clean e2e prior to every commit for a release branch). The only point on which there is no clear agreement is the definition of a bug that can be included in a previously released branch. I am fine with a case by case inclusion.

Hi Olga,

Are you fine with Julien's proposal as it stands - bugs that are included will be determined at the time of inclusion instead of doing it now?

Santhosh

From: Julien Le Dem
To: dev@pig.apache.org; Santhosh M S
Cc: "billgra...@gmail.com"
Sent: Friday, November 30, 2012 5:37 PM
Subject: Re: Our release process

Proposed criteria:
- it makes the tests fail. targets test-commit + test + e2e tests
- a critical bug is reported in a short time frame (definition of critical not needed as it is rare and can be decided on a case by case basis)

That raises another question: what are the existing CI servers running the tests?
- the Apache CI runs test-commit and test (is it more stable now?) and not e2e. It would be great if it did.
- we have a Jenkins build at Twitter where we run test-commit and test; we could not run e2e easily in our environment.
- I understand there's a Yahoo/Hortonworks build (test-commit + test + e2e ???)

Whenever those builds fail we should open or reopen JIRAs and fix them.
The time it takes to run the full test suite makes it impractical to run on a desktop/laptop.

For the release Pig-0.11.0 we need to get this list of JIRAs down to 0 and publish the jar.
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+PIG+AND+fixVersion+%3D+%220.11%22+AND+resolution+%3D+Unresolved+ORDER+BY+updated+DESC%2C+due+ASC%2C+priority+DESC

Julien

On Thu, Nov 29, 2012 at 11:16 PM, Santhosh M S wrote:
> Looks like everyone is interested in having frequent releases - I don't see anyone disagreeing with that.
>
> Regarding "If a patch makes the release branch unstable, we revert it" - what are the criteria? If we can't decide on the criteria on this thread (already pretty long) then let's get the release trains going. We can revisit the criteria for inclusion of bug fixes when that happens.
>
> Santhosh
>
> From: Julien Le Dem
> To: dev@pig.apache.org; Santhosh M S
> Cc: "billgra...@gmail.com"
> Sent: Thursday, November 29, 2012 9:45 AM
> Subject: Re: Our release process
>
> The release branch receives only bug fixes. Patch level releases (3rd version number) are issued out of the release branch and introduce only bug fixes and no new features.
> Deciding whether a patch is applied to the release branch is based on preserving stability (as Bill said). If a patch makes the release branch unstable, we revert it.
> New features are added to trunk where new major and minor releases will happen.
> If we need a new feature out then we make a new minor release.
> Doing frequent releases is the industry standard and will resolve conflicts around what should go in a release branch.
>
> Making a new release is currently painful *because* we wait so long in between two releases. Let's fix that.
>
> Julien
>
> On Wed, Nov 28, 2012 at 10:09 PM, Santhosh M S wrote:
>> Since releasing a major version once a month is aggressive and we have not released on a quarterly basis, we should allow commits to a released branch to facilitate dot releases.
>>
>> If we are allowing commits to a released branch, the criteria for inclusion can be created anew or we use the industry standards for severity (or priority). It could be painful for a few folks but I don't see better alternatives.
>>
>> Regarding reverting commits based on e2e tests breaking:
>> 1. Who is running the tests?
>> 2. How often are they run?
>> If we have nightly e2e runs then it's easier to catch these errors early. If not, the barrier for inclusion is pretty high and time consuming, making it harder to develop.
>>
>> Santhosh
>>
>> From: Bill Graham
>> To: dev@pig.apache.org
>> Sent: Wednesday, November 28, 2012 11:39 AM
>> Subject: Re: Our release process
>>
>> I agree releasing often is ideal, but releasing major versions once a month would be a bit aggressive.
>>
>> +1 to Olga's initial definition of how Yahoo
[jira] [Commented] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509992#comment-13509992 ]

Joseph Adler commented on PIG-3015:
-----------------------------------

I think that approach makes sense; each object in a file should be wrapped in a Tuple.

Suppose that a file example.avro contained the data:

{[1, 2, 3, 4, 5]}
{[6, 7, 8, 9, 10]}

and had this schema:

{"name" : "IntArray", "type" : "array", "items" : "int"},

and we loaded this as

A = LOAD 'example.avro' USING AvroStorage;

The bag A would have the Pig schema A:{(IntArray:{(int)})}; it would contain two tuples, which would in turn each contain one bag of integers.

Does that sound correct? If so, I'll go implement that.

> Rewrite of AvroStorage
> ----------------------
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
> Issue Type: Improvement
> Components: piggybank
> Reporter: Joseph Adler
> Assignee: Joseph Adler
> Attachments: PIG-3015.patch
>
>
> The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni.
> I'm opening this ticket to facilitate discussion while I figure out the best way to contribute the changes back to Apache.
[jira] [Updated] (PIG-3072) Pig job reporting negative progress
[ https://issues.apache.org/jira/browse/PIG-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-3072:
-----------------------------

Attachment: pig-3072-v04.txt

bq. Can you use HadoopShims to create the TaskAttemptContext in your test. The test fails to compile with H23.

Thanks Rohini. Uploading another patch with your suggestion.

Ran both
$ ant clean test -Dtestcase=TestTmpFileCompression
$ ant -Dhadoopversion=23 clean test -Dtestcase=TestTmpFileCompression

> Pig job reporting negative progress
> -----------------------------------
>
> Key: PIG-3072
> URL: https://issues.apache.org/jira/browse/PIG-3072
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.10.0
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Minor
> Fix For: 0.12
>
> Attachments: pig-3072-v01.txt, pig-3072-v02.txt, pig-3072-v03.txt, pig-3072-v04.txt
>
>
> Our users pointed out that their jobs are reporting negative progress.
> 2012-11-02 21:43:11,538 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - -795% complete
> ...
> (due to TFileRecordReader)
[jira] [Created] (PIG-3076) make TestScalarAliases more reliable
Julien Le Dem created PIG-3076:
-------------------------------

Summary: make TestScalarAliases more reliable
Key: PIG-3076
URL: https://issues.apache.org/jira/browse/PIG-3076
Project: Pig
Issue Type: Test
Reporter: Julien Le Dem
Assignee: Julien Le Dem
Fix For: 0.11, 0.12

Currently, this test writes in the root directory, so its output is not deleted by ant clean.
It also deletes its output at the end instead of at the beginning.
The consequence is that if the test fails once, it will keep failing until the directory is manually cleaned up (not good for CI).
[jira] [Commented] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509964#comment-13509964 ]

Cheolsoo Park commented on PIG-3015:
------------------------------------

Hi Joe,

Thanks for your prompt response! To answer your questions,

{quote}
I have always assumed that AvroStorage was designed to be used with Hadoop sequence files that contained a series of records, so I implemented AvroStorage to only work with a file in this format. Are there cases where the highest level schema for a file will be another type? If so... what does that mean for pig? Is there one record per file?
{quote}

This is a good question, and I see your argument. But this will be very different from what the current AvroStorage does. Currently, a non-record type is automatically wrapped in a tuple. For example, "1" is loaded as (1) in Pig. If a file includes multiple values, they are loaded as multiple tuples as follows:

{code:title=avro}
cheolsoo@localhost:~/workspace/avro $java -jar avro-tools-1.5.4.jar getschema multiple_int.avro
"int"
cheolsoo@localhost:~/workspace/avro $java -jar avro-tools-1.5.4.jar tojson multiple_int.avro
1
2
3
{code}

{code:title=pig}
in = LOAD 'multiple_int.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
DUMP in;
(1)
(2)
(3)
{code}

Agreed that we can tell users that the top-level schema must be a record type, but I am afraid that people might not agree. In my experience, people tend to think that every valid Avro file should be able to be loaded by AvroStorage. Granted, there exist some restrictions (e.g. recursive records and unions), but even these restrictions have been loosened recently. Unless there is a convincing reason not to, I think that we should keep it that way. In many cases, people already have a data pipeline in place (e.g. Flume produces Avro files => Pig consumes Avro files), and it is not guaranteed that the top-level schema is always a record type.

{quote}
Here's a specific example: suppose that we have this schema:
\{"name" : "IntArray", "type" : "array", "items" : "int"\}
Suppose that we have 3 files to load, each with this schema, each containing an array of 10 integers. Should we load this into pig as a single bag with 30 integers? A bag containing three bags (each, in turn, containing 10 integers)? Or reject this file entirely?
{quote}

Currently, they are loaded as 3 tuples, and each tuple contains a bag of 10 integers.

{code}
({(1),(2), ... ,(10)})
({(1),(2), ... ,(10)})
({(1),(2), ... ,(10)})
{code}

Thoughts?

> Rewrite of AvroStorage
> ----------------------
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
> Issue Type: Improvement
> Components: piggybank
> Reporter: Joseph Adler
> Assignee: Joseph Adler
> Attachments: PIG-3015.patch
>
>
> The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni.
> I'm opening this ticket to facilitate discussion while I figure out the best way to contribute the changes back to Apache.
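The wrapping behaviour described in this comment (each top-level non-record value becomes one tuple, and an array becomes a bag of single-field tuples) can be modelled with plain Python values. This is only a sketch of the mapping as stated in the thread, not AvroStorage code: Python tuples stand in for Pig tuples, lists stand in for bags, and the function names are assumptions.

```python
def to_pig(value):
    """Map a decoded non-record Avro value to a Pig-like value.

    An Avro array becomes a bag (modelled as a list) of one-field tuples;
    any other value is passed through unchanged. Records, maps, and unions
    are deliberately out of scope for this sketch.
    """
    if isinstance(value, list):
        return [(to_pig(item),) for item in value]
    return value


def load(datums):
    """Wrap every top-level datum in its own one-field tuple."""
    return [(to_pig(datum),) for datum in datums]
```

With this model, a file of three ints loads as three one-field tuples, and each array datum loads as one tuple holding one bag, which matches the `(1) (2) (3)` and `({(1),(2), ... ,(10)})` outputs shown in the comment.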
[jira] [Updated] (PIG-2812) Spill InternalCachedBag into only 1 file
[ https://issues.apache.org/jira/browse/PIG-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Le Dem updated PIG-2812:
------------------------------

Fix Version/s: (was: 0.11)

I'm detaching this from pig-0.11 as it is not ready yet.

> Spill InternalCachedBag into only 1 file
> ----------------------------------------
>
> Key: PIG-2812
> URL: https://issues.apache.org/jira/browse/PIG-2812
> Project: Pig
> Issue Type: Bug
> Components: data
> Reporter: Haitao Yao
> Assignee: Haitao Yao
> Attachments: aa.jpg, spill.patch
>
>
> I encountered a reducer's OOM because of java.io.DeleteOnExitHook. I found out that the InternalCachedBag creates a separate tmp file for each spill, and the tmp files are deleted on exit. So the file delete hook caused the OOM.
> Why not just hold the tmp file handle and spill into only one tmp file?
> Too many tmp files may block the tasktracker start process, if the tmp files are not cleaned on time and the tasktracker restarts at this specific time.
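The "hold one file handle and append every spill to it" idea from the ticket can be illustrated with a small Python sketch. This is not the InternalCachedBag implementation or the attached spill.patch: the class name, the item-count threshold, and the use of pickle are assumptions chosen to keep the example short.

```python
import pickle
import tempfile


class SingleFileSpillBag:
    """Bag that keeps a bounded number of items in memory and appends every
    overflow batch to the same temporary file, instead of creating one file
    per spill."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory
        self.memory = []
        self.spill_file = None  # opened lazily on the first spill
        self.spilled = 0        # number of items written to the file

    def add(self, item):
        self.memory.append(item)
        if len(self.memory) >= self.max_in_memory:
            self._spill()

    def _spill(self):
        if self.spill_file is None:
            self.spill_file = tempfile.TemporaryFile()
        self.spill_file.seek(0, 2)  # always append at the end of the file
        for item in self.memory:
            pickle.dump(item, self.spill_file)
        self.spilled += len(self.memory)
        self.memory = []

    def __iter__(self):
        # Spilled items come first, in insertion order, then in-memory ones.
        if self.spill_file is not None:
            self.spill_file.seek(0)
            for _ in range(self.spilled):
                yield pickle.load(self.spill_file)
        yield from self.memory
```

With a single handle there is only one file to clean up per bag, so the number of pending deletions no longer grows with the number of spills.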
[jira] [Commented] (PIG-3072) Pig job reporting negative progress
[ https://issues.apache.org/jira/browse/PIG-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509920#comment-13509920 ]

Rohini Palaniswamy commented on PIG-3072:
-----------------------------------------

Koji,
Can you use HadoopShims to create the TaskAttemptContext in your test? The test fails to compile with H23.

{noformat}
[javac] /apache/pig/trunk/test/org/apache/pig/test/TestTmpFileCompression.java:369: org.apache.hadoop.mapreduce.TaskAttemptContext is abstract; cannot be instantiated
[javac] new TaskAttemptContext(conf, new TaskAttemptID()));
{noformat}

> Pig job reporting negative progress
> -----------------------------------
>
> Key: PIG-3072
> URL: https://issues.apache.org/jira/browse/PIG-3072
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.10.0
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Minor
> Fix For: 0.12
>
> Attachments: pig-3072-v01.txt, pig-3072-v02.txt, pig-3072-v03.txt
>
>
> Our users pointed out that their jobs are reporting negative progress.
> 2012-11-02 21:43:11,538 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - -795% complete
> ...
> (due to TFileRecordReader)