[jira] [Commented] (PIG-3261) User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
[ https://issues.apache.org/jira/browse/PIG-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613554#comment-13613554 ]

Prashant Kommireddi commented on PIG-3261:
------------------------------------------

Thanks [~qwertymaniac]. Do you think user set classpath should always be added at the beginning? Or would it make sense to have a property similar to HADOOP_USER_CLASSPATH_FIRST?

> User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
> -------------------------------------------------------------------------------
>
>                 Key: PIG-3261
>                 URL: https://issues.apache.org/jira/browse/PIG-3261
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.10.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>         Attachments: PIG-3261.patch
>
> Currently we are doing this wrong:
> {code}
> if [ "$PIG_CLASSPATH" != "" ]; then
>     CLASSPATH=${CLASSPATH}:${PIG_CLASSPATH}
> {code}
> This means that anything added to CLASSPATH until that point will never be
> able to get overridden by a user set environment, which is wrong behavior.
> Hadoop libs for example are added to CLASSPATH, before this extension is
> called in bin/pig.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
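The alternative raised in this comment, an opt-in switch in the spirit of HADOOP_USER_CLASSPATH_FIRST, could look roughly like the sketch below. It is only an illustration: the variable name PIG_USER_CLASSPATH_FIRST and the helper build_classpath are hypothetical and not part of bin/pig, and treating any non-empty value as "on" is an assumption of this sketch.

```shell
#!/bin/sh
# build_classpath BASE USER FIRST
# Hypothetical helper (not part of bin/pig): prints BASE and USER joined by ':'.
# USER goes first only when FIRST is non-empty; empty parts are skipped so no
# stray ':' is produced.
build_classpath() {
  base="$1"; user="$2"; first="$3"
  if [ -z "$user" ]; then
    printf '%s\n' "$base"
  elif [ -z "$base" ]; then
    printf '%s\n' "$user"
  elif [ -n "$first" ]; then
    printf '%s\n' "${user}:${base}"   # user entries take precedence
  else
    printf '%s\n' "${base}:${user}"   # current bin/pig behaviour: append
  fi
}

# PIG_USER_CLASSPATH_FIRST is a hypothetical variable name used for illustration.
CLASSPATH=$(build_classpath "$CLASSPATH" "$PIG_CLASSPATH" "$PIG_USER_CLASSPATH_FIRST")
```

With a switch like this the default ordering stays unchanged, and only users who set the variable get their entries ahead of the Hadoop libs.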
[jira] [Commented] (PIG-3259) Optimize byte to Long/Integer conversions
[ https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613528#comment-13613528 ]

Prashant Kommireddi commented on PIG-3259:
------------------------------------------

{quote}
The check you have here does not accept all valid double string representations
{quote}
- thanks for noticing that.

{quote}
One way to avoid performance degradation for 'correct' case would be to start by doing .valueOf() without checks, then use the number of non-numbers encountered to decide if we want to be making the sanityCheckIntegerLongDecimal() calls
{quote}
- I am not clear on the advantage here. How do we determine the number of non-numbers without making calls to sanityCheck..()?

> Optimize byte to Long/Integer conversions
> -----------------------------------------
>
>                 Key: PIG-3259
>                 URL: https://issues.apache.org/jira/browse/PIG-3259
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11, 0.11.1
>            Reporter: Prashant Kommireddi
>            Assignee: Prashant Kommireddi
>             Fix For: 0.12
>
>         Attachments: byteToLong.xlsx
>
> These conversions can be performing better. If the input is not numeric
> (1234abcd) the code calls Double.valueOf(String) regardless before finally
> returning null. Any script that inadvertently (user's mistake or not) tries
> to cast non-numeric column to int or long would result in many wasteful
> calls.
> We can avoid this and only handle the cases we find the input to be a decimal
> number (1234.56) and return null otherwise even before trying
> Double.valueOf(String).
[jira] [Resolved] (PIG-2470) Issue with CSVEXcelStorage piggy bank function
[ https://issues.apache.org/jira/browse/PIG-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park resolved PIG-2470.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.12
         Assignee: Jonathan Packer  (was: Prashant Kommireddi)

Closing the jira since it's fixed as part of PIG-3141.

> Issue with CSVEXcelStorage piggy bank function
> ----------------------------------------------
>
>                 Key: PIG-2470
>                 URL: https://issues.apache.org/jira/browse/PIG-2470
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.9.0
>            Reporter: Priya Karkele
>            Assignee: Jonathan Packer
>             Fix For: 0.12
>
>         Attachments: PIG-2470_2.patch, PIG-2470.patch
>
> CSVExcelStorage piggy bank function skips the record, which has 1 or more
> null column(s) in it. The record is not written to the file
[jira] [Updated] (PIG-3141) Giving CSVExcelStorage an option to handle header rows
[ https://issues.apache.org/jira/browse/PIG-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3141:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

+1.

Committed to trunk. Thanks Jonathan P!

Note that I got rid of all the ^M's in the following files while committing them:
* contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java
* contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestCSVExcelStorage.java

> Giving CSVExcelStorage an option to handle header rows
> ------------------------------------------------------
>
>                 Key: PIG-3141
>                 URL: https://issues.apache.org/jira/browse/PIG-3141
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>    Affects Versions: 0.11
>            Reporter: Jonathan Packer
>            Assignee: Jonathan Packer
>             Fix For: 0.12
>
>         Attachments: csv.patch, csv_updated.patch, PIG-3141_update_3.diff,
> PIG-3141_update_4.diff
>
> Adds an argument to CSVExcelStorage to skip the header row when loading. This
> works properly with multiple small files each with a header being combined
> into one split, or a large file with a single header being split into
> multiple splits.
> Also fixes a few bugs with CSVExcelStorage, including PIG-2470 and a bug
> involving quoted fields at the end of a line not escaping properly.
Re: Review Request: PIG-3141 [piggybank] Giving CSVExcelStorage an option to handle header rows
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9697/#review18379
-----------------------------------------------------------

Ship it!

Ship It!

- Cheolsoo Park

On March 25, 2013, 3:17 p.m., Jonathan Packer wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9697/
> -----------------------------------------------------------
>
> (Updated March 25, 2013, 3:17 p.m.)
>
> Review request for pig.
>
> Description
> -------
>
> Reviewboard for https://issues.apache.org/jira/browse/PIG-3141
>
> Adds a "header treatment" option to CSVExcelStorage allowing header rows
> (first row with column names) in files to be skipped when loading, or for a
> header row with column names to be written when storing. Should be backwards
> compatible--all unit-tests from the old CSVExcelStorage pass.
>
> Diffs
> -----
>
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java 568b3f3
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestCSVExcelStorage.java 9bed527
>
> Diff: https://reviews.apache.org/r/9697/diff/
>
> Testing
> -------
>
> cd contrib/piggybank/java
> ant -Dtestcase=TestCSVExcelStorage test
>
> Thanks,
>
> Jonathan Packer
[jira] [Updated] (PIG-3261) User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
[ https://issues.apache.org/jira/browse/PIG-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated PIG-3261:
-------------------------

    Status: Patch Available  (was: Open)

> User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
> -------------------------------------------------------------------------------
>
>                 Key: PIG-3261
>                 URL: https://issues.apache.org/jira/browse/PIG-3261
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.10.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>         Attachments: PIG-3261.patch
>
> Currently we are doing this wrong:
> {code}
> if [ "$PIG_CLASSPATH" != "" ]; then
>     CLASSPATH=${CLASSPATH}:${PIG_CLASSPATH}
> {code}
> This means that anything added to CLASSPATH until that point will never be
> able to get overridden by a user set environment, which is wrong behavior.
> Hadoop libs for example are added to CLASSPATH, before this extension is
> called in bin/pig.
[jira] [Updated] (PIG-3261) User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
[ https://issues.apache.org/jira/browse/PIG-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated PIG-3261:
-------------------------

    Attachment: PIG-3261.patch

> User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
> -------------------------------------------------------------------------------
>
>                 Key: PIG-3261
>                 URL: https://issues.apache.org/jira/browse/PIG-3261
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.10.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>         Attachments: PIG-3261.patch
>
> Currently we are doing this wrong:
> {code}
> if [ "$PIG_CLASSPATH" != "" ]; then
>     CLASSPATH=${CLASSPATH}:${PIG_CLASSPATH}
> {code}
> This means that anything added to CLASSPATH until that point will never be
> able to get overridden by a user set environment, which is wrong behavior.
> Hadoop libs for example are added to CLASSPATH, before this extension is
> called in bin/pig.
[jira] [Created] (PIG-3261) User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
Harsh J created PIG-3261:
----------------------------

             Summary: User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
                 Key: PIG-3261
                 URL: https://issues.apache.org/jira/browse/PIG-3261
             Project: Pig
          Issue Type: Bug
          Components: grunt
    Affects Versions: 0.10.0
            Reporter: Harsh J
            Assignee: Harsh J
         Attachments: PIG-3261.patch

Currently we are doing this wrong:
{code}
if [ "$PIG_CLASSPATH" != "" ]; then
    CLASSPATH=${CLASSPATH}:${PIG_CLASSPATH}
{code}
This means that anything added to CLASSPATH until that point will never be able to get overridden by a user set environment, which is wrong behavior. Hadoop libs for example are added to CLASSPATH, before this extension is called in bin/pig.
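For illustration, the ordering the report asks for can be sketched as below. This is not the attached PIG-3261.patch (its contents do not appear in this message), only a minimal sketch of the prepend idea. The jar paths are hypothetical sample values standing in for what bin/pig and the user would supply.

```shell
#!/bin/sh
# Hypothetical state at this point in bin/pig: Hadoop libs are already on CLASSPATH.
CLASSPATH="/opt/hadoop/lib/guava-11.jar"
# Hypothetical user-supplied override.
PIG_CLASSPATH="/home/user/lib/guava-14.jar"

if [ -n "$PIG_CLASSPATH" ]; then
  # Prepend instead of append: the JVM loads a class from the first
  # classpath entry that contains it, so user entries must come first
  # to override anything added earlier.
  CLASSPATH="${PIG_CLASSPATH}:${CLASSPATH}"
fi

echo "$CLASSPATH"
```

With the append form quoted in the report, the Hadoop copy would stay first and the user's jar could never shadow it.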
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (34 issues)
Subscriber: pigdaily

Key         Summary
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3247    Piggybank functions to mimic OVER clause in SQL
            https://issues.apache.org/jira/browse/PIG-3247
PIG-3238    Pig current releases lack a UDF Stuff(). This UDF deletes a specified length of characters and inserts another set of characters at a specified starting point.
            https://issues.apache.org/jira/browse/PIG-3238
PIG-3237    Pig current releases lack a UDF MakeSet(). This UDF returns a set value (a string containing substrings separated by "," characters) consisting of the strings that have the corresponding bit in the first argument
            https://issues.apache.org/jira/browse/PIG-3237
PIG-3223    AvroStorage does not handle comma separated input paths
            https://issues.apache.org/jira/browse/PIG-3223
PIG-3215    [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files
            https://issues.apache.org/jira/browse/PIG-3215
PIG-3210    Pig fails to start when it cannot write log to log files
            https://issues.apache.org/jira/browse/PIG-3210
PIG-3198    Let users use any function from PigType -> PigType as if it were builtlin
            https://issues.apache.org/jira/browse/PIG-3198
PIG-3193    Fix "ant docs" warnings
            https://issues.apache.org/jira/browse/PIG-3193
PIG-3190    Add LuceneTokenizer and SnowballTokenizer to Pig - useful text tokenization
            https://issues.apache.org/jira/browse/PIG-3190
PIG-3183    rm or rmf commands should respect globbing/regex of path
            https://issues.apache.org/jira/browse/PIG-3183
PIG-3173    Partition filter push down does not happen partition keys condition include a AND and OR construct
            https://issues.apache.org/jira/browse/PIG-3173
PIG-3166    Update eclipse .classpath according to ivy library.properties
            https://issues.apache.org/jira/browse/PIG-3166
PIG-3164    Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix.
            https://issues.apache.org/jira/browse/PIG-3164
PIG-3141    Giving CSVExcelStorage an option to handle header rows
            https://issues.apache.org/jira/browse/PIG-3141
PIG-3123    Simplify Logical Plans By Removing Unneccessary Identity Projections
            https://issues.apache.org/jira/browse/PIG-3123
PIG-3122    Operators should not implicitly become reserved keywords
            https://issues.apache.org/jira/browse/PIG-3122
PIG-3114    Duplicated macro name error when using pigunit
            https://issues.apache.org/jira/browse/PIG-3114
PIG-3105    Fix TestJobSubmission unit test failure.
            https://issues.apache.org/jira/browse/PIG-3105
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3069    Native Windows Compatibility for Pig E2E Tests and Harness
            https://issues.apache.org/jira/browse/PIG-3069
PIG-3028    testGrunt dev test needs some command filters to run correctly without cygwin
            https://issues.apache.org/jira/browse/PIG-3028
PIG-3027    pigTest unit test needs a newline filter for comparisons of golden multi-line
            https://issues.apache.org/jira/browse/PIG-3027
PIG-3026    Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
            https://issues.apache.org/jira/browse/PIG-3026
PIG-3024    TestEmptyInputDir unit test - hadoop version detection logic is brittle
            https://issues.apache.org/jira/browse/PIG-3024
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-3010    Allow UDF's to flatten themselves
            https://issues.apache.org/jira/browse/PIG-3010
PIG-2959    Add a pig.cmd for Pig to run under Windows
            https://issues.apache.org/jira/browse/PIG-2959
PIG-2955    Fix bunch of Pig e2e tests on Windows
            https://issues.apache.org/jira/browse/PIG-2955
PIG-2873    Converting bin/pig shell script to python
            https://issues.apache.org/jira/browse/PIG-2873
PIG-2643    Use bytecode generation to make a performance replacement for InvokeForLong, InvokeForString, etc
            https://issues.apache.org/jira/browse/PIG-2643
PIG-2641    Create toJSON function for all complex types: tuples, bags and maps
            https://issues.apache.org/jira/browse/PIG-2641
PIG-2591    Unit tests should not write to /tmp but respect java.io.tmpdir
            https://issues.apache.org/jira/browse/PIG-2591
PIG-1914    Support load/store JSON data in Pig
            https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Commented] (PIG-3259) Optimize byte to Long/Integer conversions
[ https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613354#comment-13613354 ]

Thejas M Nair commented on PIG-3259:
------------------------------------

Sounds like a good idea.
The check you have here does not accept all valid double string representations (see http://docs.oracle.com/javase/6/docs/api/java/lang/Double.html#valueOf(java.lang.String) ), e.g. with an exponent, or a hexadecimal representation starting with 0x.

But if we can avoid the performance degradation for the 'correct' [1] case (which seems to be in the range of 2-8% in the micro benchmark that ran for at least a few seconds), that would be better. One way to avoid performance degradation for the 'correct' case would be to start by doing .valueOf() without checks, then use the number of non-numbers encountered to decide if we want to be making the sanityCheckIntegerLongDecimal() calls.

[1] - by correct I mean the case where the field declared an integer or a double has correct representation.

> Optimize byte to Long/Integer conversions
> -----------------------------------------
>
>                 Key: PIG-3259
>                 URL: https://issues.apache.org/jira/browse/PIG-3259
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11, 0.11.1
>            Reporter: Prashant Kommireddi
>            Assignee: Prashant Kommireddi
>             Fix For: 0.12
>
>         Attachments: byteToLong.xlsx
>
> These conversions can be performing better. If the input is not numeric
> (1234abcd) the code calls Double.valueOf(String) regardless before finally
> returning null. Any script that inadvertently (user's mistake or not) tries
> to cast non-numeric column to int or long would result in many wasteful
> calls.
> We can avoid this and only handle the cases we find the input to be a decimal
> number (1234.56) and return null otherwise even before trying
> Double.valueOf(String).
Re: [VOTE] Release Pig 0.11.1 (candidate 0)
Yes, it is OK with me.

Daniel

On Mon, Mar 25, 2013 at 2:44 PM, Julien Le Dem wrote:
> +1
> The full test suite is passing.
> I don't think we need to make a new rc just for one license header missing.
> Daniel, is it OK for you ?
> Thanks,
> Julien
>
> On Mon, Mar 25, 2013 at 11:02 AM, Daniel Dai wrote:
>> My fault for missing license header for
>> UDFContextTestLoaderWithSignature. Added it to both files, Thanks
>> Prashant!
>>
>> I run unit tests/e2e tests, both passed. +1 for the rc except for the
>> license header issue.
>>
>> Daniel
>>
>> On Sun, Mar 24, 2013 at 11:18 PM, Prashant Kommireddi wrote:
>>> Downloaded tarball and performed the following:
>>>
>>>    1. ant releaseaudit - UDFContextTestLoaderWithSignature (
>>>    http://svn.apache.org/viewvc?view=revision&revision=r1458036) and
>>>    DOTParser.jjt do not have Apache License header.
>>>    2. Verified RELEASE_NOTES.txt for correct version numbers
>>>    3. Verified build.xml points to next version (0.11.2) SNAPSHOT
>>>    4. Built and tested Piggybank, Built tutorial - looks good.
>>>    5. Tested jar by running scripts against 0.20.2 hadoop cluster (would be
>>>    great to have someone else test the same)
>>>    6. ant test-commit - all tests pass
>>>
>>> Except for #1, RC looks good to me.
>>> Thanks,
>>> -Prashant
>>>
>>> On Fri, Mar 22, 2013 at 7:58 AM, Bill Graham wrote:
>>>
>>>> Hi,
>>>>
>>>> I have created a candidate build for Pig 0.11.1. This is a maintenance
>>>> release of Pig 0.11.
>>>>
>>>> Keys used to sign the release are available at:
>>>> http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup
>>>>
>>>> Please download, test, and try it out:
>>>> http://people.apache.org/~billgraham/pig-0.11.1-candidate-0/
>>>>
>>>> Should we release this? Vote closes on next Thursday EOD, Mar 28th.
>>>>
>>>> Thanks,
>>>> Bill
Re: [VOTE] Release Pig 0.11.1 (candidate 0)
+1
The full test suite is passing.
I don't think we need to make a new rc just for one license header missing.
Daniel, is it OK for you ?
Thanks,
Julien

On Mon, Mar 25, 2013 at 11:02 AM, Daniel Dai wrote:
> My fault for missing license header for
> UDFContextTestLoaderWithSignature. Added it to both files, Thanks
> Prashant!
>
> I run unit tests/e2e tests, both passed. +1 for the rc except for the
> license header issue.
>
> Daniel
>
> On Sun, Mar 24, 2013 at 11:18 PM, Prashant Kommireddi wrote:
>> Downloaded tarball and performed the following:
>>
>>    1. ant releaseaudit - UDFContextTestLoaderWithSignature (
>>    http://svn.apache.org/viewvc?view=revision&revision=r1458036) and
>>    DOTParser.jjt do not have Apache License header.
>>    2. Verified RELEASE_NOTES.txt for correct version numbers
>>    3. Verified build.xml points to next version (0.11.2) SNAPSHOT
>>    4. Built and tested Piggybank, Built tutorial - looks good.
>>    5. Tested jar by running scripts against 0.20.2 hadoop cluster (would be
>>    great to have someone else test the same)
>>    6. ant test-commit - all tests pass
>>
>> Except for #1, RC looks good to me.
>> Thanks,
>> -Prashant
>>
>> On Fri, Mar 22, 2013 at 7:58 AM, Bill Graham wrote:
>>
>>> Hi,
>>>
>>> I have created a candidate build for Pig 0.11.1. This is a maintenance
>>> release of Pig 0.11.
>>>
>>> Keys used to sign the release are available at:
>>> http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup
>>>
>>> Please download, test, and try it out:
>>> http://people.apache.org/~billgraham/pig-0.11.1-candidate-0/
>>>
>>> Should we release this? Vote closes on next Thursday EOD, Mar 28th.
>>>
>>> Thanks,
>>> Bill
[jira] [Resolved] (PIG-3260) Number of languages which support UDF is listed 3 instead of 5
[ https://issues.apache.org/jira/browse/PIG-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-3260.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.11.1
         Assignee: Daniel Dai

Fixed in 0.11 branch. trunk is already fixed. Thanks for reporting!

> Number of languages which support UDF is listed 3 instead of 5
> --------------------------------------------------------------
>
>                 Key: PIG-3260
>                 URL: https://issues.apache.org/jira/browse/PIG-3260
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 0.11
>            Reporter: Sumod Pawgi
>            Assignee: Daniel Dai
>            Priority: Trivial
>             Fix For: 0.11.1
>
> On the Pig UDF page - http://pig.apache.org/docs/r0.11.0/udf.html#udfs, it
> says that - "Pig UDFs can currently be implemented in three languages: Java,
> Python, JavaScript, Ruby and Groovy." However, these are 5 languages. Very
> minor probably typing mistake. But thought of reporting it.
[jira] [Assigned] (PIG-3259) Optimize byte to Long/Integer conversions
[ https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi reassigned PIG-3259:
----------------------------------------

    Assignee: Prashant Kommireddi

> Optimize byte to Long/Integer conversions
> -----------------------------------------
>
>                 Key: PIG-3259
>                 URL: https://issues.apache.org/jira/browse/PIG-3259
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11, 0.11.1
>            Reporter: Prashant Kommireddi
>            Assignee: Prashant Kommireddi
>             Fix For: 0.12
>
>         Attachments: byteToLong.xlsx
>
> These conversions can be performing better. If the input is not numeric
> (1234abcd) the code calls Double.valueOf(String) regardless before finally
> returning null. Any script that inadvertently (user's mistake or not) tries
> to cast non-numeric column to int or long would result in many wasteful
> calls.
> We can avoid this and only handle the cases we find the input to be a decimal
> number (1234.56) and return null otherwise even before trying
> Double.valueOf(String).
Re: [VOTE] Release Pig 0.11.1 (candidate 0)
My fault for missing license header for
UDFContextTestLoaderWithSignature. Added it to both files, Thanks
Prashant!

I run unit tests/e2e tests, both passed. +1 for the rc except for the
license header issue.

Daniel

On Sun, Mar 24, 2013 at 11:18 PM, Prashant Kommireddi wrote:
> Downloaded tarball and performed the following:
>
>    1. ant releaseaudit - UDFContextTestLoaderWithSignature (
>    http://svn.apache.org/viewvc?view=revision&revision=r1458036) and
>    DOTParser.jjt do not have Apache License header.
>    2. Verified RELEASE_NOTES.txt for correct version numbers
>    3. Verified build.xml points to next version (0.11.2) SNAPSHOT
>    4. Built and tested Piggybank, Built tutorial - looks good.
>    5. Tested jar by running scripts against 0.20.2 hadoop cluster (would be
>    great to have someone else test the same)
>    6. ant test-commit - all tests pass
>
> Except for #1, RC looks good to me.
> Thanks,
> -Prashant
>
> On Fri, Mar 22, 2013 at 7:58 AM, Bill Graham wrote:
>
>> Hi,
>>
>> I have created a candidate build for Pig 0.11.1. This is a maintenance
>> release of Pig 0.11.
>>
>> Keys used to sign the release are available at:
>> http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup
>>
>> Please download, test, and try it out:
>> http://people.apache.org/~billgraham/pig-0.11.1-candidate-0/
>>
>> Should we release this? Vote closes on next Thursday EOD, Mar 28th.
>>
>> Thanks,
>> Bill
[jira] [Updated] (PIG-3141) Giving CSVExcelStorage an option to handle header rows
[ https://issues.apache.org/jira/browse/PIG-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Packer updated PIG-3141:
---------------------------------

    Description:
Adds an argument to CSVExcelStorage to skip the header row when loading. This works properly with multiple small files each with a header being combined into one split, or a large file with a single header being split into multiple splits.

Also fixes a few bugs with CSVExcelStorage, including PIG-2470 and a bug involving quoted fields at the end of a line not escaping properly.

  was:
Adds an argument to CSVExcelStorage to skip the header row when loading. This works properly with multiple small files each with a header being combined into one split, or a large file with a single header being split into multiple splits.

Also fixes a few bugs with CSVExcelStorage, including PIG-2470 and a bug involving quoted fields at the end of a line not escaping properly.

Removes the choice of delimiter, since a CSV file ought to only use a comma delimiter, hence the name.

> Giving CSVExcelStorage an option to handle header rows
> ------------------------------------------------------
>
>                 Key: PIG-3141
>                 URL: https://issues.apache.org/jira/browse/PIG-3141
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>    Affects Versions: 0.11
>            Reporter: Jonathan Packer
>            Assignee: Jonathan Packer
>             Fix For: 0.12
>
>         Attachments: csv.patch, csv_updated.patch, PIG-3141_update_3.diff,
> PIG-3141_update_4.diff
>
> Adds an argument to CSVExcelStorage to skip the header row when loading. This
> works properly with multiple small files each with a header being combined
> into one split, or a large file with a single header being split into
> multiple splits.
> Also fixes a few bugs with CSVExcelStorage, including PIG-2470 and a bug
> involving quoted fields at the end of a line not escaping properly.
[jira] [Updated] (PIG-3141) Giving CSVExcelStorage an option to handle header rows
[ https://issues.apache.org/jira/browse/PIG-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Packer updated PIG-3141:
---------------------------------

    Attachment: PIG-3141_update_4.diff

Updated diff with code review changes (also updated on ReviewBoard). Thanks for taking a look at this and the fixed-width patch Cheolsoo.

> Giving CSVExcelStorage an option to handle header rows
> ------------------------------------------------------
>
>                 Key: PIG-3141
>                 URL: https://issues.apache.org/jira/browse/PIG-3141
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>    Affects Versions: 0.11
>            Reporter: Jonathan Packer
>            Assignee: Jonathan Packer
>             Fix For: 0.12
>
>         Attachments: csv.patch, csv_updated.patch, PIG-3141_update_3.diff,
> PIG-3141_update_4.diff
>
> Adds an argument to CSVExcelStorage to skip the header row when loading. This
> works properly with multiple small files each with a header being combined
> into one split, or a large file with a single header being split into
> multiple splits.
> Also fixes a few bugs with CSVExcelStorage, including PIG-2470 and a bug
> involving quoted fields at the end of a line not escaping properly.
> Removes the choice of delimiter, since a CSV file ought to only use a comma
> delimiter, hence the name.
Re: Review Request: PIG-3141 [piggybank] Giving CSVExcelStorage an option to handle header rows
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9697/
-----------------------------------------------------------

(Updated March 25, 2013, 3:17 p.m.)

Review request for pig.

Changes
-------

Code review changes for Cheolsoo

Description
-------

Reviewboard for https://issues.apache.org/jira/browse/PIG-3141

Adds a "header treatment" option to CSVExcelStorage allowing header rows (first row with column names) in files to be skipped when loading, or for a header row with column names to be written when storing. Should be backwards compatible--all unit-tests from the old CSVExcelStorage pass.

Diffs (updated)
-----

  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java 568b3f3
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestCSVExcelStorage.java 9bed527

Diff: https://reviews.apache.org/r/9697/diff/

Testing
-------

cd contrib/piggybank/java
ant -Dtestcase=TestCSVExcelStorage test

Thanks,

Jonathan Packer